feat: add fixed-arity Syscall0–Syscall15 variants to avoid heap allocation #445
RomainMuller wants to merge 1 commit into ebitengine:main
…ocation
When SyscallN is called across module boundaries the variadic args slice always escapes to the heap, even for small call sites. Expose Syscall0 through Syscall15 with explicit named parameters so callers avoid that allocation entirely: no slice is formed at the call site, and the stack-local [maxArgs]uintptr arrays are built entirely inside the wrapper.
The implementations follow the same pattern as SyscallN (zero-initialised tmp/floats arrays, go:uintptrescapes, Windows delegation via syscall_syscallN) and share the existing syscall_SyscallN internal path, so no platform-specific assembly is required.
Tests are added to TestABI_ArgumentPassing using two new C helpers: stack_0_uintptr (0-arg, returns 42) and stack_15_uintptr (15-arg sum), covering every arity and verifying parity with SyscallN.
JJ-Change-Id: kpqxzv
Why don't you remove |
Is it? I've not found a way to get the compiler to not always heap-escape the variadic slice for a cross-module call? Specifically, my benchmark always shows 1 alloc that matches the vararg slice size... Only because it's cross-module. |
So didn't //go:nosplit + //go:uintptrkeepalive work? |
The syscalln that it calls does have the noescape pragma which I think may be the reason? |
Motivation
SyscallN is variadic (args ...uintptr). When it is called across a module boundary, the compiler must materialise the arguments as a []uintptr slice at the call site. Because of //go:uintptrescapes, the compiler cannot prove that the slice does not escape, so it is heap-allocated even for small, short-lived call sites (and since the call is cross-module, the slice escapes in any case).
This allocation can be a significant cost (time and GC churn) when the low-level API is called frequently. Fixed-arity variants avoid it and can reduce overhead by 75% (almost entirely from the slice allocation).
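Whether a given call shape allocates can be checked with testing.AllocsPerRun. The sketch below is not purego code: sumVariadic and sumFixed3 are hypothetical stand-ins, and the package-level sink is an illustration device that forces the variadic slice to escape, mimicking a callee whose arguments the compiler must treat as escaping. It cannot reproduce the cross-module case itself, so the numbers it prints only illustrate the mechanism.

```go
package main

import (
	"fmt"
	"testing"
)

const maxArgs = 15

// sink makes the variadic slice escape. This is an assumption of the sketch,
// standing in for a callee the compiler cannot see through.
var sink []uintptr

// result keeps the compiler from discarding the calls under measurement.
var result uintptr

//go:noinline
func sumVariadic(args ...uintptr) uintptr {
	sink = args // the slice outlives the call, so the caller must heap-allocate it
	var s uintptr
	for _, a := range args {
		s += a
	}
	return s
}

//go:noinline
func sumFixed3(a1, a2, a3 uintptr) uintptr {
	// The array lives in this function's own frame; the caller builds no slice.
	tmp := [maxArgs]uintptr{a1, a2, a3}
	var s uintptr
	for _, a := range tmp {
		s += a
	}
	return s
}

// measure reports the average number of heap allocations per call for both shapes.
func measure(x uintptr) (variadic, fixed float64) {
	variadic = testing.AllocsPerRun(1000, func() {
		result = sumVariadic(x, x+1, x+2)
	})
	fixed = testing.AllocsPerRun(1000, func() {
		result = sumFixed3(x, x+1, x+2)
	})
	return variadic, fixed
}

func main() {
	v, f := measure(1)
	fmt.Printf("variadic: %.0f allocs/call, fixed: %.0f allocs/call\n", v, f)
}
```

Running the same measurement against the real SyscallN from a separate module would show whether the allocation under discussion is present for a particular compiler version.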
What this PR does
Adds Syscall0 through Syscall15: fixed-arity wrappers with explicit named uintptr parameters. No slice is created at the call site; the individual arguments are passed directly as ordinary parameters (in registers where the Go ABI has them available), eliminating the allocation entirely.
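A minimal sketch of the wrapper shape, assuming a shared internal entry point: invoke is a hypothetical stand-in for the internal call path and only sums the integer arguments, and call3 and callN are illustrative names rather than the functions added by this PR. The point is that the fixed-arity form fills a zero-initialised array inside its own frame, while the variadic form receives a slice the caller had to build.

```go
package main

import "fmt"

const maxArgs = 15

// invoke is a hypothetical stand-in for the shared internal call path.
// A real implementation would hand both arrays to a platform trampoline;
// this one just sums the integer arguments so the sketch is runnable.
func invoke(fn uintptr, ints, floats *[maxArgs]uintptr) uintptr {
	var sum uintptr
	for _, v := range ints {
		sum += v
	}
	return sum
}

// call3 shows the fixed-arity shape: named parameters, no slice at the call site.
//
//go:uintptrescapes
func call3(fn, a1, a2, a3 uintptr) uintptr {
	ints := [maxArgs]uintptr{a1, a2, a3} // unused slots stay zero
	var floats [maxArgs]uintptr
	return invoke(fn, &ints, &floats)
}

// callN is the variadic counterpart: the caller materialises args as a slice,
// which is then copied into the same kind of fixed array.
//
//go:uintptrescapes
func callN(fn uintptr, args ...uintptr) uintptr {
	if len(args) > maxArgs {
		panic("too many arguments")
	}
	var ints, floats [maxArgs]uintptr
	copy(ints[:], args)
	return invoke(fn, &ints, &floats)
}

func main() {
	fmt.Println(call3(0, 1, 2, 3), callN(0, 1, 2, 3))
}
```

Both forms reach invoke with identical arrays, which is why a fixed-arity variant needs no new platform-specific code in this sketch.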
Internally each wrapper follows the same path as SyscallN:
- a [maxArgs]uintptr array for integer args and one for float args are allocated on the wrapper's own stack frame,
- the existing syscall_SyscallN internal is called (no new assembly),
- on Windows, the call is delegated to syscall_syscallN as before.
//go:uintptrescapes is retained on all wrappers for the same GC-pinning reason it is present on SyscallN.
Testing
Two C helpers are added to testdata/abitest/abi_test.c:
- stack_0_uintptr(): returns the constant 42 (smoke-tests Syscall0)
- stack_15_uintptr(a1..a15): returns the sum of its 15 arguments
A syscall_fixed sub-test inside TestABI_ArgumentPassing exercises every arity (0–15), checks the absolute expected value, and verifies parity with the equivalent SyscallN call.
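The parity check can be sketched in pure Go as follows. Everything here is a stand-in: target models a C function pointer, stack0 and stack15 imitate the two helpers as described, and call0, call2, call15 and callN are hypothetical wrappers. The sketch also assumes that arities below 15 reuse the 15-argument helper with the trailing arguments left at zero, which the description implies but does not state.

```go
package main

import "fmt"

const maxArgs = 15

// target is a hypothetical stand-in for a C function pointer.
type target func(args *[maxArgs]uintptr) uintptr

// stack0 imitates stack_0_uintptr: it ignores its arguments and returns 42.
func stack0(_ *[maxArgs]uintptr) uintptr { return 42 }

// stack15 imitates stack_15_uintptr: it returns the sum of its 15 arguments.
func stack15(args *[maxArgs]uintptr) uintptr {
	var sum uintptr
	for _, v := range args {
		sum += v
	}
	return sum
}

func callN(fn target, args ...uintptr) uintptr {
	var tmp [maxArgs]uintptr
	copy(tmp[:], args)
	return fn(&tmp)
}

func call0(fn target) uintptr {
	var tmp [maxArgs]uintptr
	return fn(&tmp)
}

func call2(fn target, a1, a2 uintptr) uintptr {
	tmp := [maxArgs]uintptr{a1, a2}
	return fn(&tmp)
}

func call15(fn target, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 uintptr) uintptr {
	tmp := [maxArgs]uintptr{a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15}
	return fn(&tmp)
}

// checkParity verifies each fixed-arity call against an absolute expected
// value and against the equivalent variadic call.
func checkParity() error {
	cases := []struct {
		name  string
		fixed func() uintptr
		fn    target
		args  []uintptr
		want  uintptr
	}{
		{"arity 0", func() uintptr { return call0(stack0) }, stack0, nil, 42},
		{"arity 2", func() uintptr { return call2(stack15, 1, 2) }, stack15, []uintptr{1, 2}, 3},
		{"arity 15", func() uintptr {
			return call15(stack15, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
		}, stack15, []uintptr{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, 120},
	}
	for _, c := range cases {
		got := c.fixed()
		if got != c.want {
			return fmt.Errorf("%s: got %d, want %d", c.name, got, c.want)
		}
		if viaN := callN(c.fn, c.args...); viaN != got {
			return fmt.Errorf("%s: fixed-arity returned %d, variadic returned %d", c.name, got, viaN)
		}
	}
	return nil
}

func main() {
	if err := checkParity(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Checking an absolute value as well as parity catches the case where both call paths are wrong in the same way.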