feat: add fixed-arity Syscall0Syscall15 variants to avoid heap allocation #445

Open
RomainMuller wants to merge 1 commit into ebitengine:main from
RomainMuller:RomainMuller/fixed-arity-syscall

RomainMuller commented Apr 30, 2026

Motivation

SyscallN is variadic (args ...uintptr). When it is called across a module
boundary, the compiler must materialise the arguments as a []uintptr slice at
the call site. Because of //go:uintptrescapes, the compiler cannot prove that
the slice does not escape, so it is heap-allocated even at small, short-lived
call sites (and in any case, because the call is cross-module, the slice
always escapes).

This allocation can represent a significant cost (time + GC churn) when calls are frequently made using the low-level API. Having fixed-arity variants helps alleviate this and can result in a 75% overhead reduction (almost entirely from the slice allocation).
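
To make the allocation difference concrete, here is a minimal sketch that is not purego code. The names `callVariadic`, `callFixed3`, `sink`, and `seed` are hypothetical, and storing the slice in a global is only an assumed stand-in for the escape that the description attributes to //go:uintptrescapes and the cross-module call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the variadic slice reachable after the call returns, which
// makes escape analysis treat it as escaping. This is an assumption used
// to imitate the effect described above; it is not how purego works.
var sink []uintptr

// seed is a package-level variable so the arguments are not constants.
var seed uintptr = 1

// callVariadic is a hypothetical variadic entry point in the style of SyscallN.
//
//go:noinline
func callVariadic(fn uintptr, args ...uintptr) uintptr {
	sink = args
	total := fn
	for _, a := range args {
		total += a
	}
	return total
}

// callFixed3 is a hypothetical fixed-arity counterpart: no slice is built
// at the call site.
//
//go:noinline
func callFixed3(fn, a1, a2, a3 uintptr) uintptr {
	return fn + a1 + a2 + a3
}

// measureAllocs reports the average heap allocations per call for each shape.
func measureAllocs() (variadic, fixed float64) {
	variadic = testing.AllocsPerRun(1000, func() {
		callVariadic(seed, seed, seed+1, seed+2)
	})
	fixed = testing.AllocsPerRun(1000, func() {
		callFixed3(seed, seed, seed+1, seed+2)
	})
	return variadic, fixed
}

func main() {
	v, f := measureAllocs()
	fmt.Printf("variadic: %.0f allocs/call, fixed: %.0f allocs/call\n", v, f)
}
```

In this sketch the variadic call should report an allocation for the argument slice while the fixed-arity call should report none; the actual cost in purego depends on the real call path.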

What this PR does

Adds Syscall0 through Syscall15 — fixed-arity wrappers with explicit
named uintptr parameters. No slice is created at the call site; the
individual arguments are passed in registers, eliminating the allocation
entirely.

Internally each wrapper follows the same path as SyscallN:

  • a zero-initialised [maxArgs]uintptr array for integer args and one for
    float args are allocated on the wrapper's own stack frame,
  • the explicit arguments are assigned into those arrays,
  • the existing syscall_SyscallN internal is called (no new assembly),
  • the Windows path delegates to syscall_syscallN as before.
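
The steps above can be sketched roughly as follows for a three-argument wrapper. This is an illustration under stated assumptions, not the code in this PR: `syscallInternal` is a hypothetical stand-in for the existing internal path (its real signature is not shown here), `maxArgs = 15` is assumed from the Syscall15 upper bound, and the stand-in simply sums the integer arguments so the example runs without a C library.

```go
package main

import "fmt"

// maxArgs is assumed to be 15, matching the highest arity in this PR.
const maxArgs = 15

// syscallInternal is a hypothetical stand-in for the shared internal call
// path. It ignores fn and floats and returns the sum of the integer
// arguments so the sketch can run on its own.
//
//go:noinline
func syscallInternal(fn uintptr, ints, floats *[maxArgs]uintptr) (r1, r2, err uintptr) {
	for _, v := range ints {
		r1 += v
	}
	return r1, 0, 0
}

// syscall3Sketch follows the listed steps: declare zero-initialised arrays
// local to the wrapper, assign the explicit arguments, then delegate.
func syscall3Sketch(fn, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
	if fn == 0 {
		panic("sketch: fn is nil")
	}
	var ints, floats [maxArgs]uintptr // zero-initialised, local to this wrapper
	ints[0], ints[1], ints[2] = a1, a2, a3
	return syscallInternal(fn, &ints, &floats)
}

func main() {
	r1, _, _ := syscall3Sketch(0x1, 2, 3, 4)
	fmt.Println(r1)
}
```

Because the arrays are declared inside the wrapper and only passed down by pointer to a function that does not retain them, the caller never has to build a slice.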

//go:uintptrescapes is retained on all wrappers for the same GC-pinning
reason it is present on SyscallN.

Testing

Two C helpers are added to testdata/abitest/abi_test.c:

  • stack_0_uintptr() — returns the constant 42 (smoke-tests Syscall0)
  • stack_15_uintptr(a1..a15) — returns the sum of its 15 arguments

A syscall_fixed sub-test inside TestABI_ArgumentPassing exercises every
arity (0–15), checks the absolute expected value, and verifies parity with
the equivalent SyscallN call.
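
The parity check described above might look roughly like this in miniature. Everything here is a hypothetical model: `stack0` and `stackSum` are Go functions imitating the two C helpers, and `call0`, `call3`, and `callN` imitate the fixed-arity and variadic entry points. Only two arities are shown.

```go
package main

import "fmt"

const maxArgs = 15 // assumed upper bound on arguments

// target models a C helper that receives the padded argument array.
type target func(args [maxArgs]uintptr) uintptr

// stack0 imitates stack_0_uintptr: it returns the constant 42.
func stack0(_ [maxArgs]uintptr) uintptr { return 42 }

// stackSum imitates stack_15_uintptr: it returns the sum of its arguments.
func stackSum(args [maxArgs]uintptr) uintptr {
	var total uintptr
	for _, v := range args {
		total += v
	}
	return total
}

// callN is the variadic reference path.
func callN(fn target, args ...uintptr) uintptr {
	var tmp [maxArgs]uintptr
	copy(tmp[:], args)
	return fn(tmp)
}

// call0 and call3 are fixed-arity counterparts.
func call0(fn target) uintptr {
	var tmp [maxArgs]uintptr
	return fn(tmp)
}

func call3(fn target, a1, a2, a3 uintptr) uintptr {
	var tmp [maxArgs]uintptr
	tmp[0], tmp[1], tmp[2] = a1, a2, a3
	return fn(tmp)
}

// checkParity verifies the absolute expected value and parity with callN.
func checkParity() error {
	cases := []struct {
		name                  string
		fixed, variadic, want uintptr
	}{
		{"arity0", call0(stack0), callN(stack0), 42},
		{"arity3", call3(stackSum, 1, 2, 3), callN(stackSum, 1, 2, 3), 6},
	}
	for _, c := range cases {
		if c.fixed != c.want {
			return fmt.Errorf("%s: got %d, want %d", c.name, c.fixed, c.want)
		}
		if c.fixed != c.variadic {
			return fmt.Errorf("%s: fixed %d != variadic %d", c.name, c.fixed, c.variadic)
		}
	}
	return nil
}

func main() {
	if err := checkParity(); err != nil {
		fmt.Println("parity check failed:", err)
		return
	}
	fmt.Println("parity check passed")
}
```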

…ocation

When SyscallN is called across module boundaries the variadic args slice
always escapes to the heap, even for small call sites. Expose Syscall0
through Syscall15 with explicit named parameters so callers avoid that
allocation entirely: no slice is formed at the call site, and the
stack-local [maxArgs]uintptr arrays are built entirely inside the wrapper.

The implementations follow the same pattern as SyscallN (zero-initialised
tmp/floats arrays, go:uintptrescapes, Windows delegation via
syscall_syscallN) and share the existing syscall_SyscallN internal path,
so no platform-specific assembly is required.

Tests are added to TestABI_ArgumentPassing using two new C helpers:
stack_0_uintptr (0-arg, returns 42) and stack_15_uintptr (15-arg sum),
covering every arity and verifying parity with SyscallN.

JJ-Change-Id: kpqxzv
RomainMuller changed the title from "purego: add fixed-arity Syscall0–Syscall15 variants to avoid heap allocation" to "feat: add fixed-arity Syscall0–Syscall15 variants to avoid heap allocation" Apr 30, 2026
RomainMuller changed the title from "feat: add fixed-arity Syscall0–Syscall15 variants to avoid heap allocation" to "feat: add fixed-arity Syscall0Syscall15 variants to avoid heap allocation" Apr 30, 2026
RomainMuller marked this pull request as ready for review April 30, 2026 16:13
Contributor

qmuntal commented Apr 30, 2026

Why don't you remove //go:uintptrescapes from SyscallN instead, and add //go:nosplit + //go:uintptrkeepalive. This is how syscall.SyscallN is implemented on Windows, and it's alloc-free: https://github.com/golang/go/blob/17bd5ab8c650155dd2bd09f7005726552639eea0/src/syscall/dll_windows.go#L98.

Author

RomainMuller commented Apr 30, 2026

> it's alloc-free

Is it? I've not found a way to get the compiler to not always heap-escape the variadic slice for a cross-module call?

Specifically, my benchmark always shows 1 alloc that matches the vararg slice size... Only because it's cross-module.

Contributor

eliottness left a comment

On our side this is the last allocation we have in the hot path for our FFI calls. Would be awesome to get alloc parity with CGO there

Member

hajimehoshi commented May 1, 2026

> Is it? I've not found a way to get the compiler to not always heap-escape the variadic slice for a cross-module call?

So didn't //go:nosplit + //go:uintptrkeepalive work?

@TotallyGamerJet
Collaborator

> Why don't you remove //go:uintptrescapes from SyscallN instead, and add //go:nosplit + //go:uintptrkeepalive. This is how syscall.SyscallN is implemented on Windows, and it's alloc-free: https://github.com/golang/go/blob/17bd5ab8c650155dd2bd09f7005726552639eea0/src/syscall/dll_windows.go#L98.

The syscalln that it calls does have the noescape pragma which I think may be the reason?
