purego: avoid reflect.MakeFunc in RegisterFunc fast path#441
purego: avoid reflect.MakeFunc in RegisterFunc fast path#441tmc wants to merge 7 commits intoebitengine:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new “fast path” for RegisterFunc to avoid reflect.MakeFunc overhead for common ABI-compatible signatures, aiming to eliminate per-call allocations and significantly reduce call overhead on supported platforms.
Changes:
- Adds signature detection in
RegisterFuncto select a faster closure installation path for integer/pointer/bool args (and limited float cases). - Introduces a new
fast_func.goimplementation that installs concrete closures and uses a pooledsyscall15Args+runtime_cgocallbackend. - Expands the benchmark C fixture and adds tests validating correctness and (for key cases) zero allocations.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
func.go |
Adds fast-path eligibility detection and dispatch to the new fast closure installers before falling back to reflect.MakeFunc. |
fast_func.go |
New implementation of fast-path closures and low-overhead call helpers. |
func_test.go |
Adds tests covering fast-path correctness, allocation behavior, and float/interleaved-float scenarios. |
testdata/benchmarktest/benchmark.c |
Adds C functions used by the new float/interleaved-float tests and benchmarks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // SPDX-License-Identifier: Apache-2.0 | ||
| // SPDX-FileCopyrightText: 2026 The Ebitengine Authors | ||
|
|
||
| //go:build darwin || freebsd || linux || netbsd || windows |
There was a problem hiding this comment.
Good point. Will tighten the build tag to exclude Windows since the fast path is runtime-gated to non-Windows anyway.
| package purego | ||
|
|
||
| //go:generate go run wincallback.go | ||
| //go:generate go run gen_fast_func.go |
Refactor emit functions in the code generator to use Go backtick (raw) string literals instead of line-by-line WriteString/Fprintf calls. This makes the generated code templates easier to read and maintain. The generated zz_fast_func.go output is byte-identical.
|
Hmm, would it be difficult to use text/template for code generations? |
Replace fmt.Fprintf/WriteString code generation with text/template. Templates now look like the Go code they produce, making the generator much easier to review and maintain. The generated zz_fast_func.go output is byte-identical.
|
Adopted text/template -- open to any suggestions to improve readability/maintainability |
|
The tests fail now |
fixed the gofmt issue |
Use var declarations for zero-value variables in tryRegisterFastPath per reviewer preference (var numFloats int instead of numFloats := 0). Add gofmt -s -w as a go:generate step for safety.
hajimehoshi
left a comment
There was a problem hiding this comment.
LGTM, thanks!
@TotallyGamerJet PTAL
| return *(*uintptr)(unsafe.Pointer(&buf[0])) | ||
| } | ||
|
|
||
| func flattenStructFields(v reflect.Value, buf []byte) { |
There was a problem hiding this comment.
Why isn't this just copy(buf, unsafe.Slice((*byte)(v.UnsafePointer()), v.Type().Size()))?
| func flattenStructToUintptr(v reflect.Value) uintptr { | ||
| var buf [8]byte | ||
| flattenStructFields(v, buf[:]) | ||
| return *(*uintptr)(unsafe.Pointer(&buf[0])) |
There was a problem hiding this comment.
This definitely breaks unsafe rules. We can't return a pointer to a slice that may get GCd
|
I'm not sure I feel comfortable with this change. It's a lot of extremely dense code to parse and review. Even the generator code is difficult. If I can't review it I can't be confident there isn't parts of it that break unsafe rules. I believe I've already found one in my cursory review. I lean towards rejection. |
|
Hmm, as this can be a maintainer burden, we should not go with this route. @tmc What do you think? Is this code complexity inevitable? |
|
I'm.looking into that unsafe violation now. I'm using (attempting to use) purego for some ML operations and need zero allocation paths to make performance viable. Open to any ideas for how to make this achievable, this has been my best cut at it so far. |
That’s fair. I found and am fixing a real liveness gap in one of the Reflect fallback paths while reviewing this, so I agree the PR is carrying too much unsafe/ABI surface to review comfortably as-is. I’m going to reduce the scope and make the safety argument more explicit. In particular, I’ll keep the mechanically generated pieces aligned with focused tests, avoid unsafe byte-buffer casts in the struct path, and either remove pointer signatures from the forceFuncPtr fast path or document a tighter argument before asking you to review that part again. |
|
I can work around this with a go:linkname kludge and may open a PR to expose a lower level api that is zero alloc (or just vendor or re-implement). Closing. |
What issue is this addressing?
#399
What type of issue is this addressing?
feature
What this PR does | solves
RegisterFunccurrently usesreflect.MakeFuncto build closures, which allocates per-call for argument packing. This PR detects common signatures (up to 15 integer/pointer/bool args, with optional float32/float64 args) and installs concrete typed closures that call the assembly trampoline directly — zero allocations on the hot path.Scope:
func(int64, int64, float32)): zero-alloc concrete closuresfunc(int64, float32, int64)):reflect.MakeFuncwrapper that still avoids per-arg boxingsyscall.Syscall15which takes a different path)Results
Apple M4 Max, arm64, Go 1.24, benchstat -count=8 -benchtime=5s:
RegisterFunc calling C functions (direct call, no callback)
RegisterFunc calling C functions (with Go callback round-trip)
SyscallN / RoundTrip
No statistically significant change (expected — these paths are not modified).