Skip to content

purego: avoid reflect.MakeFunc in RegisterFunc fast path#441

Closed
tmc wants to merge 7 commits intoebitengine:mainfrom
tmc:fast-func
Closed

purego: avoid reflect.MakeFunc in RegisterFunc fast path#441
tmc wants to merge 7 commits intoebitengine:mainfrom
tmc:fast-func

Conversation

@tmc
Copy link
Copy Markdown
Contributor

@tmc tmc commented Mar 18, 2026

What issue is this addressing?

#399

What type of issue is this addressing?

feature

What this PR does | solves

RegisterFunc currently uses reflect.MakeFunc to build closures, which allocates per-call for argument packing. This PR detects common signatures (up to 15 integer/pointer/bool args, with optional float32/float64 args) and installs concrete typed closures that call the assembly trampoline directly — zero allocations on the hot path.

Scope:

  • Pure integer/pointer/bool signatures (1-15 args): zero-alloc concrete closures
  • Trailing float32 or float64 (e.g. func(int64, int64, float32)): zero-alloc concrete closures
  • Interleaved floats (e.g. func(int64, float32, int64)): reflect.MakeFunc wrapper that still avoids per-arg boxing
  • Non-Windows, amd64/arm64 only (Windows uses syscall.Syscall15 which takes a different path)
  • SyscallN and RoundTrip paths are unchanged

Results

Apple M4 Max, arm64, Go 1.24, benchstat -count=8 -benchtime=5s:

RegisterFunc calling C functions (direct call, no callback)

                   │  main ns/op  │  fast ns/op  │   delta    │ allocs old → new │
1 arg                   145          26            -82%         3 → 0
2 args                  167          26            -84%         4 → 0
3 args                  184          27            -86%         5 → 0
5 args                  226          29            -87%         7 → 0
10 args                 828          30            -96%        22 → 0
14 args                1735          53            -97%        40 → 0
15 args                1904          72            -96%        43 → 0

RegisterFunc calling C functions (with Go callback round-trip)

                   │  main ns/op  │  fast ns/op  │   delta    │ allocs old → new │
1 arg                   347         165            -52%         5 → 2
2 args                  527         206            -61%         7 → 3
3 args                  835         232            -72%         8 → 3
5 args                  707         270            -62%        10 → 3
10 args                1290         400            -69%        25 → 3
14 args                2215         508            -77%        43 → 3
15 args                2358         592            -75%        46 → 3

SyscallN / RoundTrip

No statistically significant change (expected — these paths are not modified).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new “fast path” for RegisterFunc to avoid reflect.MakeFunc overhead for common ABI-compatible signatures, aiming to eliminate per-call allocations and significantly reduce call overhead on supported platforms.

Changes:

  • Adds signature detection in RegisterFunc to select a faster closure installation path for integer/pointer/bool args (and limited float cases).
  • Introduces a new fast_func.go implementation that installs concrete closures and uses a pooled syscall15Args + runtime_cgocall backend.
  • Expands the benchmark C fixture and adds tests validating correctness and (for key cases) zero allocations.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
func.go Adds fast-path eligibility detection and dispatch to the new fast closure installers before falling back to reflect.MakeFunc.
fast_func.go New implementation of fast-path closures and low-overhead call helpers.
func_test.go Adds tests covering fast-path correctness, allocation behavior, and float/interleaved-float scenarios.
testdata/benchmarktest/benchmark.c Adds C functions used by the new float/interleaved-float tests and benchmarks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fast_func.go Outdated
Comment thread fast_func.go Outdated
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: 2026 The Ebitengine Authors

//go:build darwin || freebsd || linux || netbsd || windows
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will tighten the build tag to exclude Windows since the fast path is runtime-gated to non-Windows anyway.

Comment thread func.go Outdated
Comment thread func.go Outdated
Comment thread gen.go Outdated
package purego

//go:generate go run wincallback.go
//go:generate go run gen_fast_func.go
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer genfastfunc.go

Comment thread gen_fast_func.go Outdated
Comment thread gen_fast_func.go Outdated
Comment thread genfastfunc.go Outdated
Refactor emit functions in the code generator to use Go backtick
(raw) string literals instead of line-by-line WriteString/Fprintf
calls. This makes the generated code templates easier to read and
maintain.

The generated zz_fast_func.go output is byte-identical.
@hajimehoshi
Copy link
Copy Markdown
Member

hajimehoshi commented Mar 22, 2026

Hmm, would it be difficult to use text/template for code generations?

Replace fmt.Fprintf/WriteString code generation with text/template.
Templates now look like the Go code they produce, making the generator
much easier to review and maintain.

The generated zz_fast_func.go output is byte-identical.
@tmc
Copy link
Copy Markdown
Contributor Author

tmc commented Mar 23, 2026

Adopted text/template -- open to any suggestions to improve readability/maintainability

@hajimehoshi
Copy link
Copy Markdown
Member

The tests fail now

@tmc
Copy link
Copy Markdown
Contributor Author

tmc commented Mar 23, 2026

The tests fail now

fixed the gofmt issue

Comment thread gen.go
Comment thread genfastfunc.go
Comment thread genfastfunc.go
Comment thread genfastfunc.go Outdated
Comment thread func.go
Comment thread func.go
tmc added 3 commits March 25, 2026 07:43
Use var declarations for zero-value variables in tryRegisterFastPath
per reviewer preference (var numFloats int instead of numFloats := 0).

Add gofmt -s -w as a go:generate step for safety.
Copy link
Copy Markdown
Member

@hajimehoshi hajimehoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@TotallyGamerJet PTAL

Comment thread func.go
return *(*uintptr)(unsafe.Pointer(&buf[0]))
}

func flattenStructFields(v reflect.Value, buf []byte) {
Copy link
Copy Markdown
Collaborator

@TotallyGamerJet TotallyGamerJet Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this just copy(buf, unsafe.Slice((*byte)(v.UnsafePointer()), v.Type().Size()))?

Comment thread func.go
func flattenStructToUintptr(v reflect.Value) uintptr {
var buf [8]byte
flattenStructFields(v, buf[:])
return *(*uintptr)(unsafe.Pointer(&buf[0]))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely breaks unsafe rules. We can't return a pointer to a slice that may get GCd

@TotallyGamerJet
Copy link
Copy Markdown
Collaborator

I'm not sure I feel comfortable with this change. It's a lot of extremely dense code to parse and review. Even the generator code is difficult. If I can't review it I can't be confident there isn't parts of it that break unsafe rules. I believe I've already found one in my cursory review. I lean towards rejection.

@hajimehoshi
Copy link
Copy Markdown
Member

Hmm, as this can be a maintainer burden, we should not go with this route. @tmc What do you think? Is this code complexity inevitable?

@tmc
Copy link
Copy Markdown
Contributor Author

tmc commented Apr 10, 2026

I'm.looking into that unsafe violation now.

I'm using (attempting to use) purego for some ML operations and need zero allocation paths to make performance viable.

Open to any ideas for how to make this achievable, this has been my best cut at it so far.

@tmc
Copy link
Copy Markdown
Contributor Author

tmc commented Apr 10, 2026

I'm not sure I feel comfortable with this change. It's a lot of extremely dense code to parse and review. Even the generator code is difficult. If I can't review it I can't be confident there isn't parts of it that break unsafe rules. I believe I've already found one in my cursory review. I lean towards rejection.

That’s fair. I found and am fixing a real liveness gap in one of the Reflect fallback paths while reviewing this, so I agree the PR is carrying too much unsafe/ABI surface to review comfortably as-is.

I’m going to reduce the scope and make the safety argument more explicit. In particular, I’ll keep the mechanically generated pieces aligned with focused tests, avoid unsafe byte-buffer casts in the struct path, and either remove pointer signatures from the forceFuncPtr fast path or document a tighter argument before asking you to review that part again.

@tmc
Copy link
Copy Markdown
Contributor Author

tmc commented Apr 20, 2026

I can work around this with a go:linkname kludge and may open a PR to expose a lower level api that is zero alloc (or just vendor or re-implement). Closing.

@tmc tmc closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants