Cut ~57% of load-time invalidations #1344
Two narrow changes that drop invalidated MethodInstances (MIs) by ~57%
on Julia 1.12 and ~37% on Julia 1.14 nightly, with no behaviour change.
- `any`/`all`/`count` with a function argument: drop the specialised
methods. Base routes `any(f, A)` through `mapreduce(f, |, A; ...)`,
and we already specialise `mapreduce` for `StaticArray`, so the fast
path is preserved. The removed `::Bool` cast was defensive and not
needed for statically-known eltypes; the explicit `init=false`
matches the default `_InitialValue` behaviour for `|`/`&`/`+`.
- `setindex!(::TrivialView, inds...)`: add the missing `v` slot so the
signature matches Base.
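
As a rough illustration of the first bullet, the sketch below checks that `any`/`all`/`count` with a predicate agree with the corresponding `mapreduce` calls. It runs on a plain `Vector`, a stand-in for a `StaticArray` chosen only to keep the example Base-only. The `*_via_mapreduce` helpers are hypothetical names, not part of Base or StaticArrays, and the `init` values chosen for `&` and `+` are assumptions.

```julia
# Base-only sketch: predicate reductions written as mapreduce calls.
# The helper names are hypothetical and exist only for this example.
any_via_mapreduce(f, A)   = mapreduce(f, |, A; init=false)
all_via_mapreduce(f, A)   = mapreduce(f, &, A; init=true)
count_via_mapreduce(f, A) = mapreduce(x -> f(x)::Bool, +, A; init=0)

A = [1, 2, 3, 4]          # stand-in for a StaticArray (assumption)
E = Int[]                 # the empty case exercises the init values

for (f, X) in ((iseven, A), (x -> x > 10, A), (iseven, E))
    @assert any_via_mapreduce(f, X)   == any(f, X)
    @assert all_via_mapreduce(f, X)   == all(f, X)
    @assert count_via_mapreduce(f, X) == count(f, X)
end
```

Unlike a hand-written loop, `mapreduce` with `|` or `&` does not short-circuit, which is one place a runtime difference could show up.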
The third change from the original draft (narrowing `eachindex` to
`N ≥ 2`) is being extracted into a follow-up PR — the implementation
needed a `Union{(StaticArray{<:Tuple,T,N} where T for N in 2:32)...}`
which is hard to love.
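
To see why the rank has to be pinned with a `Union` of concrete `N` values, here is a small Base-only sketch using `typeintersect`. `AbstractArray` stands in for `StaticArray` and the range `2:4` stands in for `2:32`; both are assumptions made so the example runs without StaticArrays. The aliases `AnyRank` and `PinnedRank` are hypothetical names.

```julia
# A signature with a free rank parameter still overlaps with vectors,
# because N = 1 is one of the possibilities.
const AnyRank = AbstractArray{T,N} where {T,N}

# A Union over concrete ranks >= 2 has no overlap with vectors.
const PinnedRank = Union{(AbstractArray{<:Any,N} for N in 2:4)...}

@assert typeintersect(AnyRank, AbstractVector) !== Union{}
@assert typeintersect(PinnedRank, AbstractVector) === Union{}

# Matrices are still covered by the pinned signature.
@assert Matrix{Float64} <: PinnedRank
@assert !(Vector{Float64} <: PinnedRank)
```

An empty intersection means a method with the pinned signature can never apply to a call on an `AbstractVector`, which is presumably why such calls are no longer invalidated.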
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
aa84098 to a289ae5
|
Thanks for the review! Dropped the Remeasured on Julia 1.14.0-DEV (nightly), 10 samples per row:
Load-time wall-clock is essentially unchanged on nightly (within noise), but invalidations drop ~37%. On 1.12 it's −10% load + −57% invalidations because the |
|
thanks, that's still a pretty good improvement. can you ask claude to search for any performance regressions when
is correct, but eltypes are not always statically known

not that runtime performance regressions would necessarily be a merge blocker --- if you are hitting dynamic dispatch the code will already be slow anyway --- but I think we oughta know the tradeoffs. it might also be useful if claude could come up with a self-contained reproducer for this claim

that creates a fresh temp environment, wipes the precompile cache, and measures the compile time improvement for some big package downstream of

apologies for the abundance of caution: I know |
|
Good call on regression-checking. Ran a perf bench (
All times in ns, Chairmarks Summary
Suggested narrower variant: drop only |
|
Downstream-compile-time reproducer + numbers.

Reproducer (self-contained, ~50 lines): fresh temp env per variant,

    # downstream.jl, usage: julia downstream.jl <SA_path> <PkgName> [trials]
    using Pkg, Printf

    const SA_PATH = abspath(ARGS[1])   # local StaticArrays checkout
    const PKG = ARGS[2]                # downstream package to time
    const TRIALS = length(ARGS) >= 3 ? parse(Int, ARGS[3]) : 3
    const CACHE = joinpath(first(DEPOT_PATH), "compiled", "v$(VERSION.major).$(VERSION.minor)")

    # Delete the downstream package's precompile cache so the next load recompiles it.
    wipe!() = (d = joinpath(CACHE, PKG); isdir(d) && rm(d; recursive=true))

    cd(SA_PATH)

    function run_variant(label, branch)
        println("\n==== $label (branch=$branch) ====")
        run(`git checkout $branch -- src/`)   # swap in that branch's sources
        proj = mktempdir(prefix="sa-down-$label-")
        Pkg.activate(proj); Pkg.develop(path=SA_PATH; io=devnull)
        Pkg.add(PKG; io=devnull); Pkg.precompile(io=devnull)
        pre, usg = Float64[], Float64[]
        for trial in 1:TRIALS
            wipe!(); t1 = @elapsed Pkg.precompile(io=devnull); push!(pre, t1)
            wipe!()
            # Time `using` in a fresh process; assumes `julia` on PATH is the running version.
            script = "using Pkg; Pkg.precompile(io=devnull); print(@elapsed (@eval using $PKG))"
            t2 = parse(Float64, readchomp(`julia --project=$proj -e $script`))
            push!(usg, t2)
            @printf " trial %d: precompile=%.2fs cold-using=%.3fs\n" trial t1 t2
        end
        return pre, usg   # per-trial timings, for summarising afterwards
    end

    run_variant("master", "master"); run_variant("PR", "reduce-invalidations")
    run(`git checkout reduce-invalidations -- src/`)   # leave the checkout on the PR sources

Results (Julia 1.12.6, median of 3 trials)
Also ran a "realistic" variant (5 trials) where a fresh process does
Honest read: the 640-MI reduction is real but most of those MIs are compiler-internal ( |
Three small dispatch tweaks that drop invalidated MethodInstances from ~2030 to ~875 on Julia 1.12, with no behaviour change and all tests passing. Related to #1074.

What changed

1. `eachindex(::IndexLinear, ::StaticArray)` → restrict to rank N ≥ 2.

   Base's `eachindex(::IndexLinear, ::AbstractVector) = axes1(A)` already returns `SOneTo` for static vectors, so we only need our specialised path for higher ranks. Pinning rank with concrete `N` (via a `Union` over `2:32`) is what makes Julia's invalidator see `Union{}` against `AbstractVector{X}`; a `where N` clause still intersects because `N` could be 1. Wipes the entire `eachindex` invalidation tree (~513 MIs).

2. Drop `any(f::Function, ::StaticArray)`, `all(f::Function, ::StaticArray)`, `count(f, ::StaticArray)`.

   Base's `any(f, A)` → `_any(f, A, dims)` → `mapreduce(f, |, A; ...)`. Since we already specialise `mapreduce` for `StaticArray`, the fast path is preserved without the extra method. The dropped `::Bool` cast was defensive and isn't needed when eltypes are statically known; `init=false` matches the default `_InitialValue` behaviour for `|`/`&`/`+`. Removes the biggest single invalidation source (~640 MIs from compiler-internal `any(::Function, ::AbstractArray)` callers).

3. `setindex!(::TrivialView, inds...)` → `setindex!(::TrivialView, v, inds...)`.

   The old signature was missing the value slot, so it superseded `Base.setindex!(::AbstractArray, v, I...)` more aggressively than needed. -4 MIs.

Numbers (Julia 1.12.6, this branch)

`@time_imports`

Load time is dominated by C-level method-table registration so the wall-clock win is small, but downstream packages that hit `AbstractVector` or `any(::Function, ::AbstractArray)` will see less recompilation triggered by `using StaticArrays`.

Tested

Full `Pkg.test()` suite passes locally.
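
For the `setindex!` change, a toy sketch may help show what a missing value slot does to a varargs signature. `ToyView`, `old_set!` and `new_set!` are hypothetical names invented for this example; they mimic only the shape of the two signatures and do not touch `Base.setindex!` or `TrivialView`.

```julia
# Toy stand-in for a view type (hypothetical, not the package's TrivialView).
struct ToyView end

# Old shape: no value slot, so the value is swallowed into the varargs.
old_set!(v::ToyView, inds...) = inds

# New shape: the value has its own positional slot, as in Base's setindex!.
new_set!(v::ToyView, val, inds...) = (val, inds)

@assert old_set!(ToyView(), 10, 1, 2) == (10, 1, 2)
@assert new_set!(ToyView(), 10, 1, 2) == (10, (1, 2))

# The old shape also matches a call with no value at all, so its
# signature is strictly broader than the new one.
@assert hasmethod(old_set!, Tuple{ToyView})
@assert !hasmethod(new_set!, Tuple{ToyView})
```

The broader a signature is, the more existing call sites it can intersect with, which fits the small invalidation count attributed to this change above.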