ci: integrate CodSpeed continuous benchmarking #9975
Draft
adriangb wants to merge 1 commit into
Wire the existing criterion benches into CodSpeed (https://codspeed.io) for continuous performance tracking. CodSpeed runs benches under CPU simulation in CI and posts per-PR comparison reports vs. the base branch's latest main run.

Highlights
==========

- `criterion` workspace dependency renamed to `codspeed-criterion-compat`: a drop-in passthrough that wraps real criterion when running outside cargo-codspeed, so bench source code needs no changes (`use criterion::*` keeps working) and `cargo bench` locally is unaffected.

- Two workflows:
  - `.github/workflows/codspeed.yml` runs on every push to main and populates the base-branch baseline.
  - `.github/workflows/codspeed-pr.yml` runs on PRs when a `bench:*` label is attached, so external contributors don't blindly burn CI capacity. Labels are namespaced per crate:

      bench:all                        # whole workspace
      bench:arrow                      # all of arrow's benches
      bench:parquet bench:arrow-cast   # union

- Sharded one job per `[[bench]]` target (~78 shards after exclusions). Required because (a) the full workspace produces >1000 individual benchmarks per upload, and (b) the parquet crate alone produces >1000 due to heavy criterion parameterization, both of which exceed CodSpeed's per-upload limit. Jobs in the same workflow are auto-aggregated by CodSpeed into a single report. Ref: https://codspeed.io/docs/features/sharded-benchmarks

- Build-once / run-many topology:

      setup ─┐
             ├──→ bench (matrix, N shards)
      build ─┘

  `build` does the full-workspace `cargo codspeed build` exactly once and uploads `target/codspeed/` as a tar artifact (tar preserves the +x bit, which `actions/upload-artifact` strips otherwise). Each bench shard downloads the artifact and invokes `cargo codspeed run -p <crate> --bench <bench>`. No rebuild per shard, so CI cost scales with N shards × ~2 min instead of ×10 min.

- Dynamic matrix: `setup` parses every workspace member's Cargo.toml for `[[bench]]` entries with awk + jq and emits a JSON `{crate, bench}` array, so new bench targets are picked up automatically without touching the workflow.

- Auth: GitHub OIDC. No `CODSPEED_TOKEN` secret needed for the public repo; the workflow's `id-token` claim is what CodSpeed verifies.

Exclusions
==========

Ten bench targets currently fail at runtime (e.g. `merge_kernels` panics in `arrow-data/src/transform/primitive.rs:31`); these are pre-existing issues in the benches themselves, not the integration. They're listed in an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards run clean. Each excluded target should be fixed and removed from the list one by one.

Prerequisites for activation
============================

- Install the CodSpeed GitHub App on `apache/arrow-rs`: https://github.com/apps/codspeed
- Enroll the repository at https://codspeed.io (the OIDC integration is automatic for public repos; no secret token configuration required)

Once both are done, the first push to main will populate the baseline and PRs labeled `bench:*` will receive automated CodSpeed comparison comments.
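The label rules in this commit message (`bench:all`, `bench:<crate>`, and unions of several labels) can be illustrated with a short Python sketch. It is not the workflow's actual code, which is not reproduced on this page. `select_shards` and `SAMPLE_MATRIX` are hypothetical names, the sample entries are made up, and raising an error on a malformed suffix is an assumed policy. The suffix pattern is the one quoted in the PR description.

```python
import re

# Suffix pattern quoted in the PR description for `bench:<suffix>` labels.
LABEL_SUFFIX = re.compile(r"^[a-z][a-z0-9_-]*$")

# Hypothetical {crate, bench} matrix, shaped like the JSON that `setup` emits.
SAMPLE_MATRIX = [
    {"crate": "arrow", "bench": "take_kernels"},
    {"crate": "parquet", "bench": "arrow_reader"},
    {"crate": "arrow-cast", "bench": "parse_time"},
]


def select_shards(labels, matrix):
    """Return the matrix entries selected by a PR's `bench:*` labels."""
    suffixes = set()
    for label in labels:
        if not label.startswith("bench:"):
            continue  # unrelated labels never trigger benches
        suffix = label[len("bench:"):]
        if not LABEL_SUFFIX.fullmatch(suffix):
            # Assumption: malformed suffixes are rejected rather than ignored.
            raise ValueError(f"invalid bench label: {label!r}")
        suffixes.add(suffix)

    if not suffixes:
        return []            # no bench label: the workflow does nothing
    if "all" in suffixes:
        return list(matrix)  # bench:all selects the whole workspace
    # Several crate labels select the union of their shards.
    return [t for t in matrix if t["crate"] in suffixes]


selected = select_shards(["bench:parquet", "bench:arrow-cast"], SAMPLE_MATRIX)
print([t["bench"] for t in selected])  # prints ['arrow_reader', 'parse_time']
```

Keeping the selection a pure function of the label list and the matrix makes it easy to check locally before it is wired into a workflow step.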
Summary
Wires the existing criterion benches in this workspace into CodSpeed for continuous performance tracking. CodSpeed runs benches under CPU simulation in CI and posts per-PR comparison reports vs. the base branch's latest main run.
This PR is opt-in once activated: the PR workflow only fires when a maintainer adds a `bench:*` label, so external contributors don't blindly burn CI capacity. The main-push workflow keeps the baseline current.

The integration has been validated end-to-end on a fork (`pydantic/arrow-rs`): 3031 benchmarks captured from a single main run, and PR runs produce clean comparison comments (e.g. "Merging this PR will not alter performance: ✅ 7 untouched benchmarks, ⏩ 3024 skipped benchmarks, comparing `codspeed-smoke-test` (5b1320a) with `main` (fcbe248)"). Public dashboard: https://codspeed.io/pydantic/arrow-rs

Design
Drop-in shim, no bench source changes
The `criterion` workspace dependency is renamed (via the `[package]` cargo trick) to `codspeed-criterion-compat`. This is a CodSpeed-maintained passthrough: when not running under `cargo codspeed`, it forwards to real criterion, so `cargo bench` locally is unchanged and every existing `use criterion::*` in every bench source file compiles unmodified.

Sharded one job per `[[bench]]` target

Required for two reasons:
- The full workspace produces more than 1000 individual benchmarks per upload, which exceeds CodSpeed's per-upload limit.
- The `parquet` crate alone exceeds 1000, so per-crate sharding wasn't fine enough.

Jobs within a single workflow are auto-aggregated by CodSpeed into one unified report.
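Per-target sharding depends on enumerating every `[[bench]]` entry in the workspace. The PR describes doing this with awk + jq. The Python sketch below mirrors only the idea, under stated assumptions: `bench_targets`, `build_matrix`, and the sample manifest are hypothetical, the line-based scan is a simplification of real TOML parsing, and the exclusion set stands in for the `EXCLUDED_BENCHES` list.

```python
import json
import re


def bench_targets(crate, cargo_toml):
    """Return one {crate, bench} entry per [[bench]] table in a manifest string."""
    targets = []
    in_bench = False
    for raw in cargo_toml.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Any table header closes the previous table; only [[bench]] opens one we track.
            in_bench = line == "[[bench]]"
            continue
        if in_bench:
            match = re.match(r'name\s*=\s*"([^"]+)"', line)
            if match:
                targets.append({"crate": crate, "bench": match.group(1)})
                in_bench = False  # one name per [[bench]] table
    return targets


def build_matrix(manifests, excluded):
    """Combine all crates' bench targets, dropping (crate, bench) pairs in `excluded`."""
    matrix = []
    for crate, text in manifests.items():
        for target in bench_targets(crate, text):
            if (target["crate"], target["bench"]) not in excluded:
                matrix.append(target)
    return matrix


# Made-up manifest for illustration; not copied from the repository.
SAMPLE_MANIFEST = """
[package]
name = "arrow"

[[bench]]
name = "merge_kernels"
harness = false

[[bench]]
name = "sort_kernel"
harness = false

[[bench]]
name = "take_kernels"
harness = false
"""

# Two of the excluded targets named in this PR.
EXCLUDED = {("arrow", "merge_kernels"), ("arrow", "sort_kernel")}

matrix = build_matrix({"arrow": SAMPLE_MANIFEST}, EXCLUDED)
print(json.dumps(matrix))  # prints [{"crate": "arrow", "bench": "take_kernels"}]
```

Each element of the resulting array corresponds to one shard, so a new `[[bench]]` entry shows up as a new job without any workflow change.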
Build once, run many
- `setup` parses every workspace member's `Cargo.toml` for `[[bench]]` entries (awk + jq) and emits a JSON `{crate, bench}` array; new bench targets are picked up automatically.
- `build` runs the full-workspace `cargo codspeed build` exactly once, packs `target/codspeed/` into a tarball (tar preserves the +x bit; `actions/upload-artifact` strips it otherwise), and uploads it as a 1-day artifact.
- Each bench shard downloads the artifact and invokes `cargo codspeed run -p <crate> --bench <bench>`. No per-shard rebuild, so CI cost scales with N × ~2 min instead of N × full build.

Label-gated PRs
`codspeed-pr.yml` fires on `pull_request: [labeled, synchronize, opened, reopened]` and only runs when the PR has at least one `bench:*` label:
- `bench:all`: every `[[bench]]` in the workspace
- `bench:<crate>`: every `[[bench]]` in that crate
- `bench:<crate-a> bench:<crate-b>`: union of both crates

Label suffixes are validated against `^[a-z][a-z0-9_-]*$`. Authorization is implicit: only users with triage or write access can add labels.

While the label is attached, every push to the PR re-runs the suite (`synchronize` event); re-runs cancel in-progress shards via `concurrency: cancel-in-progress: true`.

OIDC auth
Public repo, no `CODSPEED_TOKEN` secret required: the workflow's `id-token: write` permission lets it request the OIDC token that CodSpeed verifies. Workflows are repo-agnostic.

Exclusions
Ten bench targets currently fail at runtime in this workspace. These are pre-existing issues in the bench targets themselves, not the integration. They're listed in an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards run clean. Each excluded target should be fixed (or removed) and dropped from the list one by one:
- arrow / merge_kernels (panics at `arrow-data/src/transform/primitive.rs:31:43`)
- arrow / buffer_bit_ops
- arrow / buffer_create
- arrow / sort_kernel
- arrow / string_run_builder
- arrow / primitive_run_accessor
- arrow-array / union_array
- arrow-cast / parse_date
- parquet / row_selection_cursor
- parquet-variant-compute / variant_kernels

I'm happy to file separate upstream issues for each if helpful, or to drop the exclusion list entirely if maintainers prefer to investigate them all at once. The same `merge_kernels` exclusion was added by the official CodSpeed wizard's auto-generated PR (https://codspeed.io/docs/get-started/wizard), so this is consistent prior art.

Prerequisites for activation
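This PR adds the workflow files but they're inert until two repo-admin actions land:
- Install the CodSpeed GitHub App (https://github.com/apps/codspeed) on `apache/arrow-rs`. This is what posts the PR comparison comment + status check.
- Enroll the repository at https://codspeed.io (the OIDC integration is automatic for public repos; no secret token configuration required).

Once both are done, the first push to `main` will populate the baseline and PRs labeled `bench:*` will receive automated comparison comments.

CI cost notes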
- Labeled PR runs: the full `build`, but only the bench shards for the labeled crates. A typical `bench:arrow-cast` run is build + 3 shards.

Test plan
- `cargo check --workspace --benches --features arrow/test_utils,arrow-schema/ffi,parquet/test_common,parquet/experimental,parquet/async,parquet/object_store` passes against this branch
- Validated on `pydantic/arrow-rs`: main baseline run captured 3031 benchmarks; PR run posts comparison comment correctly; per-shard sharding stays under the 1000-benchmark limit
- After activation on `apache/arrow-rs`, first main run populates baseline at https://codspeed.io/apache/arrow-rs
- Create `bench:all` and per-crate `bench:<crate>` labels in repo settings
- Add `bench:<crate>` to a real PR; confirm comparison comment + status check appear
🤖 Generated with Claude Code