ci: integrate CodSpeed continuous benchmarking by adriangb · Pull Request #9975 · apache/arrow-rs

adriangb · 2026-05-14T18:40:10Z

Summary

Wires the existing criterion benches in this workspace into CodSpeed for continuous performance tracking. CodSpeed runs benches under CPU simulation in CI and posts per-PR comparison reports vs. the base branch's latest main run.

Benchmarking on PRs is opt-in once activated: the PR workflow only runs when a maintainer adds a bench:* label, so external contributors don't blindly burn CI capacity. The main-push workflow keeps the baseline current.

The integration has been validated end-to-end on a fork (pydantic/arrow-rs): 3031 benchmarks captured from a single main run, PR runs produce clean comparison comments (e.g. "Merging this PR will not alter performance — ✅ 7 untouched benchmarks, ⏩ 3024 skipped benchmarks, comparing codspeed-smoke-test (5b1320a) with main (fcbe248)"). Public dashboard: https://codspeed.io/pydantic/arrow-rs

Design

Drop-in shim, no bench source changes

The criterion workspace dependency is pointed at the codspeed-criterion-compat package using the package key of the dependency entry (Cargo's dependency-renaming mechanism), so the dependency is still named criterion. codspeed-criterion-compat is a CodSpeed-maintained passthrough: when not running under cargo codspeed, it forwards to real criterion, so cargo bench locally is unchanged and every existing use criterion::* in every bench source file compiles unmodified.

# Cargo.toml (workspace)
criterion = { package = "codspeed-criterion-compat", version = "4.6", default-features = false }

Sharded one job per `[[bench]]` target

Required for two reasons:

The full workspace produces well over 1000 individual benchmarks (criterion parameterizes heavily), which exceeds CodSpeed's per-upload limit.
Even the parquet crate alone exceeds 1000 — per-crate sharding wasn't fine enough.

Jobs within a single workflow are auto-aggregated by CodSpeed into one unified report.

Build once, run many

setup ─┐
       ├──→ bench (matrix, ~78 shards)
build ─┘

setup parses every workspace member's Cargo.toml for [[bench]] entries (awk + jq), emits a JSON {crate, bench} array; new bench targets are picked up automatically.
build runs the full-workspace cargo codspeed build exactly once, packs target/codspeed/ into a tarball (tar preserves the +x bit; actions/upload-artifact strips it otherwise), uploads as a 1-day artifact.
Each bench shard downloads the artifact, unpacks it, runs cargo codspeed run -p <crate> --bench <bench>. No per-shard rebuild — CI cost scales with N × ~2 min instead of N × full build.
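
For readers who want to see the shape of the setup step, here is a minimal Rust sketch of the same idea. The workflow itself uses awk + jq, not Rust. The function names, the line-based parsing, and the sample manifest are illustrative assumptions: the sketch expects each `name = "..."` to sit on its own line inside a `[[bench]]` table and does not handle inline comments or inline tables.

```rust
/// Extract the `name` of every `[[bench]]` table from one Cargo.toml's text.
/// Line-based sketch: assumes `name = "..."` sits on its own line in the table.
fn bench_names(manifest: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut in_bench = false;
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Any table header opens a new section; only [[bench]] matters here.
            in_bench = line == "[[bench]]";
            continue;
        }
        if !in_bench {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "name" {
                names.push(value.trim().trim_matches('"').to_string());
            }
        }
    }
    names
}

/// Render the shard matrix as a JSON array of {crate, bench} objects.
/// `members` pairs a crate name with the text of its Cargo.toml.
fn matrix_json(members: &[(&str, &str)]) -> String {
    let mut entries = Vec::new();
    for (krate, manifest) in members {
        for bench in bench_names(manifest) {
            entries.push(format!(r#"{{"crate":"{krate}","bench":"{bench}"}}"#));
        }
    }
    format!("[{}]", entries.join(","))
}

fn main() {
    // Illustrative manifest, not copied from the repository.
    let manifest = r#"
[package]
name = "arrow-cast"

[[bench]]
name = "parse_date"
harness = false
"#;
    println!("{}", matrix_json(&[("arrow-cast", manifest)]));
}
```

Because the matrix is derived from the manifests on every run, adding a `[[bench]]` entry to any workspace member is enough to create a new shard.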

Label-gated PRs

codspeed-pr.yml fires on pull_request: [labeled, synchronize, opened, reopened] and only runs when the PR has at least one bench:* label:

Label	Effect
`bench:all`	Every `[[bench]]` in the workspace
`bench:<crate>`	Every `[[bench]]` in that crate
`bench:<crate-a> bench:<crate-b>`	Union

Label suffixes are validated against ^[a-z][a-z0-9_-]*$. Authorization is implicit: only users with write access can add labels.
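
The label rules above can be sketched in a few lines of Rust. This is an illustration, not the workflow's actual implementation: the `Selection` type and both function names are hypothetical, the pattern is checked by hand instead of with a regex engine, and silently skipping labels that fail validation is an assumption (the real workflow may reject them differently).

```rust
use std::collections::BTreeSet;

/// What a set of PR labels asks the benchmark workflow to run.
#[derive(Debug, PartialEq)]
enum Selection {
    Nothing,
    All,
    Crates(BTreeSet<String>),
}

/// Hand-rolled equivalent of the pattern ^[a-z][a-z0-9_-]*$.
fn valid_suffix(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

/// Map PR labels to a selection: `bench:all` wins, otherwise the union of crates.
fn select(labels: &[&str]) -> Selection {
    let mut crates = BTreeSet::new();
    for label in labels {
        // Labels outside the bench: namespace are ignored.
        let Some(suffix) = label.strip_prefix("bench:") else {
            continue;
        };
        // Assumption: suffixes that fail validation are skipped.
        if !valid_suffix(suffix) {
            continue;
        }
        if suffix == "all" {
            return Selection::All;
        }
        crates.insert(suffix.to_string());
    }
    if crates.is_empty() {
        Selection::Nothing
    } else {
        Selection::Crates(crates)
    }
}

fn main() {
    println!("{:?}", select(&["bench:arrow-cast", "bench:parquet"]));
}
```

Validating the suffix before it is used matters because the label text ends up selecting which cargo codspeed run invocations execute; restricting it to a narrow character set keeps arbitrary label text out of those commands.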

While the label is attached, every push to the PR re-runs the suite (synchronize event); re-runs cancel in-progress shards via concurrency: cancel-in-progress: true.

OIDC auth

Public repo, so no CODSPEED_TOKEN secret is required: the workflow's id-token: write permission lets the job request a GitHub OIDC token, and that token is what CodSpeed verifies. Workflows are repo-agnostic.

Exclusions

Ten bench targets currently fail at runtime in this workspace — pre-existing issues in the bench targets themselves, not the integration. They're listed in an EXCLUDED_BENCHES env in both workflows so the remaining ~78 shards run clean. Each excluded target should be fixed (or removed) and dropped from the list one by one:

Target	Observed failure mode
`arrow / merge_kernels`	panics at `arrow-data/src/transform/primitive.rs:31:43`
`arrow / buffer_bit_ops`	runtime error
`arrow / buffer_create`	runtime error
`arrow / sort_kernel`	runtime error
`arrow / string_run_builder`	runtime error
`arrow / primitive_run_accessor`	runtime error
`arrow-array / union_array`	runtime error
`arrow-cast / parse_date`	runtime error
`parquet / row_selection_cursor`	runtime error
`parquet-variant-compute / variant_kernels`	intermittent
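
As a rough sketch of how an exclusion list can be applied to the shard matrix, the Rust below drops excluded `(crate, bench)` pairs before the shards are scheduled. The format of EXCLUDED_BENCHES is not shown in this description, so the whitespace-separated `crate/bench` format used here is an assumption, as are the `Shard` type, the function name, and the `some_other_bench` target.

```rust
/// One shard of the matrix: a (crate, bench target) pair.
#[derive(Debug, PartialEq)]
struct Shard {
    krate: String,
    bench: String,
}

/// Drop every shard named in `excluded`, a whitespace-separated list of
/// `crate/bench` entries (hypothetical format, see the lead-in).
fn apply_exclusions(shards: Vec<Shard>, excluded: &str) -> Vec<Shard> {
    let skip: Vec<(&str, &str)> = excluded
        .split_whitespace()
        .filter_map(|entry| entry.split_once('/'))
        .collect();
    shards
        .into_iter()
        .filter(|s| !skip.iter().any(|(c, b)| *c == s.krate && *b == s.bench))
        .collect()
}

fn main() {
    let shards = vec![
        Shard { krate: "arrow".into(), bench: "merge_kernels".into() },
        // Hypothetical target name, used only to show a shard that survives.
        Shard { krate: "arrow".into(), bench: "some_other_bench".into() },
        Shard { krate: "arrow-cast".into(), bench: "parse_date".into() },
    ];
    let excluded = "arrow/merge_kernels arrow-cast/parse_date";
    for shard in apply_exclusions(shards, excluded) {
        println!("{} / {}", shard.krate, shard.bench);
    }
}
```

Matching on the pair and not on the bench name alone means a fixed target can be dropped from the list without affecting a same-named target in another crate.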

I'm happy to file separate upstream issues for each if helpful — or to drop the exclusion list entirely if maintainers prefer to investigate them all at once. The same merge_kernels exclusion was added by the official CodSpeed wizard's auto-generated PR (https://codspeed.io/docs/get-started/wizard), so this is consistent prior art.

Prerequisites for activation

This PR adds the workflow files but they're inert until two repo-admin actions land:

Install the CodSpeed GitHub App on apache/arrow-rs. This is what posts the PR comparison comment + status check.
Enroll the repository at https://codspeed.io. OIDC is automatic for public repos — no secret token configuration required.

Once both are done, the first push to main will populate the baseline and PRs labeled bench:* will receive automated comparison comments.

CI cost notes

Main-push workflow: 1 build + 78 shards. Build job dominates wallclock (~10 min); shards run in parallel and download from one artifact, ~2 min each.
PR workflow: same build, but only the bench shards for the labeled crates. A typical bench:arrow-cast run is build + 3 shards.
Per-target bench binaries are bundled in one ~1-2 GB artifact (well under GitHub's 5 GB free-tier limit).
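
To make the build-once saving concrete, the arithmetic can be written out as below. It plugs in the approximate figures quoted above (78 shards, ~10 min build, ~2 min per shard) and assumes that a per-shard rebuild would cost about as much as the full build; building a single bench target would likely be cheaper, so the second figure is an upper-bound estimate. Function names are illustrative.

```rust
/// Runner-minutes when the workspace is built once and shared by all shards.
fn build_once_minutes(shards: u32, build_min: u32, run_min: u32) -> u32 {
    build_min + shards * run_min
}

/// Runner-minutes if every shard rebuilt the workspace before running.
fn rebuild_per_shard_minutes(shards: u32, build_min: u32, run_min: u32) -> u32 {
    shards * (build_min + run_min)
}

fn main() {
    // Approximate figures from the notes above.
    let (shards, build_min, run_min) = (78, 10, 2);
    // prints "build once: 166 runner-minutes"
    println!(
        "build once: {} runner-minutes",
        build_once_minutes(shards, build_min, run_min)
    );
    // prints "rebuild per shard: 936 runner-minutes"
    println!(
        "rebuild per shard: {} runner-minutes",
        rebuild_per_shard_minutes(shards, build_min, run_min)
    );
}
```

These are total runner-minutes, not wallclock: the shards run in parallel, so elapsed time is dominated by the single build plus one shard.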

Test plan

cargo check --workspace --benches --features arrow/test_utils,arrow-schema/ffi,parquet/test_common,parquet/experimental,parquet/async,parquet/object_store passes against this branch
End-to-end validation on pydantic/arrow-rs: main baseline run captured 3031 benchmarks; PR run posts comparison comment correctly; one shard per [[bench]] target keeps every upload under the 1000-benchmark limit
After merge and CodSpeed-App install on apache/arrow-rs, first main run populates baseline at https://codspeed.io/apache/arrow-rs
Create the bench:all and per-crate bench:<crate> labels in repo settings
Add bench:<crate> to a real PR; confirm comparison comment + status check appear

References

CodSpeed docs: https://codspeed.io/docs
Sharded benchmarks: https://codspeed.io/docs/features/sharded-benchmarks
Compat shim source: https://github.com/CodSpeedHQ/codspeed-rust
Prior auto-generated wizard PR on pydantic fork: "Add CodSpeed continuous performance benchmarking" (pydantic/arrow-rs#11; single-shard, hit the >1000 limit, which this PR resolves)

🤖 Generated with Claude Code

Wire the existing criterion benches into CodSpeed (https://codspeed.io) for continuous performance tracking. CodSpeed runs benches under CPU simulation in CI and posts per-PR comparison reports vs. the base branch's latest main run.

Highlights
==========

- `criterion` workspace dependency renamed to `codspeed-criterion-compat`: a drop-in passthrough that wraps real criterion when running outside cargo-codspeed, so bench source code needs no changes (`use criterion::*` keeps working) and `cargo bench` locally is unaffected.

- Two workflows:
  - `.github/workflows/codspeed.yml` runs on every push to main and populates the base-branch baseline.
  - `.github/workflows/codspeed-pr.yml` runs on PRs when a `bench:*` label is attached, so external contributors don't blindly burn CI capacity. Labels are namespaced per crate:

      bench:all                        # whole workspace
      bench:arrow                      # all of arrow's benches
      bench:parquet bench:arrow-cast   # union

- Sharded one job per `[[bench]]` target (~78 shards after exclusions). Required because (a) the full workspace produces >1000 individual benchmarks per upload, and (b) the parquet crate alone produces >1000 due to heavy criterion parameterization, both of which exceed CodSpeed's per-upload limit. Jobs in the same workflow are auto-aggregated by CodSpeed into a single report. Ref: https://codspeed.io/docs/features/sharded-benchmarks

- Build-once / run-many topology:

      setup ─┐
             ├──→ bench (matrix, N shards)
      build ─┘

  `build` does the full-workspace `cargo codspeed build` exactly once and uploads `target/codspeed/` as a tar artifact (tar preserves the +x bit, which `actions/upload-artifact` strips otherwise). Each bench shard downloads the artifact and invokes `cargo codspeed run -p <crate> --bench <bench>`. No rebuild per shard, so CI cost scales with N shards × ~2 min instead of ×10 min.

- Dynamic matrix: `setup` parses every workspace member's Cargo.toml for `[[bench]]` entries with awk + jq and emits a JSON `{crate, bench}` array, so new bench targets are picked up automatically without touching the workflow.

- Auth: GitHub OIDC. No `CODSPEED_TOKEN` secret needed for the public repo; the workflow's `id-token` claim is what CodSpeed verifies.

Exclusions
==========

Ten bench targets currently fail at runtime (e.g. `merge_kernels` panics in `arrow-data/src/transform/primitive.rs:31`); these are pre-existing issues in the benches themselves, not the integration. They're listed in an `EXCLUDED_BENCHES` env in both workflows so the remaining ~78 shards run clean. Each excluded target should be fixed and removed from the list one by one.

Prerequisites for activation
============================

- Install the CodSpeed GitHub App on `apache/arrow-rs`: https://github.com/apps/codspeed
- Enroll the repository at https://codspeed.io (the OIDC integration is automatic for public repos; no secret token configuration required)

Once both are done, the first push to main will populate the baseline and PRs labeled `bench:*` will receive automated CodSpeed comparison comments.

adriangb wants to merge 1 commit into apache:main from pydantic:codspeed-integration
