Workspace-Wide Benchmarking Foundation by aaj3f · Pull Request #1228 · fluree/db

aaj3f · 2026-05-08T18:43:28Z

Summary

Five commits on feature/stable-bench that stand up the workspace's benchmarking infrastructure end-to-end. Part of the goal is general bench auditability. The other part is a baseline: future refactors or more aggressive optimizations can be measured against it to see whether they affect hot-path performance. Every hot path in scope (commit, index build, incremental index, bulk import, novelty replay, cold reload, hot SPARQL query) now has a bench, every bench is registered with a regression budget, and ubuntu-latest runs a smoke gate on every PR. Adds ~4,500 LoC across a new fluree-bench-support crate, 7 new bench files, 5 retrofitted ones, 3 contributor docs, and a CI workflow extension.

The companion regression-comparison phase (committed baselines + nightly cron + dashboard) lands separately as bench-nightly.

Reading order

Reader profile	Read
Skim-only reviewer (~15 min)	This description + `BENCHMARKING.md`
Per-feature reviewer (~30 min)	This + the relevant commit's body and the bench file it lands
Implementor / chassis maintainer (~1 hr)	This + `docs/contributing/benches.md` + `fluree-bench-support/README.md` + the chassis source
Plan / RFC author	The bench-infrastructure plan in `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` is the design rationale this PR implements

What landed (5 commits)

`b3ac71724` — bench chassis + retrofit 5 existing benches

The new internal fluree-bench-support crate:

init_tracing_for_bench() — opt-in FLUREE_BENCH_TRACING=1 subscriber. Replaces an 18-line block previously duplicated verbatim across three benches.
next_ledger_alias(prefix) — atomic, never-reused alias of the form bench/{prefix}-{n}:main. Replaces three independent LEDGER_COUNTER: AtomicU64 patterns.
bench_runtime(), BenchProfile, BenchScale, current_profile(), current_scale() — env-driven knobs (FLUREE_BENCH_PROFILE, FLUREE_BENCH_SCALE, FLUREE_BENCH_RUNTIME).
gen::people / gen::vectors / gen::corpora — deterministic data generators lifted from four of the five existing benches. Tested for byte-stability across runs.
templates/BENCH_TEMPLATE.rs — working bench skeleton with TODO markers; the copy-this-and-edit starting point per the contributor guide.
budget, fixtures, report — regression-budget loader (check() helper), fixture path resolution, opt-in markdown-style end-of-run summary tables.
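The sketch below is a minimal std-only illustration of how `next_ledger_alias` and a `FLUREE_BENCH_SCALE` parser could work. It is not the chassis source. The `NEXT_ALIAS` static, the `parse_or` helper, and the `Small` fallback default are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical process-wide counter standing in for the chassis's internal state.
static NEXT_ALIAS: AtomicU64 = AtomicU64::new(0);

/// Returns `bench/{prefix}-{n}:main`; `fetch_add` guarantees `n` is never reused in-process.
pub fn next_ledger_alias(prefix: &str) -> String {
    let n = NEXT_ALIAS.fetch_add(1, Ordering::Relaxed);
    format!("bench/{prefix}-{n}:main")
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

impl BenchScale {
    /// Case-insensitive parse; missing or unknown values fall back to `default` (assumed behavior).
    pub fn parse_or(value: Option<&str>, default: BenchScale) -> BenchScale {
        match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
            Some("tiny") => BenchScale::Tiny,
            Some("small") => BenchScale::Small,
            Some("medium") => BenchScale::Medium,
            Some("large") => BenchScale::Large,
            _ => default,
        }
    }
}

/// Reads FLUREE_BENCH_SCALE from the environment; `Small` as the default is an assumption.
pub fn current_scale() -> BenchScale {
    let raw = std::env::var("FLUREE_BENCH_SCALE").ok();
    BenchScale::parse_or(raw.as_deref(), BenchScale::Small)
}
```

A single atomic counter gives unique aliases across every bench group in the process. That uniqueness is what the three per-file `LEDGER_COUNTER` statics used to provide separately.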

Plus three new docs: workspace-root BENCHMARKING.md (orientation), docs/contributing/benches.md (deep contributor guide mirroring tracing-guide.md), fluree-bench-support/README.md (API reference).

The five existing benches (insert_formats, vector_query, fulltext_query, vector_math, spatial_bench) are retrofitted to use the chassis. Mostly deletions: dropped init_tracing_for_bench × 3, LEDGER_COUNTER × 3, inline generators × 4. Behavior on Large scale is byte-identical to pre-retrofit.

`58d08c931` — `import_bulk`, `transact_commit`, `query_cold_reload`

Three new benches in fluree-db-api/benches/:

transact_commit.rs — single-commit latency on a fresh and a populated ledger. Distinct from insert_formats.rs, which measures total throughput across many txns; this measures the per-commit latency users see in production. Uses criterion::iter_batched so the base-load setup is excluded from timing.
import_bulk.rs — bulk Turtle import via fluree.create(id).import(path).execute(). Two scenarios (single_threaded, default_threads); throughput in triples/sec.
query_cold_reload.rs — file-backed cold reload + first query latency. Two scenarios (cold_load, cold_load_plus_query).
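These benches use `iter_batched` so that per-iteration setup (pre-loading history, or populating and then dropping a ledger) stays out of the measurement. The std-only sketch below mimics that idea with `Instant`. `time_excluding_setup` is a hypothetical helper, not criterion's API. Criterion also batches, warms up, and applies statistics, which this sketch does not.

```rust
use std::time::{Duration, Instant};

/// Runs `setup` before every iteration but accumulates time only around `routine`.
pub fn time_excluding_setup<I, O>(
    iters: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> (Duration, Vec<O>) {
    let mut measured = Duration::ZERO;
    let mut outputs = Vec::with_capacity(iters);
    for _ in 0..iters {
        // Untimed: e.g. build a ledger pre-loaded with `base_nodes` of history.
        let input = setup();
        let start = Instant::now();
        // Timed: e.g. the single commit under test.
        let out = routine(input);
        measured += start.elapsed();
        outputs.push(out);
    }
    (measured, outputs)
}
```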

`c0489a890` — `reindex_full`, `reindex_incremental`, `novelty_replay`

Three more benches in fluree-db-api/benches/:

reindex_full.rs — Fluree::reindex(...) end-to-end, throughput in triples/sec.
reindex_incremental.rs — exercises the orchestrator's incremental path. Setup pre-builds an indexed baseline + delta commits sitting in novelty above it; measured op is Fluree::trigger_index(...).
novelty_replay.rs — cold reload with without_indexing() so the populate phase keeps everything in novelty. Scaled by commit count to exercise bulk_apply_commits at varying chain depth.

This commit also folds two new entries into docs/contributing/benches.md:

The iter_batched + tokio reactor gotcha caught while writing bench-2: criterion::iter_batched's setup closure runs synchronously, so FlureeBuilder::file(...).build() (which touches the storage backend) panics outside a block_on. Doc shows the canonical fix.
Workspace clippy lints apply to bench code — names the two that bit during validation (needless_raw_string_hashes, uninlined_format_args).
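For the clippy gotcha, here is a small hypothetical function that satisfies both lints named above. It uses a raw string without `#` hashes, because the literal contains a backslash but no double quote. It also captures format arguments inline. The query text is illustrative only.

```rust
/// Builds an illustrative SPARQL string in the lint-clean style.
pub fn sparql_filter(limit: usize) -> String {
    // needless_raw_string_hashes: no `"` inside, so `r"..."` suffices; `r#"..."#` would be flagged.
    let filter = r"FILTER(regex(?label, '^Product\d+'))";
    // uninlined_format_args: write `{filter}` / `{limit}`, not `"{}", filter`.
    // `{{` and `}}` are escaped literal braces.
    format!("SELECT ?p WHERE {{ ?p ?q ?label {filter} }} LIMIT {limit}")
}
```

Running `cargo clippy --benches` locally catches both lints before CI does.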

`aa6bcc3ee` — `gen::bsbm` generator + `query_hot_bsbm` bench

A deterministic BSBM-shape data generator (Vendor, Product, Person, Review) added to the chassis, plus a hot-cache SPARQL bench with three scenarios drawn from the BSBM query catalogue:

Q3-shape — multi-hop join + scalar range filter.
Q5-shape — multi-join with price-range filter and ORDER BY.
Q9-shape — group-by + count + HAVING.

Setup builds the dataset once per scale, populates a file-backed ledger, runs a full reindex (so the binary columnar index is in place), and reuses the resulting GraphSnapshot for all b.iter calls. The bench measures warm-cache binary scan, not novelty replay or load.

Deliberate deviation from plan: the plan suggested vendoring a 5 MB BSBM-1K Turtle file. The programmatic generator ships instead. It keeps the repo small, scales naturally across BenchScale tiers, needs no external bsbmtools dependency, and is deterministic. For multi-million-triple scales we may eventually want bsbmtools-faithful distributions; that's a follow-up if nightly results diverge.
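The sketch below shows what deterministic, scale-parameterized generation means here. The following values come from the bench-4 commit notes:

- count ratios (`n_vendors = n/50`, `n_persons = n/10`, `n_reviews = n*3`)
- the 1..=5 rating cycle
- the 1_000..=51_000 cent price range

The function names and the price-mixing constant are hypothetical and are not the contents of `gen/bsbm.rs`. Every value is a pure function of an index, so repeated runs produce identical data.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct BsbmCounts {
    pub products: usize,
    pub vendors: usize,
    pub persons: usize,
    pub reviews: usize,
}

/// Entity counts derived from the single `n_products` knob.
pub fn bsbm_counts(n_products: usize) -> BsbmCounts {
    BsbmCounts {
        products: n_products,
        vendors: n_products / 50,
        persons: n_products / 10,
        reviews: n_products * 3,
    }
}

/// Ratings cycle 1..=5 by review index, so HAVING / ORDER BY see variation.
pub fn review_rating(review_idx: usize) -> u8 {
    (review_idx % 5) as u8 + 1
}

/// Hypothetical deterministic price in 1_000..=51_000 cents (7_919 is an arbitrary mixer).
pub fn product_price_cents(product_idx: usize) -> u64 {
    1_000 + (product_idx as u64).wrapping_mul(7_919) % 50_001
}
```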

`cb4916968` — bench-gate CI job + workspace reconcile test

The bench-gate job in .github/workflows/ci.yml. Two checks:

Reconcile — cargo test -p fluree-bench-support --test workspace_reconcile walks every workspace member's Cargo.toml for [[bench]] entries and confirms each has a matching entry in regression-budget.json (and vice versa). Fails with a clear error message naming the offending crate/bench pair. Catches: missing budgets, stale budget entries (deleted/renamed benches), unknown crate references.
Smoke — cargo bench --workspace -- --test runs each bench's scenarios once at tiny scale. Catches benches that compile but panic at runtime (bad SPARQL, broken setup, missing API surface).
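The reconcile check amounts to a bidirectional set difference. The std-only sketch below assumes the `[[bench]]` entries and `regression-budget.json` have already been parsed into `crate -> {bench}` maps; the real test uses `toml` for the former. The `reconcile` function and `ReconcileError` type are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
pub enum ReconcileError {
    MissingBudget { krate: String, bench: String },
    StaleBudget { krate: String, bench: String },
    UnknownCrate { krate: String },
}

/// `declared`: crate -> benches from Cargo.toml; `budgeted`: crate -> benches from the budget file.
pub fn reconcile(
    members: &BTreeSet<String>,
    declared: &BTreeMap<String, BTreeSet<String>>,
    budgeted: &BTreeMap<String, BTreeSet<String>>,
) -> Vec<ReconcileError> {
    let empty = BTreeSet::new();
    let mut errors = Vec::new();
    // Declared benches with no budget entry.
    for (krate, benches) in declared {
        let have = budgeted.get(krate).unwrap_or(&empty);
        for bench in benches.difference(have) {
            errors.push(ReconcileError::MissingBudget { krate: krate.clone(), bench: bench.clone() });
        }
    }
    // Budget entries pointing at unknown crates or at benches that no longer exist.
    for (krate, benches) in budgeted {
        if !members.contains(krate) {
            errors.push(ReconcileError::UnknownCrate { krate: krate.clone() });
            continue;
        }
        let have = declared.get(krate).unwrap_or(&empty);
        for bench in benches.difference(have) {
            errors.push(ReconcileError::StaleBudget { krate: krate.clone(), bench: bench.clone() });
        }
    }
    errors
}
```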

Wall-clock budget: ~5–8 min after rust-cache warms; ~12–15 min cold.

This commit also bundles two scope additions caught by the smoke validation:

Three pre-existing benches now respect FLUREE_BENCH_SCALE — insert_formats, vector_query, and fulltext_query had hardcoded size arrays from before bench-1's retrofit, and the fulltext bench's 50k case stalled the smoke run. Each gains a small dataset_sizes() / matrix() helper that slices by scale. Behavior at Large scale is byte-identical to pre-retrofit.
Real bug caught in vector_query.rs — query VALUES blocks used the "@type": "@vector" alias, which the parser no longer accepts in query context (only in INSERT). Fixed to use the full IRI "@type": "https://ns.flur.ee/db#embeddingVector", matching it_vector_flatrank.rs. This is a concrete demonstration of the gate's value: the bench had been silently broken until the smoke run exposed it.

BENCHMARKING.md gains a §"Regression budgets" rewrite documenting the two-phase gate model: bench-gate (per-PR, smoke + reconcile) and bench-nightly (separate workflow on cron, regression comparison against committed baselines).

Validation

Chassis tests: cargo test -p fluree-bench-support --lib — 44 passed, 0 failed.
Workspace reconcile: cargo test -p fluree-bench-support --test workspace_reconcile — passes. Verified to fail with clear messages by manually deleting a budget entry then restoring.
Full smoke run: FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test — all 12 bench files, every scenario, all report Success.
Workspace check: cargo check --workspace --benches — clean.

Two-phase gate model — what's done vs deferred

Done in this PR (bench-gate, runs per-PR):

Smoke validation — every bench compiles + runs at tiny scale.
Reconcile — every [[bench]] is registered in regression-budget.json, no stale entries, no unknown crates.
Behavior validation — the gate caught a real pre-existing bug in vector_query.rs during landing.

Deferred to bench-nightly:

bench-baselines.json schema + initial baselines committed from a few CI runs.
Cron-triggered nightly workflow that runs the Full profile across Medium / Large scales.
Regression comparison: observed_ns ≤ baseline_ns × (1 + budget_pct/100) per (crate, bench, scale).
GH-Pages dashboard rendered from bench-history/ JSON-Lines.
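The deferred comparison rule can be evaluated in integer arithmetic by multiplying both sides by 100, which avoids floating-point edge cases at the exact budget boundary. The helper below is hypothetical and assumes budgets are whole percentages, like the placeholder values (10%, 5%, 3%).

```rust
/// observed_ns <= baseline_ns * (1 + budget_pct / 100), evaluated as
/// observed_ns * 100 <= baseline_ns * (100 + budget_pct); u128 rules out overflow.
pub fn within_budget(observed_ns: u64, baseline_ns: u64, budget_pct: u32) -> bool {
    u128::from(observed_ns) * 100 <= u128::from(baseline_ns) * (100 + u128::from(budget_pct))
}
```

With a 1,000,000 ns baseline and a 5% budget, 1,050,000 ns passes and 1,050,001 ns fails.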

The reason for the split: ubuntu-latest shared runners flap enough that per-PR regression comparison would produce false positives. Nightly amortizes noise across the Full profile's larger sample counts and uses dedicated 4-core runners.

What this PR explicitly does NOT do

Per the bench-infrastructure plan §10 (out of scope) and one or two judgment calls during implementation:

Vendored BSBM Turtle fixture. Programmatic generator ships instead. Vendoring revisited if nightly large-scale runs need bsbmtools-faithful distributions.
iai-callgrind for noise-free PR gating. Plan §11 open question; defer until we have flap data from criterion-on-ubuntu-latest.
Auto-generated bench tables in BENCHMARKING.md — manual table is fine; auto-generation is mechanical follow-up if it becomes painful.
Real-world dataset benches (e.g., dblp). Programmatic gen::people and gen::bsbm cover the shapes the upcoming RFCs need; real-world data is a useful complement, not a blocker.
Regression-comparison enforcement — see "Deferred" above.

Files & line count

38 files changed, 4473 insertions, 413 deletions.
New crate: fluree-bench-support (~1500 LoC, 44 unit tests + 1 integration test).
New benches: 7 (import_bulk, transact_commit, query_cold_reload, reindex_full, reindex_incremental, novelty_replay, query_hot_bsbm).
Retrofitted benches: 5 (insert_formats, vector_query, fulltext_query, vector_math, spatial_bench).
New docs: BENCHMARKING.md (~150 lines), docs/contributing/benches.md (~440 lines after gotcha additions), fluree-bench-support/README.md (~110 lines).
New CI artifact: bench-gate job (~50 lines of YAML).
New tracker: regression-budget.json (workspace root).

Follow-ups

bench-nightly — separate PR. Cron workflow + baselines + regression-comparison enforcement + dashboard.
Coverage gaps from PR review — 8 perf-regression-prone hot paths flagged during review are tracked under #1235, "[bench] Post-PR-#1228 coverage gaps (parent tracker)". That parent tracker references 6 sub-issues, grouped by which can land together as one PR. Highest priority is the property-join planner regression bench (#1229, "[bench] Property-join planner regression bench (analyze_property_join_plan boundaries)"; ~176×–933× recent wins). Next is the lex-sorted-string ORDER BY fast-path bench (#1230, "[bench] Lex-sorted-string ORDER BY fast-path regression bench"; a 200× speedup that depends on a silent-regression-prone invariant). None blocks this PR.
Operator-side simplifications unlocked by the four upcoming RFCs (NamespaceUniverse, OverlaySubstrate, LedgerStateAdvance) — tracked in the post-streamline cleanup doc.
vector_math.rs post-streamline-query move — its import of fluree_db_query::expression::vector_math will need re-pointing to eval::vector_math once the colleague's refactor/streamline-query stack merges. Trivial one-line edit; not blocking.
Real-world dataset generator (e.g., dblp) — clean follow-up if a future bench needs distributions the synthetic generators miss.

Lands the foundation of the workspace's new bench infrastructure: a shared chassis crate, three contributor docs, an initial regression-budget schema, and a clean retrofit of all five pre-existing benches onto the new helpers. No behavior change to any bench; existing scenarios produce byte-identical output (chassis tests pin the determinism contract on every lifted generator).

This is the first commit on `feature/stable-bench`. Subsequent commits on this branch will add new benches against currently-uncovered hot paths (bench-2: `import_bulk`, `transact_commit`, `query_cold_reload`; bench-3: reindex/incremental/novelty-replay; bench-4: BSBM-shape query + vendored fixture; bench-5: gated CI job + initial baselines). The nightly workflow + remote fixture host land separately as `bench-nightly`.

## What's new

- `fluree-bench-support/` — new internal crate, `dev-dependency`-only. Eleven modules (~1,500 LoC), 37 unit tests. Public surface:
  - `init_tracing_for_bench()` — opt-in stderr subscriber under `FLUREE_BENCH_TRACING=1`. Replaces an 18-line block previously duplicated verbatim in three benches.
  - `next_ledger_alias(prefix)` — atomic, never-reused alias of the form `bench/{prefix}-{n}:main`. Replaces three independent `LEDGER_COUNTER: AtomicU64` patterns.
  - `bench_runtime()`, `BenchProfile`, `BenchScale`, `current_profile()`, `current_scale()` — env-driven knobs (`FLUREE_BENCH_PROFILE`, `FLUREE_BENCH_SCALE`, `FLUREE_BENCH_RUNTIME`).
  - `gen::people` — Person/Company graph generator + JSON-LD/Turtle serializers (lifted from `insert_formats.rs`).
  - `gen::vectors` — `f64` vector generators, both deterministic-from-seed and RNG-driven (lifted from `vector_math.rs` and `vector_query.rs`).
  - `gen::corpora` — paragraph templates + `random_paragraph()` (lifted from `fulltext_query.rs`).
  - `budget` — `RegressionBudget` schema + loader for `regression-budget.json` + `check()` helper. Reconciler stub for bench-5.
  - `fixtures` — workspace-root `fluree-bench-support/fixtures/` resolution. `load_or_generate()` body lands in bench-4.
  - `report` — opt-in markdown-style end-of-run summary tables.
  - `tracing::BenchSpanLayer` — skeleton for span-capture-to-file; full impl in a follow-up.
  - `templates/BENCH_TEMPLATE.rs` — working bench skeleton with TODO markers; copy-this-and-edit starting point per the contributor guide.
- `BENCHMARKING.md` (workspace root) — orientation: what benches exist, how to run, env vars, output format, regression budgets. README points at it.
- `docs/contributing/benches.md` — deep contributor guide mirroring `tracing-guide.md`'s structure. Six-step workflow for adding a bench, category conventions, common patterns, gotchas, debugging a flaky bench, span-capture instructions, review checklist.
- `regression-budget.json` (workspace root) — the per-bench, per-scale percentage regression that CI accepts. Schema-defining initial values; real baselines land with the first nightly run.

## Retrofits

Every existing bench compiles green and runs `--test` green after the substitution. The chassis tests pin byte-identical output for every lifted generator.

- `fluree-db-api/benches/insert_formats.rs` — drops the 18-line `init_tracing_for_bench`, the `LEDGER_COUNTER: AtomicU64` static, the `PersonData/CompanyData/TxnData` types, and three generator functions (`generate_txn_data`, `txn_data_to_jsonld`, `txn_data_to_turtle`) in favor of chassis imports. Four `LEDGER_COUNTER.fetch_add(...)` call sites become `next_ledger_alias("...")`.
- `fluree-db-api/benches/vector_query.rs` — drops the duplicated `init_tracing_for_bench` and the local `random_vector(rng, dim)`; imports `gen::vectors::rng_one as random_vector` (same signature, byte-identical output).
- `fluree-db-api/benches/fulltext_query.rs` — drops the duplicated `init_tracing_for_bench`, the inline `PARAGRAPH_TEMPLATES` and `EXTRA_VOCAB` constants (~120 lines), and the `random_paragraph` function. Imports `gen::corpora::random_paragraph` (same signature).
- `fluree-db-query/benches/vector_math.rs` — drops the local `random_vectors(dim)`; imports `gen::vectors::hashed_pair as random_vectors`. No tracing/ledger surface to retrofit (pure math).
- `fluree-db-spatial/benches/spatial_bench.rs` — chassis dev-dep wired via `use fluree_bench_support as _;` so future spatial benches can opt in. Spatial-domain geometry generators stay co-located (not yet reused elsewhere).

## Wiring

- Workspace `Cargo.toml` adds `fluree-bench-support` as a member.
- `fluree-db-api`, `fluree-db-query`, `fluree-db-spatial` each gain `fluree-bench-support` in `[dev-dependencies]`.
- Workspace `README.md` adds one line under "Documentation" pointing at `BENCHMARKING.md`.

## Verification

- `cargo test -p fluree-bench-support --lib` — 37 passed
- `cargo check --workspace --benches` — clean
- `cargo clippy -p fluree-bench-support --lib --tests` — clean
- `cargo bench -p fluree-db-query --bench vector_math -- --test` — 21 scenarios all `Success`
- Each retrofitted bench compiles in `--release` (`cargo bench --no-run`)

## Not yet done (deferred to follow-up commits / PRs)

- `validate_against_workspace()` — full `Cargo.toml` ↔ budget-JSON reconciler. Stub today; lands with bench-5 (CI gate).
- `BenchSpanLayer` — file-mode tracing (`FLUREE_BENCH_TRACING=file:./out.json`). Today this falls back to stderr with a `tracing::warn!`.
- `fixtures::load_or_generate` body. Today it returns a `FixtureRef` placeholder. Vendored data lands in bench-4; remote fetch in bench-nightly.
- BSBM-shape and other new benches (bench-2 through bench-4 commits on this branch).
- `.github/workflows/ci.yml` gated bench job (bench-5 commit).
- Nightly workflow + dashboard (`bench-nightly` PR, separate).
- `vector_math.rs` import path will need re-pointing from `fluree_db_query::expression::vector_math` to `eval::vector_math` once `refactor/streamline-query`'s stacked work merges. Trivial one-line follow-up.
## Plan reference

Designed per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §4.5 (Extensibility and contributor onboarding) and §8 (Migration plan). The chassis design absorbed the survey findings in §2.1 ("Concrete patterns observed").

Three new benches in `fluree-db-api/benches/`, each exercising a hot path the existing five benches don't cover. All build on the `fluree-bench-support` chassis introduced in the previous commit, which keeps each bench file under ~200 lines of bench-specific code.

This is the second commit on `feature/stable-bench`, landing the bench-2 slice of the bench-foundation PR. Subsequent commits add reindex / incremental-index / novelty-replay benches (bench-3), the BSBM-shape hot-query bench plus vendored fixture (bench-4), and the gated CI job plus initial baselines (bench-5).

## What's new

### `fluree-db-api/benches/transact_commit.rs` (196 lines)

Single-commit latency on a fresh and a populated ledger. Distinct from `insert_formats.rs`, which measures total throughput across many txns; `transact_commit` measures the per-commit latency that most users see in production.

- Scenario 1: `fresh_ledger` — commit one small txn against a freshly-created ledger. Measures pure commit overhead.
- Scenario 2: `populated_ledger` — commit one small txn against a ledger pre-loaded with `base_nodes` of history. Skipped when `base_nodes == 0`.
- Scale-driven inputs: Tiny=100×10, Small=1k×10, Medium=10k×10, Large=100k×10 (base nodes × commit nodes).
- Uses `criterion::iter_batched` so the base-load setup is excluded from the timed measurement.
- Memory-backed Fluree (`FlureeBuilder::memory().build_memory()`) — no I/O confound.

### `fluree-db-api/benches/import_bulk.rs` (193 lines)

Bulk Turtle import via `fluree.create(id).import(path).execute()`. The hot path under the hood: Turtle streaming parse → chunked staging → root assembly → FIR6 root publish.

- Scenario 1: `single_threaded` — `threads(1)`. A baseline free of parallelism overhead.
- Scenario 2: `default_threads` — exercises parallel-import allocator and worker-cache. Skipped at Tiny scale where parallel overhead would dominate.
- Scale: Tiny=1k → Large=200k nodes, with roughly 4–7× as many triples.
- Throughput annotation in `Throughput::Elements(triples)` so criterion's `thrpt` line reads in triples/sec.
- File-backed Fluree (`FlureeBuilder::file(...)`) with fresh tempdirs per iteration; `iter_batched` setup wraps the builder construction in the bench's tokio runtime (`rt.block_on(...)`) because the file-backed builder requires a running reactor.

### `fluree-db-api/benches/query_cold_reload.rs` (191 lines)

Cold ledger reload latency. Pre-populate a file-backed ledger, drop the Fluree connection, then time the rebuild when a fresh handle opens the same ledger.

- Scenario 1: `cold_load` — just `fluree.graph(id).load()`. Measures the load path: storage read → snapshot decode → novelty replay → binary-store attach.
- Scenario 2: `cold_load_plus_query` — load + one SPARQL query. Captures the full "I restarted my application, time to first answer" user-visible latency.
- Scale: Tiny=200 → Large=50k base nodes. Single populating txn so scaling is purely "amount of data," not "depth of commit chain."
- Uses `iter_batched` to put populate-and-drop in setup and measure only the cold open.

## Wiring

- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real baselines land with the first nightly run (bench-nightly PR).

No new dev-deps; `tempfile` and the chassis are already on `fluree-db-api`.
## Verification

- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench transact_commit -- --test` — both scenarios `Success` at small + tiny
- `cargo bench -p fluree-db-api --bench import_bulk -- --test` — both scenarios `Success` at small; `single_threaded` only at tiny
- `cargo bench -p fluree-db-api --bench query_cold_reload -- --test` — both scenarios `Success` at small + tiny
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis contracts unbroken)

## Gotcha worth flagging for future bench authors

`FlureeBuilder::file(...).build()` — and any other API that touches the file-backed storage path — requires a running tokio reactor. With `criterion::iter_batched`, the `setup` closure runs synchronously outside any `block_on`, so a setup that calls `FlureeBuilder::file(...)` panics with "there is no reactor running, must be called from the context of a Tokio 1.x runtime."

Fix: wrap setup work that touches the file backend in `rt.block_on(async { ... })`. The memory-backed builder (`FlureeBuilder::memory().build_memory()`) doesn't have this constraint and works fine in synchronous setup.

This will be added to `docs/contributing/benches.md` as a "Gotchas" entry in a follow-up commit on this branch.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2 (`bench-2`).

Three new benches in `fluree-db-api/benches/`, each exercising a hot path that bench-1 and bench-2 didn't cover: full reindex from the commit chain, incremental indexing via the orchestrator, and cold reload through deep novelty. The contributor guide grows two new "Gotchas" entries: the tokio-reactor-in-`iter_batched`-setup pitfall caught during bench-2, and a note about which workspace clippy lints catch bench-side mistakes.

This is the third commit on `feature/stable-bench`, landing the bench-3 slice of the bench-foundation PR. Subsequent commits add the BSBM-shape hot-query bench plus vendored fixture (bench-4) and the gated CI job plus initial baselines (bench-5).

## What's new

### `fluree-db-api/benches/reindex_full.rs` (145 lines)

End-to-end `fluree.reindex(id, ReindexOptions::default())` against a file-backed ledger pre-populated in a single txn. The hot path under the hood: commit-chain replay → flake collection → binary columnar index build (FLI3 leaves, FBR3 branches) → FIR6 root publish.

- Scenario: `single_txn` — base data committed in one txn so the measurement scales with "amount of data," not "depth of commit chain."
- Scale: Tiny=200 → Large=50k base nodes; throughput in triples/sec.
- File-backed Fluree per iteration; `IndexConfig` with high reindex thresholds during populate so background indexing doesn't race the measured op.

### `fluree-db-api/benches/reindex_incremental.rs` (175 lines)

Orchestrator-driven incremental indexing. Setup brings the ledger to a state where an index already covers `base_nodes` of data and `delta_commits` worth of additional commits sit in novelty above it; the measured op is the `Fluree::trigger_index(...)` call that drives the orchestrator to extend the index over those novelty commits.

- Scenario: `apply_delta` — exercises the orchestrator's incremental path against novelty above an indexed base.
- Scale: (Tiny=200×5, Small=2k×20, Medium=10k×50, Large=50k×200) for (base_nodes × delta_commits). Each delta commit is 10 nodes, so novelty depth scales with `delta_commits`.
- Setup uses `Fluree::reindex` to establish the baseline index, then `Fluree::ledger(id)` to reload state with the new index head before applying the delta commits.

### `fluree-db-api/benches/novelty_replay.rs` (145 lines)

Cold reload latency under deep novelty with **no index attached**. Distinct from `query_cold_reload.rs`, which scales by amount of data committed in one txn; this scales by **commit-chain depth**. The measured cold reload exercises `fluree-db-novelty::Novelty::bulk_apply_commits` (memory: `mem:fact-01kqfy6txdrjppaf6756xzdz25`) plus per-commit envelope-delta application.

- Setup uses `FlureeBuilder::file(...).without_indexing().build()` so every populate-phase commit stays in novelty.
- Scale: Tiny=20 → Large=2000 commits, 10 nodes each.
- Scenario: `replay_chain` — single cold reload that replays the full commit chain into the in-memory novelty store.

## Doc update folded in

`docs/contributing/benches.md` grows two Gotchas entries:

1. **`iter_batched` setup needs a tokio reactor for file-backed Fluree.** Caught while validating bench-2: `criterion::iter_batched`'s setup closure runs synchronously, so `FlureeBuilder::file(...).build()` — which touches the storage backend during construction — panics with "there is no reactor running, must be called from the context of a Tokio 1.x runtime." The doc shows the canonical fix (wrap setup in `rt.block_on(async { ... })`) and notes that `FlureeBuilder::memory().build_memory()` doesn't have this constraint. All three new benches in this commit follow the wrapped-setup pattern.
2. **Workspace clippy lints apply to bench code.** The workspace `Cargo.toml`'s `[workspace.lints.clippy]` denies several lints; two that bit during bench-2 are called out by name — `needless_raw_string_hashes` (write `r"..."` not `r#"..."#`) and `uninlined_format_args` (write `format!("{x}")` not `format!("{}", x)`). Plus a one-liner about running `cargo clippy --benches` locally before pushing.

## Wiring

- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real baselines land with the first nightly run (bench-nightly PR).

No new dev-deps; the chassis and `tempfile` are already on `fluree-db-api`.

## Verification

- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench reindex_full -- --test` — scenario `Success`
- `cargo bench -p fluree-db-api --bench novelty_replay -- --test` — scenario `Success`
- `cargo bench -p fluree-db-api --bench reindex_incremental -- --test` — scenario `Success`
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis contracts unbroken)
- `cargo check --workspace --benches` — clean

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2 (`bench-3`). The plan suggested putting reindex benches in `fluree-db-indexer/benches/` and novelty_replay in `fluree-db-novelty/benches/`; in practice the user-facing entry points (`Fluree::reindex`, `Fluree::trigger_index`, `fluree.graph(id).load()`) all live in `fluree-db-api`, so the benches naturally belong there. A future commit can add focused micro-benches in the indexer/novelty crates if the end-to-end versions need a finer breakdown.

A new bench in `fluree-db-api/benches/` exercising warm-cache SPARQL query latency on a BSBM-shape graph, plus a deterministic BSBM-shape data generator added to the chassis. Three query scenarios cover three distinct planner / scan patterns (multi-hop filter, multi-join with range filter + ORDER BY, group-by + count + HAVING). The dataset is generated on the fly rather than vendored as a 5 MB Turtle file.

This is the fourth commit on `feature/stable-bench`, landing the bench-4 slice of the bench-foundation PR. The remaining commit (`bench-5`) adds the gated CI job plus the initial regression baselines.

## What's new

### `fluree-bench-support/src/gen/bsbm.rs` (300 lines)

Deterministic generator for a four-entity graph drawn from the [Berlin SPARQL Benchmark](http://wbsg.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/): `Vendor`, `Product`, `Person`, `Review`.

- Counts derive from `n_products`: `n_vendors = n/50`, `n_persons = n/10`, `n_reviews = n*3`.
- Five product types and five countries cycled deterministically.
- Prices in `1_000..=51_000` cents (range chosen so range filters see meaningful selectivity).
- Ratings cycle `1..=5` across reviews so HAVING / ORDER BY scenarios have genuine variation.
- `bsbm_data_to_turtle(&data)` renders the graph as a single well-formed Turtle document with `ex:`, `bsbm:`, and `xsd:` prefixes.
- 7 unit tests pin the determinism contract (chassis suite grows from 37 to 44 passing tests).

### `fluree-db-api/benches/query_hot_bsbm.rs` (230 lines)

Three SPARQL scenarios against a populated, indexed file-backed Fluree:

- **Q3-shape** — multi-hop join + scalar range filter (Electronics products with reviews rated ≥ 4).
- **Q5-shape** — multi-join with price-range filter and `ORDER BY` (top-10 products in 5000–25000 cents, ordered by price, with vendor label).
- **Q9-shape** — `GROUP BY` + `COUNT` + `HAVING` (products with ≥ 3 reviews, ordered by review count).
Setup builds the dataset once per scale, populates the ledger, runs a full reindex (so the binary columnar index is in place), and loads a `GraphSnapshot`. All three scenarios reuse that snapshot — the bench measures **warm-cache binary scan**, not novelty replay or load overhead.

Scale: `Tiny=100 → Large=100k` products. The other entity counts are derived from the product count.

## Why programmatic (not vendored)

The bench plan suggested vendoring the canonical 5 MB BSBM-1K Turtle file under `fluree-bench-support/fixtures/`. This commit takes the programmatic path instead:

- A binary-ish 5 MB blob in git inflates clone / fetch footprint for every contributor and every CI build, indefinitely.
- Programmatic generation is deterministic (chassis tests pin the contract) and scale-parameterized (one knob, four `BenchScale` tiers).
- Avoids a build-time dependency on [`bsbmtools`](https://github.com/wbsg-uni-mannheim/bsbmtools) or any external generator script.

For multi-million-triple scales (Large+ in nightly), we may still want the canonical generator's distributions — that's a follow-up if a nightly run discovers the synthetic shape diverges enough from real-world BSBM behavior to matter. The plan's `fluree-bench-support/scripts/gen-bsbm.sh` external generator is therefore not added in this commit. The bench plan doc should be updated to reflect the programmatic-first approach in a follow-up cleanup commit.

## Wiring

- `fluree-bench-support/src/gen/mod.rs` — `pub mod bsbm;` plus a one-paragraph entry in the module docstring.
- `fluree-db-api/Cargo.toml` — new `[[bench]] name = "query_hot_bsbm"` entry.
- `regression-budget.json` — new `query_hot_bsbm` entry with placeholder budgets (`tiny: 10%, small: 5%, medium: 3%`). The medium budget is tighter because hot-path query latency should be the most stable signal in the bench suite.

No new dev-deps; the chassis is already on `fluree-db-api`.
## Verification

- `cargo test -p fluree-bench-support --lib` — **44 passed** (up from 37; the 7 new tests cover bsbm determinism, count ratios, type / country / rating distribution, and Turtle prefix presence).
- `cargo check -p fluree-db-api --bench query_hot_bsbm` — clean.
- `cargo bench --no-run` — clean release build.
- `FLUREE_BENCH_SCALE=tiny cargo bench -p fluree-db-api --bench query_hot_bsbm -- --test` — `q3/q5/q9` all report `Success`.
- `cargo check --workspace --benches` — clean.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2 (`bench-4`). Two deliberate deviations from the plan worth flagging for review:

1. **Programmatic generator over vendored fixture** (see "Why programmatic" above). The plan's intent — let benches scale across `BenchScale` tiers without external tooling — is preserved; the delivery vehicle is different.
2. **No `gen-bsbm.sh` external script** at this stage. If we later need bsbmtools-faithful distributions for nightly large-scale runs, that's a separate concern.

A future `bench-N` commit (or a follow-up after this branch lands) could add a parallel generator wired to the in-tree dblp data the CLI's bulk-import work already uses, since real-world graph distributions exercise shapes the synthetic generator may miss (degree skew, prefix variety, etc.). Tracked as an out-of-scope follow-up, not blocked by this PR.


A new `bench-gate` job in `.github/workflows/ci.yml` runs on every PR and every push to `main`. It performs two checks:

1. **Reconcile**: `cargo test -p fluree-bench-support --test workspace_reconcile` walks every workspace member's `Cargo.toml` for `[[bench]]` entries and confirms each has a matching entry in `regression-budget.json` (and vice versa). On a mismatch it fails the gate with a clear error message naming the offending `crate/bench` pair.
2. **Smoke**: `cargo bench --workspace -- --test` runs each bench's scenarios once at `tiny` scale (`FLUREE_BENCH_PROFILE=quick`, `FLUREE_BENCH_SCALE=tiny`). This catches benches that compile but panic at runtime (bad SPARQL, broken setup, missing API surface).

This is the fifth and final commit on `feature/stable-bench`, completing the bench-foundation PR. The full regression-comparison phase (which compares observed nanoseconds against committed baselines and a per-bench budget percentage) lands separately in `bench-nightly` because (a) it needs runner-stable baselines that only emerge from a few nightly runs, and (b) per-PR comparison on shared `ubuntu-latest` runners would flap.

## What's new

### `.github/workflows/ci.yml`: `bench-gate` job

- Same setup pattern as the existing `clippy`/`test` jobs: free disk space → checkout → toolchain → rust-cache.
- Three steps: reconcile, build, smoke. Reconcile runs first because it's fast (~1 s) and tells the contributor exactly what's wrong before they wait on a release build.
- Wall-clock budget: ~5–8 min once rust-cache is warm; ~12–15 min cold. Within plan §6.1's CI minute budget.

### `fluree-bench-support/tests/workspace_reconcile.rs` (~145 lines)

- Walks workspace members, parses each crate's `Cargo.toml` for `[[bench]]` entries, and builds a `crate -> [bench]` map.
- Loads `regression-budget.json` and reconciles bidirectionally:
  - **Missing budget**: declared bench without a budget entry. Fails with "add an entry under `crates.<crate>.<bench>` in `regression-budget.json`."
  - **Stale budget**: budget entry without a matching `[[bench]]`. Fails with "remove the entry or rename the bench file."
  - **Unknown crate**: budget references a crate not in the workspace.
- Uses `toml = "0.8"` as a dev-dep (test-only, so it doesn't pull `toml` into anything that depends on the chassis library).

### `BENCHMARKING.md`: §"Regression budgets" rewritten

- Documents the two-phase gate model explicitly: `bench-gate` per PR, and `bench-nightly` as a separate cron workflow.
- Explains why there are two phases: `ubuntu-latest` flap makes per-PR regression comparison unreliable, while the nightly amortizes noise across `Full`-profile sample counts and uses 4-core runners.

## Pre-existing benches retrofitted in this commit

Validating the smoke run end-to-end uncovered two issues in pre-existing benches that needed fixing for the gate to land green. Both fixes are scoped to this commit because the gate is useless if the smoke run fails on existing scenarios.

### Three benches now respect `FLUREE_BENCH_SCALE`

`insert_formats.rs`, `vector_query.rs`, and `fulltext_query.rs` had hardcoded size arrays from before bench-1's chassis retrofit:

- `insert_formats.rs`: `TXN_COUNTS = &[10, 100]`, `NODES_PER_TXN = &[10, 100, 1000]`. Six matrix cells.
- `vector_query.rs`: `DATASET_SIZES = &[1_000, 5_000]`.
- `fulltext_query.rs`: `DATASET_SIZES = &[1_000, 5_000, 10_000, 50_000]`. The 50k case stalled the smoke run.

Each file gains a small scale-driven slice helper (`dataset_sizes()` / `matrix()`) that returns the appropriate subset for `current_scale()`. At `Tiny`: smallest size only. At `Small`: 1–2 sizes. At `Medium`: 2–3 sizes. At `Large`: the full curve. Behavior at `Large` is byte-identical to pre-retrofit; the helpers only slice at smaller scales.
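A minimal sketch of such a slice helper for the four-point curve in `fulltext_query.rs`. The per-scale counts (1, 2, 3, all) are one reading of the ranges given above, and both the local `BenchScale` enum and `parse_scale` are hypothetical stand-ins for the chassis's `BenchScale` / `current_scale()`, not their actual code.

```rust
/// Local stand-in for the chassis's `BenchScale` tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

/// Hypothetical parser for a `FLUREE_BENCH_SCALE`-style value.
fn parse_scale(value: &str) -> Option<BenchScale> {
    match value.to_ascii_lowercase().as_str() {
        "tiny" => Some(BenchScale::Tiny),
        "small" => Some(BenchScale::Small),
        "medium" => Some(BenchScale::Medium),
        "large" => Some(BenchScale::Large),
        _ => None,
    }
}

/// The pre-retrofit size curve from `fulltext_query.rs`.
const DATASET_SIZES: &[usize] = &[1_000, 5_000, 10_000, 50_000];

/// Keep a prefix of the curve that grows with scale. At `Large` the whole
/// curve is returned, so behavior matches the pre-retrofit bench.
fn dataset_sizes(scale: BenchScale) -> &'static [usize] {
    let keep = match scale {
        BenchScale::Tiny => 1,
        BenchScale::Small => 2,
        BenchScale::Medium => 3,
        BenchScale::Large => DATASET_SIZES.len(),
    };
    // `min` guards against a curve shorter than the requested prefix.
    &DATASET_SIZES[..keep.min(DATASET_SIZES.len())]
}
```

Slicing a prefix rather than sampling the curve is what makes the `Large` case identical to the old hardcoded array: the helper only ever removes the expensive tail.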
### Real bug caught: `vector_query.rs` SPARQL VALUES type alias

Smoke validation panicked with:

```
Array @value is only supported for https://ns.flur.ee/db#embeddingVector typed literals
```

This happened because the bench constructed query VALUES blocks with `"@type": "@vector"`. The `@vector` alias is INSERT-only; the query parser requires the full IRI in a VALUES context (see the canonical pattern in `it_vector_flatrank.rs`). Two query literals were updated to use `"@type": "https://ns.flur.ee/db#embeddingVector"`. This bench evidently hadn't run cleanly in some time; the gate is why it runs cleanly now.

## Verification

- `cargo test -p fluree-bench-support --test workspace_reconcile`: passes (1 test).
- Verified that the reconcile test fails correctly when a budget entry is removed (manually deleted the `query_hot_bsbm` entry, ran the test, got "Missing budget entries: fluree-db-api/query_hot_bsbm" with the fix-it suggestion; restored the entry and the test passed again).
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test`: **all 12 bench files run; every scenario reports `Success`**:
  - `insert_formats`, `vector_query`, `fulltext_query`, `vector_math`, `spatial_bench` (existing, retrofitted in bench-1).
  - `import_bulk`, `transact_commit`, `query_cold_reload` (bench-2).
  - `reindex_full`, `reindex_incremental`, `novelty_replay` (bench-3).
  - `query_hot_bsbm` (bench-4).
- `cargo check --workspace --benches`: clean.

## Out of scope (lands separately)

- **Regression-comparison phase**: `bench-nightly` PR. Adds a `bench-baselines.json` schema, a cron-triggered nightly workflow that runs the `Full` profile across `Medium`/`Large` scales, and the budget-check logic. Initial baselines come from the first 2–3 nightly runs; until those exist, regression-comparison enforcement cannot be turned on.
- **iai-callgrind for noise-free PR gating**: open question §11.3 in the bench-infrastructure plan. Deferred until we have flap data from criterion on `ubuntu-latest`.
- **Auto-generated "Current benches" table** in `BENCHMARKING.md`: hand-maintained for now; auto-generating it from workspace `[[bench]]` declarations is mechanical but not yet worth the build-time cost.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2 (`bench-5`). The plan also called for capturing initial baselines as part of this commit; that has been deferred to `bench-nightly` because baselines from a developer's machine don't translate to the CI runner anyway. The bench-gate phase implemented here gives the workspace smoke-and-reconcile coverage immediately; regression comparison is a separate concern.
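The bidirectional check that `workspace_reconcile.rs` performs reduces to two set differences plus a membership test. A minimal sketch, assuming the `Cargo.toml` and `regression-budget.json` parsing has already produced crate-to-bench maps; the `Findings` type, `BenchMap` alias, and `reconcile` function are hypothetical and are not the test's actual code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Crate name -> bench names it declares (or budgets).
type BenchMap = BTreeMap<String, BTreeSet<String>>;

/// The three failure classes the reconcile test reports (hypothetical type).
#[derive(Debug, Default, PartialEq)]
struct Findings {
    /// Declared `[[bench]]` with no budget entry, as "crate/bench".
    missing_budget: Vec<String>,
    /// Budget entry with no matching `[[bench]]`, as "crate/bench".
    stale_budget: Vec<String>,
    /// Budget entry for a crate that is not a workspace member.
    unknown_crate: Vec<String>,
}

impl Findings {
    fn is_clean(&self) -> bool {
        self.missing_budget.is_empty()
            && self.stale_budget.is_empty()
            && self.unknown_crate.is_empty()
    }
}

/// Compare `declared` (from each member's Cargo.toml) against `budgets`
/// (from regression-budget.json) in both directions.
fn reconcile(members: &BTreeSet<String>, declared: &BenchMap, budgets: &BenchMap) -> Findings {
    let empty = BTreeSet::new();
    let mut findings = Findings::default();

    // Direction 1: every declared bench needs a budget entry.
    for (krate, benches) in declared {
        let budgeted = budgets.get(krate).unwrap_or(&empty);
        for bench in benches.difference(budgeted) {
            findings.missing_budget.push(format!("{krate}/{bench}"));
        }
    }

    // Direction 2: every budget entry needs a member crate and a declared bench.
    for (krate, benches) in budgets {
        if !members.contains(krate) {
            findings.unknown_crate.push(krate.clone());
            continue;
        }
        let decl = declared.get(krate).unwrap_or(&empty);
        for bench in benches.difference(decl) {
            findings.stale_budget.push(format!("{krate}/{bench}"));
        }
    }
    findings
}
```

Using ordered `BTreeMap`/`BTreeSet` keeps the reported `crate/bench` pairs in a stable order, so a failing gate prints the same message on every run.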

…fit chassis consistency, sweep doc drift

Single commit addressing all 17 inline review findings on PR #1228, plus deletion of the misleading `validate_against_workspace()` stub. No behavior change to the bench-gate CI job or to any landed bench's intent: every fix is a correctness restoration, a chassis-coherence sweep across pre-existing benches, or doc-drift cleanup. All 44 chassis unit tests and the 1 reconcile integration test still pass; a full smoke run at `tiny` scale shows all 24 scenarios `Success`. Coverage gaps the reviewer flagged in a separate "follow-up" section are tracked as GH issues #1229–#1234 with parent #1235; this commit does not add new benches.

## Major (M1–M5)

- **M1: `gen::vectors::hashed_pair` determinism contract restored.** The pre-retrofit `vector_math.rs::random_vectors` used a _single shared_ `DefaultHasher` whose state accumulated across the loop; my chassis version constructed a fresh hasher per call, which produced different bytes for the same `dim`. The bench-1 commit's "byte-identical" claim was therefore technically false for this generator. Rewrote `hashed_pair` to mirror the pre-retrofit shared-hasher behavior verbatim. Output is now byte-identical to the pre-chassis bench, restoring the determinism contract documented in the chassis docstring.
- **M2: `BENCHMARKING.md` "Current benches" table updated.** It was stale at 5 rows from before bench-2/3/4. It now includes all 12 bench files (5 retrofitted + 7 new), so contributors landing on the workspace doc from `README.md` see what exists.
- **M3: four edits to `fluree-bench-support/README.md`.** `gen::bsbm` added to the table. Stale references to "lands in bench-4" / "lands in bench-5" replaced with accurate "stub today" / "implemented at the test level" notes. Test count corrected from 37 to 44.
- **M4: `docs/contributing/benches.md` "Future work" section rewritten.** It claimed bench-4/bench-5 work as upcoming; both landed in PR #1228. Replaced with the reviewer's suggested (accurate) wording.
- **M5: `docs/contributing/benches.md` "Current categories" table fixed.** It pointed at directories that don't exist (`fluree-db-indexer/benches/`, `fluree-db-novelty/benches/`, `fluree-db-core/benches/`). Updated to point at the actual files (all under `fluree-db-api/benches/`) per the PR's deviation from the plan, with a note explaining where future micro-benches in those crates would live and a "Reserved categories" sub-table for `core` and `query` (not yet realized).

## Moderate (m1–m6)

- **m1: Retrofitted-bench chassis consistency.** The 5 pre-existing benches were using `tokio::runtime::Runtime::new()` directly and hardcoding `group.sample_size(10)`. Swapped to `bench_runtime()` and `current_profile().sample_size()` (`Quick` still resolves to 10; the `Full` profile now correctly yields a wider distribution). This is not strictly a no-op: `Runtime::new()` builds tokio's multi-threaded scheduler, whereas `bench_runtime()` defaults to a single-threaded current-thread runtime, so the retrofitted benches now run single-threaded unless `FLUREE_BENCH_RUNTIME=multi` is set. The `vector_math.rs` and `spatial_bench.rs` benches don't use the Runtime or sample-size knobs and were left untouched.
- **m2: Dropped the unused `sha2` dev-dep** from `fluree-bench-support/Cargo.toml`. The "Lockfile-driven hashing for deterministic fixtures" comment was aspirational; no code uses it today. It will be re-added in the same commit that introduces fixture hashing.
- **m3: Hardcoded `LEDGER_ID` consts replaced** with `next_ledger_alias` calls in 6 benches (`query_cold_reload`, `novelty_replay`, `reindex_full`, `reindex_incremental`, `query_hot_bsbm`, `import_bulk`). This was practically safe before, because each iteration rebuilds Fluree over a fresh tempdir, but the contributor doc explicitly tells future authors _not_ to do this, so having the new benches serve as the canonical example was a coherence problem. Now consistent with `transact_commit` (which already used the chassis pattern). Setup closures now thread the alias through `iter_batched`'s tuple input to the measured op.
- **m4: `BenchProfile::Full::sample_size()` bumped from 30 to 100** (criterion's default).
30 was below criterion's default, which defeats the "wider distribution for nightly stability" goal. A comment notes that the value is a starting point; we may need to bump it to 200+ once `bench-nightly` lands and we have flap data. `Quick` stays at 10.
- **m5: `import_bulk.rs` `default_threads` scenario docstring.** Added a note explaining that the chassis's `bench_runtime()` is single-threaded by default, so `FLUREE_BENCH_RUNTIME=multi` is required for a meaningful parallel-import measurement. Otherwise, the comparison to `single_threaded` should be read as "internal worker pool only," not full end-to-end parallel throughput.
- **m6: `transact_commit.rs` Fluree construction moved into `iter_batched` setup.** Previously one in-memory Fluree was constructed before the bench groups and shared across all iterations of both scenarios; over a `Full`-profile run that meant 60+ accumulated ledgers in one in-memory state, which could skew later samples through allocator behavior. Setup is excluded from timing, so moving the construction there adds no measured cost.

## Minor (µ1–µ6)

- **µ1: `bsbm.rs` price-range comment fixed.** It read "[10.00, 510.00] cents (i.e., $0.10 to $5.10)"; the actual range is 1000–50999 cents (i.e., $10.00 to $509.99). The comment's math was wrong, not the algorithm.
- **µ2: unused `xsd:` prefix dropped from Q5 in `query_hot_bsbm.rs`.** Cosmetic.
- **µ3: `tracing.rs` `TODO(bench-3)` comment updated.** Bench-3 landed; the reference now points at "the bench-nightly follow-up work."
- **µ4: `BenchSpanLayer` doc comment updated.** It claimed "lands in a later commit (see plan §5.2 item 1)"; it now accurately notes "wiring up the JSON-file emit path is tracked under the `bench-nightly` follow-up."
- **µ5: dead `let _ = profile;` line removed from `BENCH_TEMPLATE.rs`.** `profile` is in fact used at `group.sample_size(profile.sample_size())`; the `let _ = profile;` was leftover from an earlier draft.
- **µ6: bench-gate CI job gains `timeout-minutes: 25`.** This caps the entire job so a hung bench fails fast instead of tying up the runner queue. 25 minutes is well above the expected 5–8 min wall-clock budget (and the ~12–15 min cold-cache run) but far below GitHub Actions' default job timeout of 360 minutes.

## Stub deletion

`validate_against_workspace()` removed from `fluree-bench-support/src/budget.rs`. The function was a stub whose docstring promised "lands in bench-5", but bench-5 landed the reconcile as an integration test (`tests/workspace_reconcile.rs`) rather than as a library function. The stub was misleading: a caller would expect it to actually validate, yet it returned `Ok(())` unconditionally. Replaced the stub with a comment noting that the test is the contract.

## Verification

- `cargo test -p fluree-bench-support --lib`: **44 passed** (unchanged; after M1, `hashed_pair` output is byte-identical to the pre-retrofit version, so the existing determinism and range tests still pass).
- `cargo test -p fluree-bench-support --test workspace_reconcile`: **1 passed**. Verified to fail correctly when a budget entry is removed (manually deleted `query_hot_bsbm`, ran the test, got "Missing budget entries: fluree-db-api/query_hot_bsbm" with the fix-it suggestion; restored the entry and the test passed).
- `cargo check --workspace --benches`: clean.
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test`: **all 24 bench scenarios across 12 bench files report `Success`.**
- `python3 -c "import yaml; yaml.safe_load(...)"` on `.github/workflows/ci.yml`: clean.

## Coverage gaps tracked separately

The reviewer's "Suggested coverage gaps" section listed 8 perf-regression-prone hot paths with no dedicated bench. None blocks PR #1228 (the foundation), but several have known recent perf wins that warrant tracking. Spun out as GH issues:

- **#1229**: Property-join planner regression bench (highest-priority follow-up).
- **#1230**: Lex-sorted-string ORDER BY fast-path bench.
- **#1231**: Scan fast-path regression benches (batched object-join + group-count-firsts).
- **#1232**: Filter `PreparedBoolExpression` cache regression bench.
- **#1233**: Time-travel and multi-ledger `DataSetDb` history query coverage.
- **#1234**: Parser micro-benches (Turtle, JSON-LD, SPARQL).
- **#1235**: Parent tracker referencing all 6.
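The M1 distinction between a shared, accumulating hasher and a fresh hasher per call is easy to miss, so here is a minimal sketch of the two behaviors. It is a hypothetical reconstruction, not the actual `random_vectors` / `hashed_pair` code: the function names, hashing the element index, and the mapping to `[0, 1)` are all assumptions. `DefaultHasher::new()` is deterministic within a given Rust toolchain, but the standard library does not guarantee its algorithm across releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a 64-bit hash to a float in [0, 1) using its top 24 bits,
/// which an f32 can represent exactly.
fn to_unit(h: u64) -> f32 {
    (h >> 40) as f32 / (1u64 << 24) as f32
}

/// Shared-hasher style: one hasher for the whole loop, so element `i`
/// depends on every index written before it.
fn vector_shared(dim: usize) -> Vec<f32> {
    let mut hasher = DefaultHasher::new();
    (0..dim)
        .map(|i| {
            i.hash(&mut hasher);
            to_unit(hasher.finish()) // `finish` does not reset the state
        })
        .collect()
}

/// Fresh-hasher style: a new hasher per element, so element `i`
/// depends only on `i`.
fn vector_fresh(dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            let mut hasher = DefaultHasher::new();
            i.hash(&mut hasher);
            to_unit(hasher.finish())
        })
        .collect()
}
```

Both versions are individually deterministic, which is why a determinism test alone would not catch the swap. Element 0 matches because each hasher has seen the same single write at that point; from element 1 on, the shared hasher's output depends on every earlier write, so replacing one style with the other silently changes the generated bytes.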

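The m3 change relies on `next_ledger_alias(prefix)`, which the PR summary describes as an atomic, never-reused alias of the form `bench/{prefix}-{n}:main` replacing the old `LEDGER_COUNTER: AtomicU64` patterns. A minimal sketch of that pattern follows; the starting value, memory ordering, and exact body are assumptions, and the chassis's real implementation may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; `fetch_add` hands out each value exactly once,
/// even when called concurrently from several threads.
static LEDGER_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical re-creation of the chassis helper: returns a never-reused
/// alias of the form `bench/{prefix}-{n}:main`.
fn next_ledger_alias(prefix: &str) -> String {
    // Relaxed is enough: only the uniqueness of the returned value matters,
    // not its ordering relative to other memory operations.
    let n = LEDGER_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("bench/{prefix}-{n}:main")
}
```

Because the counter lives for the whole process, two benches (or two `iter_batched` setups) can never collide on an alias, which is the property a hardcoded `LEDGER_ID` const only had by accident of the fresh-tempdir setup.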
zonotope

🧑‍🔬

aaj3f added 7 commits May 8, 2026 09:29

cargo fmt; clippy fix

1959a81

cargo fmt

d6b202b

aaj3f added 4 commits May 11, 2026 09:17

cargo fmt & clippy

795d70d

Merge branch 'main' into feature/stable-bench

1d68eaf

cargo fmt && clippy

f87747a

aaj3f marked this pull request as ready for review May 11, 2026 15:32

aaj3f requested review from bplatz and zonotope May 11, 2026 15:35

zonotope approved these changes May 12, 2026


aaj3f merged commit d672df6 into main May 12, 2026
14 checks passed

aaj3f deleted the feature/stable-bench branch May 12, 2026 14:17


Workspace-Wide Benchmarking Foundation #1228

aaj3f merged 11 commits into `main` from `feature/stable-bench`

aaj3f commented May 8, 2026 (edited)

What landed (5 commits)

`b3ac71724`: bench chassis + retrofit of 5 existing benches

`58d08c931`: `import_bulk`, `transact_commit`, `query_cold_reload`

`c0489a890`: `reindex_full`, `reindex_incremental`, `novelty_replay`

`aa6bcc3ee`: `gen::bsbm` generator + `query_hot_bsbm` bench

`cb4916968`: bench-gate CI job + workspace reconcile test
