
Workspace-Wide Benchmarking Foundation #1228

Merged
aaj3f merged 11 commits into main from feature/stable-bench on May 12, 2026

aaj3f commented on May 8, 2026

Summary

5 commits on feature/stable-bench stand up the workspace's benchmarking infrastructure end to end. This is partly for general bench auditability, but it also provides a baseline against which future refactors and more aggressive optimizations can be measured, so we can assess clearly whether they affect hot-path performance. Hot paths such as commit, index build, incremental indexing, bulk import, novelty replay, cold reload, and hot SPARQL query each now have a bench; every bench is registered with a regression budget; and ubuntu-latest runs a smoke gate on every PR. Adds ~4,500 LoC across a new fluree-bench-support crate, 7 new bench files, 5 retrofitted ones, 3 contributor docs, and a CI workflow extension.

The companion regression-comparison phase (committed baselines + nightly cron + dashboard) lands separately as bench-nightly.

Reading order

| Reader profile | Read |
|---|---|
| Skim-only reviewer (~15 min) | This description + BENCHMARKING.md |
| Per-feature reviewer (~30 min) | This + the relevant commit's body and the bench file it lands |
| Implementor / chassis maintainer (~1 hr) | This + docs/contributing/benches.md + fluree-bench-support/README.md + the chassis source |
| Plan / RFC author | The bench-infrastructure plan in .claude/proposed-work/docs/plan-benchmark-infrastructure.md is the design rationale this PR implements |

What landed (5 commits)

b3ac71724 — bench chassis + retrofit 5 existing benches

The new internal fluree-bench-support crate:

  • init_tracing_for_bench() — opt-in FLUREE_BENCH_TRACING=1 subscriber. Replaces an 18-line block previously duplicated verbatim across three benches.
  • next_ledger_alias(prefix) — atomic, never-reused alias of the form bench/{prefix}-{n}:main. Replaces three independent LEDGER_COUNTER: AtomicU64 patterns.
  • bench_runtime(), BenchProfile, BenchScale, current_profile(), current_scale() — env-driven knobs (FLUREE_BENCH_PROFILE, FLUREE_BENCH_SCALE, FLUREE_BENCH_RUNTIME).
  • gen::people / gen::vectors / gen::corpora — deterministic data generators lifted from the five existing benches. Tested for byte-stability across runs.
  • templates/BENCH_TEMPLATE.rs — working bench skeleton with TODO markers; the copy-this-and-edit starting point per the contributor guide.
  • budget, fixtures, report — regression-budget loader (check() helper), fixture path resolution, opt-in markdown-style end-of-run summary tables.
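The alias helper's contract (atomic, never reused, shaped `bench/{prefix}-{n}:main`) is small enough to sketch. This is a minimal illustration of the pattern using a process-wide `AtomicU64`, not the crate's actual source; the counter name `LEDGER_SEQ` and its starting value of 0 are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide sequence shared by every bench in the binary (hypothetical name).
static LEDGER_SEQ: AtomicU64 = AtomicU64::new(0);

/// Returns a fresh alias of the form `bench/{prefix}-{n}:main`.
/// Each call consumes one value from the counter, so no alias repeats
/// within a process, even when called from several threads.
pub fn next_ledger_alias(prefix: &str) -> String {
    let n = LEDGER_SEQ.fetch_add(1, Ordering::Relaxed);
    format!("bench/{prefix}-{n}:main")
}
```

Because `fetch_add` is a single atomic read-modify-write, uniqueness holds even with `Ordering::Relaxed`; no other memory has to be synchronized with the counter.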

Plus three new docs: workspace-root BENCHMARKING.md (orientation), docs/contributing/benches.md (deep contributor guide mirroring tracing-guide.md), fluree-bench-support/README.md (API reference).

The five existing benches (insert_formats, vector_query, fulltext_query, vector_math, spatial_bench) are retrofitted to use the chassis. Mostly deletions: dropped init_tracing_for_bench × 3, LEDGER_COUNTER × 3, inline generators × 4. Behavior on Large scale is byte-identical to pre-retrofit.

58d08c931: import_bulk, transact_commit, query_cold_reload

Three new benches in fluree-db-api/benches/:

  • transact_commit.rs — single-commit latency on a fresh and a populated ledger. Distinct from insert_formats.rs, which measures total throughput across many txns; this measures the per-commit latency users see in production. Uses criterion::iter_batched so the base-load setup is excluded from timing.
  • import_bulk.rs — bulk Turtle import via fluree.create(id).import(path).execute(). Two scenarios (single_threaded, default_threads); throughput in triples/sec.
  • query_cold_reload.rs — file-backed cold reload + first query latency. Two scenarios (cold_load, cold_load_plus_query).
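The timing split that `iter_batched` gives transact_commit can be illustrated with plain `std::time::Instant`. The helper below, `time_excluding_setup`, is hypothetical and is not how criterion implements `iter_batched` (criterion batches inputs and runs its own statistics); it only shows which part of each iteration lands inside the timed region.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `setup` untimed, then times only `routine`.
/// Returns the summed duration of the timed sections over `iters` runs.
pub fn time_excluding_setup<I, O>(
    iters: u32,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        // Untimed: e.g. create a ledger and load `base_nodes` of history.
        let input = setup();
        let start = Instant::now();
        // Timed: e.g. the single commit under test. `black_box` keeps the
        // optimizer from discarding the result.
        let output = black_box(routine(input));
        total += start.elapsed();
        // Dropping the output also happens outside the timed region.
        drop(output);
    }
    total
}
```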

c0489a890: reindex_full, reindex_incremental, novelty_replay

Three more benches in fluree-db-api/benches/:

  • reindex_full.rs: Fluree::reindex(...) end-to-end, throughput in triples/sec.
  • reindex_incremental.rs — exercises the orchestrator's incremental path. Setup pre-builds an indexed baseline + delta commits sitting in novelty above it; measured op is Fluree::trigger_index(...).
  • novelty_replay.rs — cold reload with without_indexing() so the populate phase keeps everything in novelty. Scaled by commit count to exercise bulk_apply_commits at varying chain depth.

This commit also folds two new entries into docs/contributing/benches.md:

  • The iter_batched + tokio reactor gotcha caught while writing bench-2: criterion::iter_batched's setup closure runs synchronously, so FlureeBuilder::file(...).build() (which touches the storage backend) panics outside a block_on. Doc shows the canonical fix.
  • Workspace clippy lints apply to bench code — names the two that bit during validation (needless_raw_string_hashes, uninlined_format_args).

aa6bcc3ee: gen::bsbm generator + query_hot_bsbm bench

A deterministic BSBM-shape data generator (Vendor, Product, Person, Review) added to the chassis, plus a hot-cache SPARQL bench with three scenarios drawn from the BSBM query catalogue:

  • Q3-shape — multi-hop join + scalar range filter.
  • Q5-shape — multi-join with price-range filter and ORDER BY.
  • Q9-shape — group-by + count + HAVING.

Setup builds the dataset once per scale, populates a file-backed ledger, runs a full reindex (so the binary columnar index is in place), and reuses the resulting GraphSnapshot for all b.iter calls. The bench measures warm-cache binary scan, not novelty replay or load.

Deliberate deviation from plan: the plan suggested vendoring a 5 MB BSBM-1K Turtle file. The programmatic generator ships instead — keeps the repo small, scales naturally across BenchScale tiers, no external bsbmtools dependency, deterministic. For multi-million-triple scales we may eventually want bsbmtools-faithful distributions; that's a follow-up if nightly diverges.

cb4916968 — bench-gate CI job + workspace reconcile test

The bench-gate job in .github/workflows/ci.yml. Two checks:

  1. Reconcile: cargo test -p fluree-bench-support --test workspace_reconcile walks every workspace member's Cargo.toml for [[bench]] entries and confirms each has a matching entry in regression-budget.json (and vice versa). Fails with a clear error message naming the offending crate/bench pair. Catches missing budgets, stale budget entries (deleted/renamed benches), and unknown crate references.
  2. Smoke: cargo bench --workspace -- --test runs each bench's scenarios once at tiny scale. Catches benches that compile but panic at runtime (bad SPARQL, broken setup, missing API surface).
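At its core, the reconcile check is a two-way set difference. A minimal sketch, assuming both sides have already been flattened to `crate.bench` keys (the `fluree-db-api.*` naming used in regression-budget.json); the real workspace_reconcile test also parses the Cargo.toml files and flags unknown crates, which this sketch omits.

```rust
use std::collections::BTreeSet;

/// Outcome of comparing declared `[[bench]]` targets with budget entries.
#[derive(Debug, PartialEq)]
pub struct Reconcile {
    pub missing_budget: Vec<String>, // declared in Cargo.toml, no budget entry
    pub stale_budget: Vec<String>,   // budget entry with no matching bench
}

/// Hypothetical helper: keys are "crate.bench" strings on both sides.
pub fn reconcile(declared: &[&str], budgeted: &[&str]) -> Reconcile {
    let d: BTreeSet<&str> = declared.iter().copied().collect();
    let b: BTreeSet<&str> = budgeted.iter().copied().collect();
    Reconcile {
        // BTreeSet iteration is sorted, so error messages come out stable.
        missing_budget: d.difference(&b).map(|s| s.to_string()).collect(),
        stale_budget: b.difference(&d).map(|s| s.to_string()).collect(),
    }
}
```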

Wall-clock budget: ~5–8 min after rust-cache warms; ~12–15 min cold.

This commit also bundles two scope additions caught by the smoke validation:

  • Three pre-existing benches now gated by FLUREE_BENCH_SCALE: insert_formats, vector_query, and fulltext_query had hardcoded size arrays from before bench-1's retrofit, and the fulltext bench's 50k case stalled the smoke run. Each gains a small dataset_sizes() / matrix() helper that slices by scale. Large-scale behavior is byte-identical to pre-retrofit.
  • Real bug caught in vector_query.rs: query VALUES blocks used the "@type": "@vector" alias, which the parser no longer accepts in query context (only in INSERT). Fixed to use the full IRI "@type": "https://ns.flur.ee/db#embeddingVector", matching it_vector_flatrank.rs. An honest demonstration of the gate's value: the bench had been silently broken until the smoke run exercised it.

BENCHMARKING.md gains a §"Regression budgets" rewrite documenting the two-phase gate model: bench-gate (per-PR, smoke + reconcile) and bench-nightly (separate workflow on cron, regression comparison against committed baselines).

Validation

  • Chassis tests: cargo test -p fluree-bench-support --lib — 44 passed, 0 failed.
  • Workspace reconcile: cargo test -p fluree-bench-support --test workspace_reconcile — passes. Verified to fail with clear messages by manually deleting a budget entry then restoring.
  • Full smoke run: FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test. All 12 bench files and every scenario report Success.
  • Workspace check: cargo check --workspace --benches — clean.

Two-phase gate model — what's done vs deferred

Done in this PR (bench-gate, runs per-PR):

  • Smoke validation — every bench compiles + runs at tiny scale.
  • Reconcile — every [[bench]] is registered in regression-budget.json, no stale entries, no unknown crates.
  • Behavior validation — the gate caught a real pre-existing bug in vector_query.rs during landing.

Deferred to bench-nightly:

  • bench-baselines.json schema + initial baselines committed from a few CI runs.
  • Cron-triggered nightly workflow that runs the Full profile across Medium / Large scales.
  • Regression comparison: observed_ns ≤ baseline_ns × (1 + budget_pct/100) per (crate, bench, scale).
  • GH-Pages dashboard rendered from bench-history/ JSON-Lines.
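The regression rule above can be checked exactly in integer arithmetic by cross-multiplying by 100, which avoids floating-point rounding right at the budget boundary. A sketch, assuming nanosecond timings fit in `u64` and budgets are whole percentages (as the 10% / 5% / 3% placeholders are); the function name is hypothetical.

```rust
/// observed_ns <= baseline_ns * (1 + budget_pct / 100), rearranged to
/// observed_ns * 100 <= baseline_ns * (100 + budget_pct) so no floats are needed.
/// u128 intermediates rule out overflow for any u64 timing.
pub fn within_budget(observed_ns: u64, baseline_ns: u64, budget_pct: u32) -> bool {
    (observed_ns as u128) * 100 <= (baseline_ns as u128) * (100 + budget_pct as u128)
}
```

With a 1,000 ns baseline and a 5% budget, 1,050 ns passes and 1,051 ns fails.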

The reason for the split: ubuntu-latest shared runners flap enough that per-PR regression comparison would produce false positives. Nightly amortizes noise across the Full profile's larger sample counts and uses dedicated 4-core runners.

What this PR explicitly does NOT do

Per the bench-infrastructure plan §10 (out of scope) and one or two judgment calls during implementation:

  • Vendored BSBM Turtle fixture. Programmatic generator ships instead. Vendoring revisited if nightly large-scale runs need bsbmtools-faithful distributions.
  • iai-callgrind for noise-free PR gating. Plan §11 open question; defer until we have flap data from criterion-on-ubuntu-latest.
  • Auto-generated bench tables in BENCHMARKING.md — manual table is fine; auto-generation is mechanical follow-up if it becomes painful.
  • Real-world dataset benches (e.g., dblp). Programmatic gen::people and gen::bsbm cover the shapes the upcoming RFCs need; real-world data is a useful complement, not a blocker.
  • Regression-comparison enforcement — see "Deferred" above.

Files & line count

  • 38 files changed, 4473 insertions, 413 deletions.
  • New crate: fluree-bench-support (~1500 LoC, 44 unit tests + 1 integration test).
  • New benches: 7 (import_bulk, transact_commit, query_cold_reload, reindex_full, reindex_incremental, novelty_replay, query_hot_bsbm).
  • Retrofitted benches: 5 (insert_formats, vector_query, fulltext_query, vector_math, spatial_bench).
  • New docs: BENCHMARKING.md (~150 lines), docs/contributing/benches.md (~440 lines after gotcha additions), fluree-bench-support/README.md (~110 lines).
  • New CI artifact: bench-gate job (~50 lines of YAML).
  • New tracker: regression-budget.json (workspace root).

Follow-ups

  • bench-nightly — separate PR. Cron workflow + baselines + regression-comparison enforcement + dashboard.
  • Coverage gaps from PR review: 8 perf-regression-prone hot paths flagged during review are tracked under #1235 ("[bench] Post-PR-#1228 coverage gaps (parent tracker)"), which references 6 sub-issues grouped by which can land together as one PR. Highest priority is the property-join planner regression bench (#1229, "[bench] Property-join planner regression bench (analyze_property_join_plan boundaries)"; ~176×–933× recent wins); next is the lex-sorted-string ORDER BY fast-path bench (#1230, "[bench] Lex-sorted-string ORDER BY fast-path regression bench"; a 200× speedup that depends on an invariant prone to silent regression). None blocks this PR.
  • Operator-side simplifications unlocked by the four upcoming RFCs (NamespaceUniverse, OverlaySubstrate, LedgerStateAdvance) — tracked in the post-streamline cleanup doc.
  • vector_math.rs post-streamline-query move — its import of fluree_db_query::expression::vector_math will need re-pointing to eval::vector_math once the colleague's refactor/streamline-query stack merges. Trivial one-line edit; not blocking.
  • Real-world dataset generator (e.g., dblp) — clean follow-up if a future bench needs distributions the synthetic generators miss.

aaj3f added 7 commits May 8, 2026 09:29
Lands the foundation of the workspace's new bench infrastructure: a shared
chassis crate, three contributor docs, an initial regression-budget
schema, and a clean retrofit of all five pre-existing benches onto the
new helpers. No behavior change to any bench; existing scenarios produce
byte-identical output (chassis tests pin the determinism contract on
every lifted generator).

This is the first commit on `feature/stable-bench`. Subsequent commits on
this branch will add new benches against currently-uncovered hot paths
(bench-2: `import_bulk`, `transact_commit`, `query_cold_reload`; bench-3:
reindex/incremental/novelty-replay; bench-4: BSBM-shape query +
vendored fixture; bench-5: gated CI job + initial baselines). The
nightly workflow + remote fixture host land separately as
`bench-nightly`.

## What's new

- `fluree-bench-support/` — new internal crate, `dev-dependency`-only.
  Eleven modules (~1,500 LoC), 37 unit tests. Public surface:
  - `init_tracing_for_bench()` — opt-in stderr subscriber under
    `FLUREE_BENCH_TRACING=1`. Replaces an 18-line block previously
    duplicated verbatim in three benches.
  - `next_ledger_alias(prefix)` — atomic, never-reused alias of the form
    `bench/{prefix}-{n}:main`. Replaces three independent
    `LEDGER_COUNTER: AtomicU64` patterns.
  - `bench_runtime()`, `BenchProfile`, `BenchScale`,
    `current_profile()`, `current_scale()` — env-driven knobs
    (`FLUREE_BENCH_PROFILE`, `FLUREE_BENCH_SCALE`,
    `FLUREE_BENCH_RUNTIME`).
  - `gen::people` — Person/Company graph generator + JSON-LD/Turtle
    serializers (lifted from `insert_formats.rs`).
  - `gen::vectors` — `f64` vector generators, both deterministic-from-
    seed and RNG-driven (lifted from `vector_math.rs` and
    `vector_query.rs`).
  - `gen::corpora` — paragraph templates + `random_paragraph()` (lifted
    from `fulltext_query.rs`).
  - `budget` — `RegressionBudget` schema + loader for
    `regression-budget.json` + `check()` helper. Reconciler stub for
    bench-5.
  - `fixtures` — workspace-root `fluree-bench-support/fixtures/`
    resolution. `load_or_generate()` body lands in bench-4.
  - `report` — opt-in markdown-style end-of-run summary tables.
  - `tracing::BenchSpanLayer` — skeleton for span-capture-to-file; full
    impl in a follow-up.
  - `templates/BENCH_TEMPLATE.rs` — working bench skeleton with TODO
    markers; copy-this-and-edit starting point per the contributor guide.

- `BENCHMARKING.md` (workspace root) — orientation: what benches exist,
  how to run, env vars, output format, regression budgets. README points
  at it.

- `docs/contributing/benches.md` — deep contributor guide mirroring
  `tracing-guide.md`'s structure. Six-step workflow for adding a bench,
  category conventions, common patterns, gotchas, debugging a flaky
  bench, span-capture instructions, review checklist.

- `regression-budget.json` (workspace root) — per-bench, per-scale
  percentage regression CI accepts. Schema-defining initial values; real
  baselines land with the first nightly run.
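The env-driven knobs (`FLUREE_BENCH_PROFILE`, `FLUREE_BENCH_SCALE`, `FLUREE_BENCH_RUNTIME`) reduce to parsing a string into an enum. A sketch for `FLUREE_BENCH_SCALE`, assuming the four tiers named in these commits and case-insensitive values; the real `current_scale()` default is not stated here, so in this sketch the caller supplies one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

/// Parses a FLUREE_BENCH_SCALE value; `None` for unset or unrecognized input.
pub fn parse_scale(raw: Option<&str>) -> Option<BenchScale> {
    match raw?.trim().to_ascii_lowercase().as_str() {
        "tiny" => Some(BenchScale::Tiny),
        "small" => Some(BenchScale::Small),
        "medium" => Some(BenchScale::Medium),
        "large" => Some(BenchScale::Large),
        _ => None,
    }
}

/// Reads the environment, falling back to a caller-chosen default (hypothetical helper).
pub fn scale_from_env(default: BenchScale) -> BenchScale {
    let raw = std::env::var("FLUREE_BENCH_SCALE").ok();
    parse_scale(raw.as_deref()).unwrap_or(default)
}
```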

## Retrofits

Every existing bench compiles green and runs `--test` green after the
substitution. The chassis tests pin byte-identical output for every
lifted generator.

- `fluree-db-api/benches/insert_formats.rs` — drops the 18-line
  `init_tracing_for_bench`, the `LEDGER_COUNTER: AtomicU64` static, the
  `PersonData/CompanyData/TxnData` types, and three generator
  functions (`generate_txn_data`, `txn_data_to_jsonld`,
  `txn_data_to_turtle`) in favor of chassis imports. Four
  `LEDGER_COUNTER.fetch_add(...)` call sites become
  `next_ledger_alias("...")`.

- `fluree-db-api/benches/vector_query.rs` — drops the duplicated
  `init_tracing_for_bench` and the local `random_vector(rng, dim)`;
  imports `gen::vectors::rng_one as random_vector` (same signature,
  byte-identical output).

- `fluree-db-api/benches/fulltext_query.rs` — drops the duplicated
  `init_tracing_for_bench`, the inline `PARAGRAPH_TEMPLATES` and
  `EXTRA_VOCAB` constants (~120 lines), and the `random_paragraph`
  function. Imports `gen::corpora::random_paragraph` (same signature).

- `fluree-db-query/benches/vector_math.rs` — drops the local
  `random_vectors(dim)`; imports `gen::vectors::hashed_pair as
  random_vectors`. No tracing/ledger surface to retrofit (pure math).

- `fluree-db-spatial/benches/spatial_bench.rs` — chassis dev-dep wired
  via `use fluree_bench_support as _;` so future spatial benches can
  opt in. Spatial-domain geometry generators stay co-located (not yet
  reused elsewhere).

## Wiring

- Workspace `Cargo.toml` adds `fluree-bench-support` as a member.
- `fluree-db-api`, `fluree-db-query`, `fluree-db-spatial` each gain
  `fluree-bench-support` in `[dev-dependencies]`.
- Workspace `README.md` adds one line under "Documentation" pointing
  at `BENCHMARKING.md`.

## Verification

- `cargo test -p fluree-bench-support --lib` — 37 passed
- `cargo check --workspace --benches` — clean
- `cargo clippy -p fluree-bench-support --lib --tests` — clean
- `cargo bench -p fluree-db-query --bench vector_math -- --test` —
  21 scenarios all `Success`
- Each retrofitted bench compiles in `--release` (`cargo bench --no-run`)

## Not yet done (deferred to follow-up commits / PRs)

- `validate_against_workspace()` — full `Cargo.toml` ↔ budget-JSON
  reconciler. Stub today; lands with bench-5 (CI gate).
- `BenchSpanLayer` — file-mode tracing
  (`FLUREE_BENCH_TRACING=file:./out.json`). Today this falls back to
  stderr with a `tracing::warn!`.
- `fixtures::load_or_generate` — body. Today returns a `FixtureRef`
  placeholder. Vendored data lands in bench-4; remote fetch in
  bench-nightly.
- BSBM-shape and other new benches (bench-2 through bench-4 commits on
  this branch).
- `.github/workflows/ci.yml` gated bench job (bench-5 commit).
- Nightly workflow + dashboard (`bench-nightly` PR, separate).
- `vector_math.rs` import path will need re-pointing from
  `fluree_db_query::expression::vector_math` to `eval::vector_math`
  once `refactor/streamline-query`'s stacked work merges. Trivial
  one-line follow-up.
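Until `BenchSpanLayer` lands, the `FLUREE_BENCH_TRACING` handling described above amounts to a small mode parser with a fallback. A sketch under stated assumptions: `1` selects stderr, a `file:` prefix selects file mode, any other value is treated as off, and `eprintln!` stands in for the real `tracing::warn!`. The `TracingMode` type is hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub enum TracingMode {
    Off,
    Stderr,
    File(String),
}

/// Maps a raw FLUREE_BENCH_TRACING value to a mode (assumed value grammar).
pub fn parse_tracing_mode(raw: Option<&str>) -> TracingMode {
    match raw {
        Some("1") => TracingMode::Stderr,
        Some(v) => match v.strip_prefix("file:") {
            Some(path) => TracingMode::File(path.to_string()),
            None => TracingMode::Off,
        },
        None => TracingMode::Off,
    }
}

/// File mode is not implemented yet, so it degrades to stderr with a warning.
pub fn effective_mode(mode: TracingMode) -> TracingMode {
    match mode {
        TracingMode::File(path) => {
            eprintln!("warning: file-mode bench tracing ({path}) not implemented; using stderr");
            TracingMode::Stderr
        }
        other => other,
    }
}
```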

## Plan reference

Designed per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md`
§4.5 (Extensibility and contributor onboarding) and §8 (Migration
plan). The chassis design absorbed the survey findings in §2.1
("Concrete patterns observed").

Three new benches in `fluree-db-api/benches/`, each exercising a hot path
the existing five benches don't cover. All build on the
`fluree-bench-support` chassis introduced in the previous commit, which
keeps each bench file under ~200 lines of bench-specific code.

This is the second commit on `feature/stable-bench`, landing the bench-2
slice of the bench-foundation PR. Subsequent commits add reindex /
incremental-index / novelty-replay benches (bench-3), the BSBM-shape
hot-query bench plus vendored fixture (bench-4), and the gated CI job
plus initial baselines (bench-5).

## What's new

### `fluree-db-api/benches/transact_commit.rs` (196 lines)

Single-commit latency on a fresh and a populated ledger. Distinct from
`insert_formats.rs`, which measures total throughput across many txns;
`transact_commit` measures the per-commit latency that most users see in
production.

- Scenario 1: `fresh_ledger` — commit one small txn against a
  freshly-created ledger. Measures pure commit overhead.
- Scenario 2: `populated_ledger` — commit one small txn against a
  ledger pre-loaded with `base_nodes` of history. Skipped when
  `base_nodes == 0`.
- Scale-driven inputs: Tiny=100×10, Small=1k×10, Medium=10k×10,
  Large=100k×10 (base nodes × commit nodes).
- Uses `criterion::iter_batched` so the base-load setup is excluded
  from the timed measurement.
- Memory-backed Fluree (`FlureeBuilder::memory().build_memory()`) —
  no I/O confound.
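The transact_commit scale matrix and the skip rule for `populated_ledger` can be written as two small functions. The values come from the list above; the function names and the enum are hypothetical.

```rust
#[derive(Debug, Clone, Copy)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

/// (base_nodes, commit_nodes) per scale, as listed for transact_commit.
pub fn transact_commit_inputs(scale: BenchScale) -> (usize, usize) {
    match scale {
        BenchScale::Tiny => (100, 10),
        BenchScale::Small => (1_000, 10),
        BenchScale::Medium => (10_000, 10),
        BenchScale::Large => (100_000, 10),
    }
}

/// `populated_ledger` only makes sense when there is history to commit on top of.
pub fn scenarios(base_nodes: usize) -> Vec<&'static str> {
    let mut names = vec!["fresh_ledger"];
    if base_nodes > 0 {
        names.push("populated_ledger");
    }
    names
}
```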

### `fluree-db-api/benches/import_bulk.rs` (193 lines)

Bulk Turtle import via `fluree.create(id).import(path).execute()`. The
hot path under the hood: Turtle streaming parse → chunked staging →
root assembly → FIR6 root publish.

- Scenario 1: `single_threaded` — `threads(1)`. Baseline that doesn't
  confound with parallelism overhead.
- Scenario 2: `default_threads` — exercises parallel-import allocator
  and worker-cache. Skipped at Tiny scale where parallel overhead would
  dominate.
- Scale: Tiny=1k → Large=200k nodes (roughly 4-7× as many triples).
- Throughput annotation in `Throughput::Elements(triples)` so
  criterion's `thrpt` line reads in triples/sec.
- File-backed Fluree (`FlureeBuilder::file(...)`) with fresh tempdirs
  per iteration; `iter_batched` setup wraps the builder construction in
  the bench's tokio runtime (`rt.block_on(...)`) because the file-backed
  builder requires a running reactor.

### `fluree-db-api/benches/query_cold_reload.rs` (191 lines)

Cold ledger reload latency. Pre-populate a file-backed ledger, drop the
Fluree connection, then time the rebuild when a fresh handle opens the
same ledger.

- Scenario 1: `cold_load` — just `fluree.graph(id).load()`. Measures
  the load path: storage read → snapshot decode → novelty replay →
  binary-store attach.
- Scenario 2: `cold_load_plus_query` — load + one SPARQL query.
  Captures the full "I restarted my application, time to first answer"
  user-visible latency.
- Scale: Tiny=200 → Large=50k base nodes. Single populating txn so
  scaling is purely "amount of data," not "depth of commit chain."
- Uses `iter_batched` to put populate-and-drop in setup and measure
  only the cold open.

## Wiring

- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with
  placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real
  baselines land with the first nightly run (bench-nightly PR).

No new dev-deps; `tempfile` and the chassis are already on
`fluree-db-api`.

## Verification

- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench transact_commit -- --test` —
  both scenarios `Success` at small + tiny
- `cargo bench -p fluree-db-api --bench import_bulk -- --test` —
  both scenarios `Success` at small; `single_threaded` only at tiny
- `cargo bench -p fluree-db-api --bench query_cold_reload -- --test` —
  both scenarios `Success` at small + tiny
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis
  contracts unbroken)

## Gotcha worth flagging for future bench authors

`FlureeBuilder::file(...).build()` — and any other API that touches the
file-backed storage path — requires a running tokio reactor. With
`criterion::iter_batched`, the `setup` closure runs synchronously
outside any `block_on`, so a setup that calls `FlureeBuilder::file(...)`
panics with "there is no reactor running, must be called from the
context of a Tokio 1.x runtime."

Fix: wrap setup work that touches the file backend in
`rt.block_on(async { ... })`. The memory-backed builder
(`FlureeBuilder::memory().build_memory()`) doesn't have this constraint
and works fine in synchronous setup.

This will be added to `docs/contributing/benches.md` as a "Gotchas"
entry in a follow-up commit on this branch.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-2`).

Three new benches in `fluree-db-api/benches/`, each exercising a hot
path that bench-1 and bench-2 didn't cover: full reindex from the
commit chain, incremental indexing via the orchestrator, and cold
reload through deep novelty. The contributor guide grows two new
"Gotchas" entries — the tokio-reactor-in-`iter_batched`-setup pitfall
caught during bench-2, and a note about which workspace clippy lints
catch bench-side mistakes.

This is the third commit on `feature/stable-bench`, landing the bench-3
slice of the bench-foundation PR. Subsequent commits add the BSBM-shape
hot-query bench plus vendored fixture (bench-4) and the gated CI job
plus initial baselines (bench-5).

## What's new

### `fluree-db-api/benches/reindex_full.rs` (145 lines)

End-to-end `fluree.reindex(id, ReindexOptions::default())` against a
file-backed ledger pre-populated in a single txn. The hot path under
the hood: commit-chain replay → flake collection → binary columnar
index build (FLI3 leaves, FBR3 branches) → FIR6 root publish.

- Scenario: `single_txn` — base data committed in one txn so the
  measurement scales with "amount of data," not "depth of commit chain."
- Scale: Tiny=200 → Large=50k base nodes; throughput in triples/sec.
- File-backed Fluree per iteration; `IndexConfig` with high reindex
  thresholds during populate so background indexing doesn't race the
  measured op.

### `fluree-db-api/benches/reindex_incremental.rs` (175 lines)

Orchestrator-driven incremental indexing. Setup brings the ledger to a
state where an index already covers `base_nodes` of data and
`delta_commits` worth of additional commits sit in novelty above it;
the measured op is the `Fluree::trigger_index(...)` call that drives
the orchestrator to extend the index over those novelty commits.

- Scenario: `apply_delta` — exercises the orchestrator's incremental
  path against novelty above an indexed base.
- Scale: (Tiny=200×5, Small=2k×20, Medium=10k×50, Large=50k×200) for
  (base_nodes × delta_commits). Each delta commit is 10 nodes, so
  novelty depth scales with `delta_commits`.
- Setup uses `Fluree::reindex` to establish the baseline index, then
  `Fluree::ledger(id)` to reload state with the new index head before
  applying the delta commits.
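The (base_nodes × delta_commits) matrix translates directly into novelty depth at 10 nodes per delta commit. A sketch with hypothetical names; the numbers are the ones listed above.

```rust
#[derive(Debug, Clone, Copy)]
pub enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

/// Nodes written by each delta commit in reindex_incremental.
pub const NODES_PER_DELTA_COMMIT: usize = 10;

/// (base_nodes, delta_commits) per scale.
pub fn incremental_inputs(scale: BenchScale) -> (usize, usize) {
    match scale {
        BenchScale::Tiny => (200, 5),
        BenchScale::Small => (2_000, 20),
        BenchScale::Medium => (10_000, 50),
        BenchScale::Large => (50_000, 200),
    }
}

/// Nodes sitting in novelty above the indexed base when trigger_index runs.
pub fn novelty_nodes(scale: BenchScale) -> usize {
    let (_, delta_commits) = incremental_inputs(scale);
    delta_commits * NODES_PER_DELTA_COMMIT
}
```

So the measured op extends the index over 50 novelty nodes at Tiny and 2,000 at Large.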

### `fluree-db-api/benches/novelty_replay.rs` (145 lines)

Cold reload latency under deep novelty with **no index attached**.
Distinct from `query_cold_reload.rs`, which scales by amount of data
committed in one txn; this scales by **commit-chain depth**. The
measured cold reload exercises
`fluree-db-novelty::Novelty::bulk_apply_commits` (memory:
`mem:fact-01kqfy6txdrjppaf6756xzdz25`) plus per-commit envelope-delta
application.

- Setup uses `FlureeBuilder::file(...).without_indexing().build()` so
  every populate-phase commit stays in novelty.
- Scale: Tiny=20 → Large=2000 commits, 10 nodes each.
- Scenario: `replay_chain` — single cold reload that replays the full
  commit chain into the in-memory novelty store.

## Doc update folded in

`docs/contributing/benches.md` grows two Gotchas entries:

1. **`iter_batched` setup needs a tokio reactor for file-backed Fluree.**
   Caught while validating bench-2: `criterion::iter_batched`'s setup
   closure runs synchronously, so `FlureeBuilder::file(...).build()` —
   which touches the storage backend during construction — panics with
   "there is no reactor running, must be called from the context of a
   Tokio 1.x runtime." The doc shows the canonical fix (wrap setup in
   `rt.block_on(async { ... })`) and notes that
   `FlureeBuilder::memory().build_memory()` doesn't have this constraint.
   All three new benches in this commit follow the wrapped-setup
   pattern.

2. **Workspace clippy lints apply to bench code.** The workspace
   `Cargo.toml`'s `[workspace.lints.clippy]` denies several lints; two
   that bit during bench-2 are called out by name —
   `needless_raw_string_hashes` (write `r"..."` not `r#"..."#`) and
   `uninlined_format_args` (write `format!("{x}")` not
   `format!("{}", x)`). Plus a one-liner about running
   `cargo clippy --benches` locally before pushing.
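Both lints named above, shown in the form clippy accepts. The function and constant are illustrative only.

```rust
/// Passes `uninlined_format_args`: identifiers are captured inline.
/// The flagged form would be `format!("{}: {} nodes", bench, n)`.
pub fn describe(bench: &str, n: usize) -> String {
    format!("{bench}: {n} nodes")
}

/// Passes `needless_raw_string_hashes`: the literal contains no `"`, so
/// `r"..."` suffices; writing it as `r#"..."#` would be flagged.
pub const SAMPLE_QUERY: &str = r"SELECT ?s WHERE { ?s ?p ?o } LIMIT 1";
```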

## Wiring

- `fluree-db-api/Cargo.toml` — three new `[[bench]]` entries.
- `regression-budget.json` — three new `fluree-db-api.*` entries with
  placeholder budgets (`tiny: 10%, small: 5%, medium: 5%`); real
  baselines land with the first nightly run (bench-nightly PR).

No new dev-deps; the chassis and `tempfile` are already on
`fluree-db-api`.

## Verification

- `cargo check -p fluree-db-api --benches` — clean
- `cargo bench --no-run` for all three — clean release build
- `cargo bench -p fluree-db-api --bench reindex_full -- --test` —
  scenario `Success`
- `cargo bench -p fluree-db-api --bench novelty_replay -- --test` —
  scenario `Success`
- `cargo bench -p fluree-db-api --bench reindex_incremental -- --test` —
  scenario `Success`
- `cargo test -p fluree-bench-support --lib` — 37 passed (chassis
  contracts unbroken)
- `cargo check --workspace --benches` — clean

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-3`). The plan suggested putting reindex benches in
`fluree-db-indexer/benches/` and novelty_replay in
`fluree-db-novelty/benches/`; in practice the user-facing entry points
(`Fluree::reindex`, `Fluree::trigger_index`, `fluree.graph(id).load()`)
all live in `fluree-db-api`, so the benches naturally belong there. A
future commit can add focused micro-benches in the indexer/novelty
crates if the end-to-end versions need a finer breakdown.

A new bench in `fluree-db-api/benches/` exercising warm-cache SPARQL
query latency on a BSBM-shape graph, plus a deterministic BSBM-shape
data generator added to the chassis. Three query scenarios cover three
distinct planner / scan patterns (multi-hop filter, multi-join with
range filter + ORDER BY, group-by + count + HAVING). The dataset is
generated on the fly rather than vendored as a 5 MB Turtle file.

This is the fourth commit on `feature/stable-bench`, landing the
bench-4 slice of the bench-foundation PR. The remaining commit
(`bench-5`) adds the gated CI job plus the initial regression
baselines.

## What's new

### `fluree-bench-support/src/gen/bsbm.rs` (300 lines)

Deterministic generator for a four-entity graph drawn from the
[Berlin SPARQL Benchmark](http://wbsg.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/):
`Vendor`, `Product`, `Person`, `Review`.

- Counts derive from `n_products`: `n_vendors = n/50`,
  `n_persons = n/10`, `n_reviews = n*3`.
- Five product types and five countries cycled deterministically.
- Prices in `1_000..=51_000` cents (range chosen so range filters see
  meaningful selectivity).
- Ratings cycle `1..=5` across reviews so HAVING / ORDER BY scenarios
  have genuine variation.
- `bsbm_data_to_turtle(&data)` renders the graph as a single
  well-formed Turtle document with `ex:`, `bsbm:`, and `xsd:` prefixes.
- 7 unit tests pin the determinism contract (chassis suite grows from
  37 to 44 passing tests).
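The count ratios and rating cycle above can be sketched as follows. The ratios come from the list; the `review_rating` formula (review index modulo 5, plus 1) is one assumed way to cycle `1..=5`, and because of integer division, fewer than 50 products yields zero vendors in this sketch, which the real generator may handle differently.

```rust
/// Entity counts derived from `n_products`, using the ratios listed above.
#[derive(Debug, PartialEq)]
pub struct BsbmCounts {
    pub products: usize,
    pub vendors: usize,
    pub persons: usize,
    pub reviews: usize,
}

/// Hypothetical helper mirroring the documented ratios.
pub fn bsbm_counts(n_products: usize) -> BsbmCounts {
    BsbmCounts {
        products: n_products,
        vendors: n_products / 50, // integer division: < 50 products gives 0 here
        persons: n_products / 10,
        reviews: n_products * 3,
    }
}

/// Assumed cycling rule: review i gets rating (i mod 5) + 1, covering 1..=5.
pub fn review_rating(review_idx: usize) -> u8 {
    (review_idx % 5) as u8 + 1
}
```

At the Tiny tier (100 products) these ratios give 2 vendors, 10 persons, and 300 reviews.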

### `fluree-db-api/benches/query_hot_bsbm.rs` (230 lines)

Three SPARQL scenarios against a populated, indexed file-backed Fluree:

- **Q3-shape** — multi-hop join + scalar range filter
  (Electronics products with reviews rated ≥ 4).
- **Q5-shape** — multi-join with price-range filter and `ORDER BY`
  (top-10 products in 5000–25000 cents, ordered by price, with
  vendor label).
- **Q9-shape** — `GROUP BY` + `COUNT` + `HAVING`
  (products with ≥ 3 reviews, ordered by review count).
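The Q9 shape's semantics (group reviews by product, count them, filter with HAVING, order by count) can be mirrored over an in-memory list to sanity-check expected results at small scale. A sketch, assuming descending order with ties broken by product id; the real bench runs the equivalent SPARQL against Fluree, and its exact ordering is not stated here.

```rust
use std::collections::HashMap;

/// In-memory analogue of the Q9-shape query. `review_products[i]` is the
/// product id that review i is about (hypothetical encoding).
pub fn q9_shape(review_products: &[u32], min_reviews: usize) -> Vec<(u32, usize)> {
    // GROUP BY product + COUNT(review)
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for &p in review_products {
        *counts.entry(p).or_insert(0) += 1;
    }
    // HAVING count >= min_reviews
    let mut rows: Vec<(u32, usize)> = counts
        .into_iter()
        .filter(|&(_, c)| c >= min_reviews)
        .collect();
    // ORDER BY DESC(count), then product id for a deterministic result
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    rows
}
```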

Setup builds the dataset once per scale, populates the ledger, runs a
full reindex (so the binary columnar index is in place), and loads a
`GraphSnapshot`. All three scenarios reuse that snapshot — the bench
measures **warm-cache binary scan**, not novelty replay or load
overhead.

Scale: `Tiny=100 → Large=100k` products. Other entity counts derive.

## Why programmatic (not vendored)

The bench plan suggested vendoring the canonical 5 MB BSBM-1K Turtle
file under `fluree-bench-support/fixtures/`. This commit takes the
programmatic path instead:

- A binary-ish 5 MB blob in git inflates clone / fetch footprint for
  every contributor and every CI build, indefinitely.
- Programmatic generation is deterministic (chassis tests pin the
  contract) and scale-parameterized (one knob, four `BenchScale`
  tiers).
- Avoids a build-time dependency on
  [`bsbmtools`](https://github.com/wbsg-uni-mannheim/bsbmtools) or any
  external generator script.

For multi-million-triple scales (Large+ in nightly), we may still
want the canonical generator's distributions — that's a follow-up if
a nightly run discovers the synthetic shape diverges enough from
real-world BSBM behavior to matter.

The plan's `fluree-bench-support/scripts/gen-bsbm.sh` external
generator is therefore not added in this commit. The bench plan doc
should be updated to reflect the programmatic-first approach in a
follow-up cleanup commit.

## Wiring

- `fluree-bench-support/src/gen/mod.rs` — `pub mod bsbm;` plus a
  one-paragraph entry in the module docstring.
- `fluree-db-api/Cargo.toml` — new `[[bench]] name = "query_hot_bsbm"`
  entry.
- `regression-budget.json` — new `query_hot_bsbm` entry with
  placeholder budgets (`tiny: 10%, small: 5%, medium: 3%`). The medium
  budget is tighter because hot-path query latency should be the most
  stable signal in the bench suite.

No new dev-deps; the chassis is already on `fluree-db-api`.

## Verification

- `cargo test -p fluree-bench-support --lib` — **44 passed** (up from
  37; the 7 new tests cover bsbm determinism, count ratios, type /
  country / rating distribution, and Turtle prefix presence).
- `cargo check -p fluree-db-api --bench query_hot_bsbm` — clean.
- `cargo bench --no-run` — clean release build.
- `FLUREE_BENCH_SCALE=tiny cargo bench -p fluree-db-api --bench query_hot_bsbm -- --test`
  — `q3/q5/q9` all report `Success`.
- `cargo check --workspace --benches` — clean.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-4`). Two deliberate deviations from the plan worth flagging
for review:

1. **Programmatic generator over vendored fixture** (see "Why
   programmatic" above). The plan's intent — let benches scale across
   `BenchScale` tiers without external tooling — is preserved; the
   delivery vehicle is different.
2. **No `gen-bsbm.sh` external script** at this stage. If we later
   need bsbmtools-faithful distributions for nightly large-scale
   runs, that's a separate concern.

A future `bench-N` commit (or a follow-up after this branch lands)
could add a parallel generator wired to the in-tree dblp data the
CLI's bulk-import work already uses, since real-world graph
distributions exercise shapes the synthetic generator may miss
(degree skew, prefix variety, etc.). Tracked as an out-of-scope
follow-up, not blocked by this PR.

A new `bench-gate` job in `.github/workflows/ci.yml` runs on every PR
and push to `main`. Two checks:

1. **Reconcile** — `cargo test -p fluree-bench-support --test workspace_reconcile`
   walks every workspace member's `Cargo.toml` for `[[bench]]` entries
   and confirms each has a matching entry in `regression-budget.json`
   (and vice versa). Fails the gate with a clear error message naming
   the offending `crate/bench` pair.
2. **Smoke** — `cargo bench --workspace -- --test` runs each bench's
   scenarios once at `tiny` scale (`FLUREE_BENCH_PROFILE=quick`,
   `FLUREE_BENCH_SCALE=tiny`). Catches benches that compile but panic
   at runtime (bad SPARQL, broken setup, missing API surface).

This is the fifth and final commit on `feature/stable-bench`,
completing the bench-foundation PR. The full regression-comparison
phase (which compares observed nanoseconds against committed
baselines and a per-bench budget percentage) lands separately in
`bench-nightly` because (a) it needs runner-stable baselines that
only emerge from a few nightly runs and (b) per-PR comparison on
shared `ubuntu-latest` runners would flap.
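The comparison described above can be sketched as a percentage check against a committed baseline. This is an illustration under assumptions, not the `bench-nightly` implementation: `Scale`, `BudgetEntry`, `pct_for`, and `within_budget` are hypothetical, the scale keys mirror the placeholder `tiny` / `small` / `medium` budgets in `regression-budget.json`, and the rule treats a budget as the maximum allowed slowdown relative to baseline.

```rust
/// Hypothetical scale tiers matching the budget file's per-scale keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scale {
    Tiny,
    Small,
    Medium,
}

/// Hypothetical per-bench budget entry, in percent.
struct BudgetEntry {
    tiny_pct: f64,
    small_pct: f64,
    medium_pct: f64,
}

impl BudgetEntry {
    fn pct_for(&self, scale: Scale) -> f64 {
        match scale {
            Scale::Tiny => self.tiny_pct,
            Scale::Small => self.small_pct,
            Scale::Medium => self.medium_pct,
        }
    }
}

/// True when the observed time is at most `budget_pct` percent slower
/// than the committed baseline.
fn within_budget(baseline_ns: f64, observed_ns: f64, budget_pct: f64) -> bool {
    observed_ns <= baseline_ns * (1.0 + budget_pct / 100.0)
}
```

With the placeholder `query_hot_bsbm` budgets (10% / 5% / 3%), a 1,000 ns medium-scale baseline would tolerate observations up to roughly 1,030 ns.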

## What's new

### `.github/workflows/ci.yml` — `bench-gate` job

- Same setup pattern as the existing `clippy` / `test` jobs:
  free disk space → checkout → toolchain → rust-cache.
- Three steps: reconcile, build, smoke. Reconcile runs first because
  it's fast (~1s) and tells the contributor exactly what's wrong
  before they wait on a release build.
- Wall-clock budget: ~5–8 min after rust-cache warms; ~12–15 min
  cold. Within plan §6.1's CI minute budget.

### `fluree-bench-support/tests/workspace_reconcile.rs` (~145 lines)

- Walks workspace members, parses each crate's `Cargo.toml` for
  `[[bench]]` entries, builds a `crate -> [bench]` map.
- Loads `regression-budget.json` and reconciles bidirectionally:
  - **Missing budget** — declared bench without a budget entry.
    Fails with "add an entry under `crates.<crate>.<bench>` in
    `regression-budget.json`."
  - **Stale budget** — budget entry without a matching `[[bench]]`.
    Fails with "remove the entry or rename the bench file."
  - **Unknown crate** — budget references a crate not in the
    workspace.
- Uses `toml = "0.8"` as a dev-dep (test-only; doesn't pull `toml`
  into anyone who depends on the chassis library).
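The bidirectional check above can be sketched over plain maps, assuming the `Cargo.toml` and `regression-budget.json` parsing has already produced `crate -> {bench}` sets. `reconcile` and `Report` are hypothetical names; the real test parses with `toml` and fails with the messages quoted above.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Default, PartialEq)]
struct Report {
    missing_budget: Vec<String>, // "crate/bench" declared but not budgeted
    stale_budget: Vec<String>,   // budgeted but no matching [[bench]]
    unknown_crates: Vec<String>, // budget crate is not a workspace member
}

impl Report {
    fn is_clean(&self) -> bool {
        self.missing_budget.is_empty()
            && self.stale_budget.is_empty()
            && self.unknown_crates.is_empty()
    }
}

fn reconcile(
    members: &BTreeSet<String>,
    declared: &BTreeMap<String, BTreeSet<String>>,
    budgets: &BTreeMap<String, BTreeSet<String>>,
) -> Report {
    let mut r = Report::default();
    // Direction 1: every declared [[bench]] needs a budget entry.
    for (krate, benches) in declared {
        let budgeted = budgets.get(krate);
        for b in benches {
            if !budgeted.map_or(false, |s| s.contains(b)) {
                r.missing_budget.push(format!("{krate}/{b}"));
            }
        }
    }
    // Direction 2: every budget entry needs a crate and a [[bench]].
    for (krate, benches) in budgets {
        if !members.contains(krate) {
            r.unknown_crates.push(krate.clone());
            continue;
        }
        let declared_here = declared.get(krate);
        for b in benches {
            if !declared_here.map_or(false, |s| s.contains(b)) {
                r.stale_budget.push(format!("{krate}/{b}"));
            }
        }
    }
    r
}
```

Using `BTreeMap` / `BTreeSet` keeps the reported `crate/bench` pairs in a stable, sorted order, which makes failure messages reproducible.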

### `BENCHMARKING.md` — §"Regression budgets" rewritten

- Documents the two-phase gate model (`bench-gate` per-PR,
  `bench-nightly` separate workflow on cron) explicitly.
- Explains why two phases — `ubuntu-latest` flap makes per-PR
  regression comparison unreliable; the nightly amortizes noise
  across `Full`-profile sample counts and uses 4-core runners.

## Pre-existing benches retrofitted in this commit

Validating the smoke run end-to-end uncovered two issues in
pre-existing benches that needed fixing for the gate to actually
land green. Both are scoped to this commit because the gate is
useless if the smoke fails on existing scenarios.

### Three benches now respect `FLUREE_BENCH_SCALE`

`insert_formats.rs`, `vector_query.rs`, and `fulltext_query.rs` had
hardcoded size arrays from before bench-1's chassis retrofit:

- `insert_formats.rs` — `TXN_COUNTS = &[10, 100]`,
  `NODES_PER_TXN = &[10, 100, 1000]`. Six matrix cells.
- `vector_query.rs` — `DATASET_SIZES = &[1_000, 5_000]`.
- `fulltext_query.rs` — `DATASET_SIZES = &[1_000, 5_000, 10_000, 50_000]`.

The fulltext bench's 50k case stalled the smoke run. Each file gains
a small scale-driven slice helper (`dataset_sizes()` / `matrix()`)
that returns the appropriate subset for `current_scale()`. At
`Tiny`: smallest size only. At `Small`: 1–2 sizes. At `Medium`:
2–3 sizes. At `Large`: full curve.

Behavior at the `Large` scale is byte-identical to pre-retrofit;
the helpers just slice when smaller.
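A scale-driven slice helper of this kind might look like the sketch below, using the `fulltext_query.rs` curve. The exact cut points are an assumption (the text says 1–2 sizes at `Small` and 2–3 at `Medium`; this picks 2 and 3), and `BenchScale::parse` is a hypothetical stand-in for however the chassis reads `FLUREE_BENCH_SCALE`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BenchScale {
    Tiny,
    Small,
    Medium,
    Large,
}

impl BenchScale {
    /// Hypothetical parser for FLUREE_BENCH_SCALE-style values.
    fn parse(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "tiny" => Some(Self::Tiny),
            "small" => Some(Self::Small),
            "medium" => Some(Self::Medium),
            "large" => Some(Self::Large),
            _ => None,
        }
    }
}

/// Full pre-retrofit curve from fulltext_query.rs.
const DATASET_SIZES: &[usize] = &[1_000, 5_000, 10_000, 50_000];

/// Smaller scales take a prefix of the full curve; Large returns it whole,
/// so Large-scale behavior matches the pre-retrofit array exactly.
fn dataset_sizes(scale: BenchScale) -> &'static [usize] {
    let n = match scale {
        BenchScale::Tiny => 1,
        BenchScale::Small => 2,
        BenchScale::Medium => 3,
        BenchScale::Large => DATASET_SIZES.len(),
    };
    &DATASET_SIZES[..n]
}
```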

### Real bug caught: `vector_query.rs` SPARQL VALUES type alias

Smoke validation panicked with:

```
Array @value is only supported for
https://ns.flur.ee/db#embeddingVector typed literals
```

…because the bench was constructing query VALUES blocks with
`"@type": "@vector"`. The `@vector` alias is INSERT-only; the query
parser requires the full IRI in VALUES context (canonical pattern
in `it_vector_flatrank.rs`). Two query literals updated to use
`"@type": "https://ns.flur.ee/db#embeddingVector"`.

This bench evidently hadn't run cleanly in some time; the gate is
the reason it's running cleanly now.
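The fix amounts to emitting the full datatype IRI wherever a vector literal appears in a query VALUES block. A dependency-free sketch, assuming the literal takes the `{"@value": [...], "@type": ...}` shape implied by the error message; `vector_values_literal` is hypothetical, and real code would more likely build the object with a JSON library.

```rust
/// Full datatype IRI required in query VALUES context; the `@vector`
/// alias is accepted only on INSERT.
const EMBEDDING_VECTOR_IRI: &str = "https://ns.flur.ee/db#embeddingVector";

/// Hypothetical helper rendering a typed vector literal as JSON text.
fn vector_values_literal(v: &[f32]) -> String {
    let items: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!(
        r#"{{"@value": [{}], "@type": "{}"}}"#,
        items.join(", "),
        EMBEDDING_VECTOR_IRI
    )
}
```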

## Verification

- `cargo test -p fluree-bench-support --test workspace_reconcile` —
  passes (1 test).
- Verified the reconcile test fails correctly when a budget entry
  is removed (manual: deleted `query_hot_bsbm` entry, ran test, got
  "Missing budget entries: fluree-db-api/query_hot_bsbm" with the
  fix-it suggestion; restored entry, test passes again).
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test`
  — **all 12 bench files run; every scenario reports `Success`**:
  - `insert_formats`, `vector_query`, `fulltext_query`,
    `vector_math`, `spatial_bench` (existing, retrofitted in
    bench-1).
  - `import_bulk`, `transact_commit`, `query_cold_reload` (bench-2).
  - `reindex_full`, `reindex_incremental`, `novelty_replay`
    (bench-3).
  - `query_hot_bsbm` (bench-4).
- `cargo check --workspace --benches` — clean.

## Out of scope (lands separately)

- **Regression-comparison phase** — `bench-nightly` PR. Adds a
  `bench-baselines.json` schema, a cron-triggered nightly workflow
  that runs the `Full` profile across `Medium`/`Large` scales, and
  the budget-check logic. Initial baselines come from the first 2–3
  nightly runs; until those exist, regression-comparison enforcement
  cannot be turned on.
- **iai-callgrind for noise-free PR gating** — open question §11.3
  in the bench-infrastructure plan. Defer until we have flap data
  from criterion-on-`ubuntu-latest`.
- **Auto-generated "Current benches" table** in `BENCHMARKING.md` —
  hand-maintained for now; auto-generation from workspace
  `[[bench]]` declarations is mechanical but not yet worth the
  build-time cost.

## Plan reference

Per `.claude/proposed-work/docs/plan-benchmark-infrastructure.md` §8.2
(`bench-5`). The plan also called for capturing initial baselines as
part of this commit; that's been deferred to `bench-nightly` because
baselines from a developer's machine don't translate to the CI
runner anyway. The bench-gate phase implemented here gives the
workspace its smoke-and-reconcile coverage immediately;
regression-comparison is a separate concern.
aaj3f added 4 commits May 11, 2026 09:17
…fit chassis consistency, sweep doc drift

Single commit addressing all 17 inline review findings on PR #1228 plus deletion of the misleading `validate_against_workspace()` stub. No change to what the bench-gate CI job checks (µ6 only adds a job-level timeout) or to any landed bench's intent; every fix is either a correctness restoration, a chassis-coherence sweep across pre-existing benches, or doc-drift cleanup. All 44 chassis unit tests + 1 reconcile integration test still pass; full smoke at `tiny` scale shows all 24 scenarios `Success`.

Coverage gaps the reviewer flagged in a separate "follow-up" section are tracked as GH issues #1229 through #1234 with parent #1235; this commit does not add new benches.

## Major (M1–M5)

- **M1 — `gen::vectors::hashed_pair` determinism contract restored.** The pre-retrofit `vector_math.rs::random_vectors` used a _single shared_ `DefaultHasher` whose state accumulated across the loop; my chassis version constructed a fresh hasher per call, which produced different bytes for the same `dim`. The bench-1 commit's "byte-identical" claim was technically false for this generator. Rewrote `hashed_pair` to mirror the pre-retrofit shared-hasher behavior verbatim. Output is now byte-identical to the pre-chassis bench, restoring the determinism contract documented in the chassis docstring.
- **M2 — `BENCHMARKING.md` "Current benches" table updated.** Was stale at 5 rows from before bench-2/3/4. Now includes all 12 bench files (5 retrofitted + 7 new) so contributors landing on the workspace doc from `README.md` see what exists.
- **M3 — `fluree-bench-support/README.md` four edits.** `gen::bsbm` added to the table. Stale references to "lands in bench-4" / "lands in bench-5" replaced with accurate "stub today" / "implemented at the test level" notes. Test count corrected from 37 to 44.
- **M4 — `docs/contributing/benches.md` "Future work" section rewrite.** Was claiming bench-4 / bench-5 work as upcoming; both landed in PR #1228. Replaced with the reviewer's accurate suggested wording.
- **M5 — `docs/contributing/benches.md` "Current categories" table fixed.** Was pointing at directories that don't exist (`fluree-db-indexer/benches/`, `fluree-db-novelty/benches/`, `fluree-db-core/benches/`). Updated to point at the actual files (all under `fluree-db-api/benches/`) per the PR's deviation-from-plan, with a note explaining where future micro-benches in those crates would live and a "Reserved categories" sub-table for `core` and `query` (not yet realized).
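Why M1's two hasher strategies diverge can be shown with a minimal sketch, assuming vector components are derived from `DefaultHasher` output. `vector_shared_hasher`, `vector_fresh_hasher`, and `to_unit_f32` are hypothetical and do not reproduce `hashed_pair` or the pre-retrofit `random_vectors` verbatim: both variants are deterministic, but only the first component agrees, because the shared hasher carries every earlier write into later outputs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps the top 24 bits of a hash into [-1.0, 1.0); exact in f32.
fn to_unit_f32(h: u64) -> f32 {
    (h >> 40) as f32 / (1u64 << 24) as f32 * 2.0 - 1.0
}

/// One shared hasher whose state accumulates across the loop.
fn vector_shared_hasher(dim: usize, seed: u64) -> Vec<f32> {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    (0..dim)
        .map(|i| {
            i.hash(&mut hasher); // state from earlier iterations is kept
            to_unit_f32(hasher.finish()) // finish() does not reset state
        })
        .collect()
}

/// A fresh hasher per component: deterministic, but different bytes.
fn vector_fresh_hasher(dim: usize, seed: u64) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            let mut hasher = DefaultHasher::new();
            seed.hash(&mut hasher);
            i.hash(&mut hasher);
            to_unit_f32(hasher.finish())
        })
        .collect()
}
```

Both are "deterministic" in isolation, which is why a determinism test alone would not catch the switch; only a comparison against the previous bytes does.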

## Moderate (m1–m6)

- **m1 — Retrofitted-bench chassis consistency.** Three of the 5 pre-existing benches (`insert_formats.rs`, `vector_query.rs`, `fulltext_query.rs`) were calling `tokio::runtime::Runtime::new()` directly and hardcoding `group.sample_size(10)`. Swapped to `bench_runtime()` and `current_profile().sample_size()` (Quick still resolves to 10; the `Full` profile now correctly yields a wider distribution). The runtime swap is not strictly a no-op: `Runtime::new()` builds tokio's multi-thread scheduler, whereas `bench_runtime()` defaults to a single-threaded current-thread runtime (`FLUREE_BENCH_RUNTIME=multi` opts back into multi-threaded execution). The `vector_math.rs` and `spatial_bench.rs` benches don't use Runtime or sample_size knobs and were untouched.
- **m2 — Dropped unused `sha2` dev-dep** from `fluree-bench-support/Cargo.toml`. The "Lockfile-driven hashing for deterministic fixtures" comment was aspirational; no code uses it today. Will be re-added in the same commit that introduces fixture hashing.
- **m3 — Hardcoded `LEDGER_ID` consts replaced** with `next_ledger_alias` calls in 6 benches (`query_cold_reload`, `novelty_replay`, `reindex_full`, `reindex_incremental`, `query_hot_bsbm`, `import_bulk`). Practically safe before because each iteration rebuilds Fluree over a fresh tempdir — but the contributor doc explicitly tells future authors _not_ to do this, so the new benches becoming the canonical example was a coherence problem. Now consistent with `transact_commit` (which already used the chassis pattern). Setup closures now thread the alias through `iter_batched`'s tuple input to the measured op.
- **m4 — `BenchProfile::Full::sample_size()` bumped from 30 to 100** (criterion's default). 30 was below criterion's default, which defeats the "wider distribution for nightly stability" goal. Comment notes that the value is a starting point; we may need to bump it to 200+ once `bench-nightly` lands and we have flap data. `Quick` stays at 10.
- **m5 — `import_bulk.rs` `default_threads` scenario docstring.** Added a note explaining that the chassis's `bench_runtime()` is single-threaded by default, so `FLUREE_BENCH_RUNTIME=multi` is required to get a meaningful parallel-import measurement. Otherwise the comparison to `single_threaded` should be read as "internal worker pool only," not full end-to-end parallel throughput.
- **m6 — `transact_commit.rs` Fluree construction moved into `iter_batched` setup.** Previously one in-memory Fluree was constructed before the bench groups and shared across all iterations of both scenarios; over a `Full`-profile run that meant ~60+ accumulated ledgers in one in-memory state, which could skew later samples through allocator behavior. Setup is excluded from timing, so the cost of moving the construction is fine.
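The m3 and m4 chassis knobs can be sketched together. The alias format `bench/{prefix}-{n}:main` comes from the PR summary and the 10 / 100 sample sizes from m4; the static counter, `setup_input`, and `BenchProfile::parse` are assumptions for illustration, and criterion is left out to keep the block self-contained.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter; each value is handed out at most once.
static LEDGER_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Never-reused alias of the form bench/{prefix}-{n}:main.
fn next_ledger_alias(prefix: &str) -> String {
    // Relaxed suffices: we only need fetch_add to return unique values.
    let n = LEDGER_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("bench/{prefix}-{n}:main")
}

/// Hypothetical setup step: returns the alias with the measured op's input,
/// mirroring how the alias is threaded through iter_batched's tuple.
fn setup_input(prefix: &str, payload_len: usize) -> (String, Vec<u8>) {
    (next_ledger_alias(prefix), vec![0u8; payload_len])
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BenchProfile {
    Quick,
    Full,
}

impl BenchProfile {
    /// Samples per benchmark: 10 for Quick, 100 (criterion's default) for Full.
    fn sample_size(self) -> usize {
        match self {
            BenchProfile::Quick => 10,
            BenchProfile::Full => 100,
        }
    }

    /// Hypothetical parser for FLUREE_BENCH_PROFILE values.
    fn parse(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "quick" => Some(BenchProfile::Quick),
            "full" => Some(BenchProfile::Full),
            _ => None,
        }
    }
}
```

An atomic counter avoids both alias collisions across iterations and any need to clean up ledgers between samples.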

## Minor (µ1–µ6)

- **µ1 — `bsbm.rs` price-range comment fixed.** Was "[10.00, 510.00] cents (i.e., $0.10 to $5.10)"; actual range is 1000–50999 cents (i.e., $10.00 to $509.99). The math was wrong, not the algorithm.
- **µ2 — Q5 in `query_hot_bsbm.rs` unused `xsd:` prefix dropped.** Cosmetic.
- **µ3 — `tracing.rs` `TODO(bench-3)` comment updated.** Bench-3 landed; reference now points at "the bench-nightly follow-up work."
- **µ4 — `BenchSpanLayer` doc comment updated.** Was claiming "lands in a later commit (see plan §5.2 item 1)"; now accurately notes "wiring up the JSON-file emit path is tracked under the `bench-nightly` follow-up."
- **µ5 — `BENCH_TEMPLATE.rs` `let _ = profile;` dead line removed.** `profile` is in fact used at `group.sample_size(profile.sample_size())`; the `let _ = profile;` was leftover from an earlier draft.
- **µ6 — Bench-gate CI job gains `timeout-minutes: 25`.** Caps the entire job so a hung bench fails fast instead of tying up the runner queue. 25 minutes sits comfortably above the expected wall-clock budget (~5–8 min with a warm rust-cache, ~12–15 min cold) but far below GitHub Actions' default job timeout of 360 minutes.

## Stub deletion

`validate_against_workspace()` removed from `fluree-bench-support/src/budget.rs`. The function was a stub whose docstring promised "lands in bench-5" — but bench-5 landed the reconcile as an integration test (`tests/workspace_reconcile.rs`) rather than as a library function. The stub was misleading: a caller would expect it to actually validate; instead it returned `Ok(())` unconditionally. Replaced the stub with a comment noting that the test is the contract.

## Verification

- `cargo test -p fluree-bench-support --lib` — **44 passed** (unchanged; M1's behavior change to `hashed_pair` is byte-identical to the pre-retrofit version, so the existing determinism + range tests still pass).
- `cargo test -p fluree-bench-support --test workspace_reconcile` — **1 passed**. Verified to fail correctly when a budget entry is removed (manual: deleted `query_hot_bsbm`, ran test, got "Missing budget entries: fluree-db-api/query_hot_bsbm" with the fix-it suggestion; restored entry, test passes).
- `cargo check --workspace --benches` — clean.
- `FLUREE_BENCH_PROFILE=quick FLUREE_BENCH_SCALE=tiny cargo bench --workspace -- --test` — **all 24 bench scenarios across 12 bench files report `Success`.**
- `python3 -c "import yaml; yaml.safe_load(...)"` on `.github/workflows/ci.yml` — clean.

## Coverage gaps tracked separately

The reviewer's "Suggested coverage gaps" section listed 8 perf-regression-prone hot paths with no dedicated bench. None blocks PR #1228 (the foundation), but several have known recent perf wins that warrant tracking. Spun out as GH issues:

- **#1229** — Property-join planner regression bench (highest-priority follow-up).
- **#1230** — Lex-sorted-string ORDER BY fast-path bench.
- **#1231** — Scan fast-path regression benches: batched object-join + group-count-firsts.
- **#1232** — Filter `PreparedBoolExpression` cache regression bench.
- **#1233** — Time-travel and multi-ledger `DataSetDb` history query coverage.
- **#1234** — Parser micro-benches (Turtle, JSON-LD, SPARQL).
- **#1235** — Parent tracker referencing all 6.
aaj3f marked this pull request as ready for review May 11, 2026 15:32
aaj3f requested review from bplatz and zonotope May 11, 2026 15:35

zonotope left a comment

🧑‍🔬

aaj3f merged commit d672df6 into main May 12, 2026 (14 checks passed)
aaj3f deleted the feature/stable-bench branch May 12, 2026 14:17