chore(scripts): add concurrency_audit.sh — Phase 10a baseline tool by githubrobbi · Pull Request #303 · skyllc-ai/UltraFastFileSearch

githubrobbi · 2026-05-20T00:55:14Z

Summary

Phase 10a of the playbook Phase-10 effort (issue #302). Adds scripts/dev/concurrency_audit.sh — the workspace concurrency / async / shared-state baseline tool. Mirrors the shape of scripts/dev/build_codegen_audit.sh (Phase 9a).

What it does

Walks every workspace member and emits, per crate, the 7-dimension async/concurrency inventory called out by playbook §1082-1146 and Phase 10 plan §7:

1. tokio::spawn / detached tasks: every call-site for Phase-10c hand-audit.
2. Locks held across .await: literal .read/.write/.lock().await sites for Phase-10b hand-audit.
3. Blocking IO inside async: files containing both async fn AND std::fs::* / std::thread::sleep (Phase-10f candidates).
4. Arc<Mutex<…>> patterns: flat + multi-layer nesting (Arc<Mutex<Arc<…>>> separately flagged).
5. Missing timeouts: IO/network/IPC await sites (.connect/.read_exact/.write_all/.recv/.accept/...) that need a tokio::time::timeout enclosure.
6. Missing cancellation handling: spawn sites whose closure body (next 50 lines) lacks select! / CancellationToken / .cancelled() keywords.
7. Unbounded channels: every unbounded_channel() / broadcast::channel(...) site for Phase-10d backpressure audit.
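The "next 50 lines" heuristic for missing cancellation handling can be sketched as follows. This is an illustration under assumptions, not the code in concurrency_audit.sh: the function name `spawns_without_cancel` is hypothetical, only three of the keywords are checked, and plain awk stands in for the script's rg + awk pipeline.

```shell
#!/usr/bin/env bash
# Sketch only: print "file:line" for each spawn site whose following WINDOW
# lines (spawn line included) contain no cancellation keyword.
# The function name and the keyword subset are illustrative assumptions.
spawns_without_cancel() {
  awk -v window="${2:-50}" '
    { lines[NR] = $0 }                       # buffer the whole file
    /tokio::spawn\(/ { spawn[++n] = NR }     # remember each spawn line number
    END {
      for (i = 1; i <= n; i++) {
        last = spawn[i] + window
        if (last > NR) last = NR
        found = 0
        for (j = spawn[i]; j <= last; j++)
          if (lines[j] ~ /select!|CancellationToken|\.cancelled\(\)/) { found = 1; break }
        if (!found) print FILENAME ":" spawn[i]
      }
    }' "$1"
}

# Demo input: one detached task, one task that selects on a cancel signal.
sample=$(mktemp)
cat > "$sample" <<'EOF'
fn start() {
    tokio::spawn(async move {
        loop { tick().await; }
    });
}
fn start_cancellable(token: Token) {
    tokio::spawn(async move {
        tokio::select! {
            _ = token.cancelled() => {}
            _ = work() => {}
        }
    });
}
EOF

# A window of 3 keeps the two demo functions from overlapping; the PR uses 50.
flagged=$(spawns_without_cancel "$sample" 3)
echo "$flagged"
rm -f "$sample"
```

Only the first spawn site (line 2 of the sample) is reported. With a wide window, a keyword belonging to a neighbouring task can mask a genuinely detached one, which is one reason the flagged sites still go to hand-audit.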

Default mode runs in ~6 s (pure rg + awk; no cargo invocation). --with-cargo mode also runs cargo build --workspace --tests + cargo clippy --workspace --tests -- -W clippy::await_holding_lock as a Phase-10b enforcement-mode preview.

Baseline at this SHA (prod-only)

| Dimension | Count | Phase that consumes it |
| --- | --- | --- |
| `async fn` + async blocks | 278 | (info) |
| `tokio::spawn(` call sites | 27 | 10c |
| `spawn_blocking` call sites | 53 | 10f (verification) |
| `std::sync::Mutex/RwLock` | 22 | (info) |
| `tokio::sync::*` async locks | 6 | 10b |
| `Arc<Mutex<…>>` / `Arc<RwLock<…>>` | 1 | (info; flat, no smell) |
| Bounded channels | 5 | (info) |
| Unbounded channels | 4 | 10d |
| `tokio::time::timeout(` sites | 10 | 10e |
| Lock-across-await candidate sites | 36 | 10b (primary target) |
| Blocking-IO-in-async candidate files | 14 | 10f |
| Spawn sites lacking nearby cancellation kw | 24 | 10c |
| `#[tokio::test]` sites (test code) | 127 | (info) |

The headline finding is 36 lock-across-await candidate sites in uffs-daemon (34) + uffs-mcp (2) — significantly more than the initial 7-site estimate in the plan recon. Each will be hand-audited in Phase 10b for "guard held across an inner .await" (the hazard) vs "guard acquired with .await and dropped before the next .await" (legitimate).
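To see why these are only candidates, here is a minimal sketch of a literal lock-acquisition match run over two made-up Rust functions. The regex, the sample code, and the use of grep (instead of the script's rg) are assumptions for illustration; the script's actual pattern may differ.

```shell
#!/usr/bin/env bash
# Sketch only: a literal match on lock acquisition cannot tell the hazard
# from the legitimate pattern. Sample code and regex are illustrative.
sample=$(mktemp)
cat > "$sample" <<'EOF'
async fn hazard(state: &State) {
    let guard = state.inner.lock().await;
    notify(&guard).await; // guard is still alive across this await
}
async fn legitimate(state: &State) {
    let snapshot = state.inner.read().await.clone(); // guard dropped at end of statement
    notify(&snapshot).await;
}
EOF

# Literal pattern: .read().await / .write().await / .lock().await
candidates=$(grep -nE '\.(read|write|lock)\(\)\.await' "$sample")
count=$(printf '%s\n' "$candidates" | grep -c '^.' || true)
echo "$candidates"
echo "candidates=$count"
rm -f "$sample"
```

Both functions are flagged, although only the first keeps its guard alive across a later .await. Telling the two apart is the job of the Phase 10b hand-audit.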

Detection caveats (documented in the script preamble + per-helper comments)

* Lock-across-await uses literal regex. Multi-line guard-then-await patterns (let g = lock(); g.foo(); other.await; drop(g);) require Phase-10b hand-audit to confirm.
* Cancellation-keyword regex uses word boundaries. Bare cancel would match cancel_tx false-positively; we require CancellationToken / cancellation_token / .cancelled() / is_cancelled / select! / abort_signal / recv_cancel.
* Arc<Mutex<…>> output filters doc-comment lines. Rustdoc prose referencing the pattern doesn't inflate the count.
* Counter uses grep -c '^.' (not grep -c .) so echo '' doesn't falsely count as 1 (this was caught + fixed during the script's own smoke-test).
* Glob excludes use !**/tests/** (not bare !tests/**). UFFS has in-tree test modules under src/.../tests/ which the bare pattern would not exclude.
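The empty-input counting pitfall from the list above can be reproduced in a few lines. This is a sketch under assumptions: the helper name `count_nonempty` is hypothetical, and `wc -l` is used as the naive comparison, which is not necessarily what the script used before the fix.

```shell
#!/usr/bin/env bash
# Sketch only: count non-empty lines in a captured match list.
# `|| true` absorbs the exit status 1 that grep returns when nothing matches.
count_nonempty() {
  printf '%s\n' "$1" | grep -c '^.' || true
}

matches=""   # a search that found nothing

# Echoing an empty variable still emits one newline, so a line count sees 1.
naive=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
safe=$(count_nonempty "$matches")
echo "naive=$naive safe=$safe"
```

Requiring at least one character per line makes the empty result count as zero, which matters when the counts feed a baseline table.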

Why this is Phase 10a (audit-tool first)

Mirroring the established Phase 6a / 7a / 8a / 9a cadence:

1. Build the audit tool first.
2. Run it to capture the precise baseline (output saved locally to docs/dev/baseline/2026-05-19/phase_10_concurrency_baseline.md).
3. Use the baseline numbers to seed the per-sub-phase hand-audits (10b lock-across-await, 10c task ownership, 10d backpressure, 10e timeout coverage, 10f blocking IO).
4. Phase 10g consumes all hand-audit outputs into concurrency_policy.md + per-crate # Concurrency rustdoc.

Rule-1 adherence

Zero #[allow] introductions. Script-level set -uo pipefail (matches sibling audit scripts). One # shellcheck-info-level note (SC2016 — 'single quotes don't expand') is intentional: the affected lines emit Markdown backticks literally, not shell variables.

Cross-references

Issue: [playbook-phase-10] Async, concurrency, and shared state discipline #302
Plan (local): docs/dev/architecture/code_clean/phase_10_async_concurrency_shared_state_implementation_plan.md
Sibling audit scripts: scripts/dev/build_codegen_audit.sh (Phase 9a — same shape), scripts/dev/feature_dep_audit.sh (Phase 8a), scripts/dev/trait_generic_audit.sh (Phase 7a), scripts/dev/clone_alloc_audit.sh (Phase 6a).
Playbook source: world_class_rust_workspace_refactor_playbook.md §1082-1146 (local-only).

Verification

bash -n scripts/dev/concurrency_audit.sh → SYNTAX OK
shellcheck scripts/dev/concurrency_audit.sh → only SC2016 info (intentional, see above)
scripts/dev/concurrency_audit.sh > /tmp/baseline.md → 316-line Markdown report in ~6 s
All 13 pre-push gates green (file-size, typos, reuse, cargo-check, lint-ci, lint-ci-no-default, lint-prod, lint-tests, rustdoc, doc-tests, tests, smoke, lint-ci-windows).

Next steps (queued)

10b — hand-audit the 36 lock-across-await candidate sites.
10c — hand-audit the 24 spawn-sites-without-cancellation-keywords.
10d — justify or convert the 4 unbounded channels.
10e — timeout coverage map.
10f — blocking-IO-in-async hand-audit (14 candidate files).
10g — concurrency_policy.md + per-crate # Concurrency rustdoc.
10h — CONTRIBUTING cross-link + final report + close [playbook-phase-10] Async, concurrency, and shared state discipline #302.

Walks every workspace member and emits, per crate, the 7-dimension async/concurrency inventory called out by playbook §1082-1146 and Phase 10 plan §7:

1. tokio::spawn / detached tasks: every call-site for hand-audit.
2. Locks held across .await: literal .read/.write/.lock().await sites for Phase-10b hand-audit.
3. Blocking IO inside async: files containing both async fn and std::fs::* / std::thread::sleep (Phase-10f candidates).
4. Arc<Mutex<...>> patterns: flat + multi-layer nesting.
5. Missing timeouts: IO/network/IPC await sites that need a tokio::time::timeout enclosure.
6. Missing cancellation handling: spawn sites whose closure body (next 50 lines) lacks select! / CancellationToken / cancelled() keywords.
7. Unbounded channels: every unbounded_channel() / broadcast::channel() site for Phase-10d backpressure audit.

Mirrors the shape of scripts/dev/build_codegen_audit.sh (Phase 9a): shebang + SPDX header, --with-cargo flag (runs cargo build --tests + cargo clippy -W clippy::await_holding_lock), workspace-root detection, RG_PROD_GLOBS filter, Markdown report to stdout.

Prod-only filter excludes test code (tests/, benches/, examples/, tests.rs / *_tests.rs / *_test.rs / test_*.rs). Note: the **/ recursive prefix is required for directory excludes because UFFS has in-tree test modules under src/.../tests/ (the canonical Rust pattern) in addition to top-level crates/*/tests/. Without the prefix, !tests/** would only match the top-level path.

Detection caveats are documented inline:

* Lock-across-await uses literal regex; multi-line guard-then-await patterns require Phase-10b hand-audit.
* Cancellation-keyword regex uses word boundaries (e.g. requires CancellationToken / cancelled() / abort_signal, NOT bare 'cancel' which would match cancel_tx false-positively).
* Arc<Mutex<...>> output filters doc-comment / block-comment lines so rustdoc prose doesn't inflate the count.
* Counter uses grep -c '^.' (not grep -c .) so echo '' doesn't falsely count as 1.
Baseline at SHA ff8b897 (Phase 10 entry, prod-only counts):

* 278 async fn + blocks across 5 crates (daemon=132, mft=74, mcp=46, client=25, core=1).
* 27 tokio::spawn( sites (daemon=23, mft=3, client=1).
* 53 spawn_blocking sites (daemon=36, mft=15, core=1, client=1).
* 22 std::sync::Mutex/RwLock + 6 tokio::sync::* + 1 Arc<Mutex<>>.
* 36 lock-across-await candidate sites (daemon=34, mcp=2): primary Phase-10b audit target.
* 5 bounded + 4 unbounded channels: Phase-10d target.
* 10 prod-code tokio::time::timeout sites: Phase-10e target.
* 24 of 27 spawn sites lack nearby cancellation keywords: Phase-10c target.
* 14 blocking-IO-in-async candidate files: Phase-10f target.

Runs in ~6 s (pure rg + awk; no cargo invocation in default mode).

Refs #302.

…s (Phase 10g) (#308)

* docs(concurrency): add concurrency_policy.md + per-crate # Concurrency rustdoc

Phase 10g closes the Phase 10 audit chain with a workspace concurrency contract doc and per-crate rustdoc sections summarizing each crate's runtime model.

## What lands

* `docs/architecture/code-quality/concurrency_policy.md`: 7th companion to the existing 6 code-quality policy docs (panic, allocation, trait, dependency, build/codegen, lint-posture). Codifies the five concurrency dimensions:
  - Task ownership (T1/T2/T3 with required four-facet annotation).
  - Lock discipline (L1-L5 patterns; L6 lock-across-await forbidden; enforced by three `await_holding_*` clippy lints at deny).
  - Channel discipline (C1-C5; C6 undocumented-unbounded forbidden).
  - Timeout policy (W1-W4; W5 unbounded cross-process await forbidden).
  - Blocking-IO rule (B1-B4; B5 unbounded sync I/O on async runtime worker forbidden).

  Plus shutdown coordination, required annotation templates, per-crate posture matrix, verification steps, anti-patterns, and the Phase 10 audit trail cross-linking PRs #303-#307 and the local per-dimension audit docs in `docs/dev/baseline/2026-05-19/`.
* `crates/uffs-daemon/src/lib.rs` `# Concurrency` rustdoc: runtime model + the six named `spawn_*` constructors that form the daemon's startup graph + cross-link to the policy doc.
* `crates/uffs-mcp/src/lib.rs` `# Concurrency` rustdoc: stdio dispatcher / streamable-http axum server / daemon-bridge reader-loop ownership + reload-pipeline-is-CLI-one-shot note.
* `crates/uffs-client/src/lib.rs` `# Concurrency` rustdoc: hybrid async + sync runtime model + the deliberate 300 s async vs 60 s env-overridable sync timeout asymmetry documented in `phase_10_timeout_coverage_audit.md`.
* `crates/uffs-mft/src/lib.rs` `# Concurrency` rustdoc: predominantly sync library + CLI binary; tokio only for daemon-embedded `spawn_blocking` MFT reads.
* `CONTRIBUTING.md` `## Concurrency policy` section: quotes the one-line rule, lists the five dimensions with their taxonomies, and cross-links the policy doc.

## Rule-1 adherence

* Zero `#[allow(...)]` introductions.
* No suppression hacks, no skipped tests.
* Doc-only PR, no behavior change.
* `cargo clippy --workspace --all-targets -- -D warnings`: clean.

## Follow-up

A separate commit on this branch decomposes `crates/uffs-daemon/src/lib.rs` (currently 1066 LOC, over the 800-LOC file-size policy ceiling) into `tracing.rs`, `startup.rs`, and `shutdown.rs` sibling modules, keeping the spawn_* cluster cohesive per the existing rationale comment.

Refs #302 (Phase 10 umbrella).

* refactor(daemon): split lib.rs into log_init / startup / shutdown sibling modules

`crates/uffs-daemon/src/lib.rs` was 1066 LOC (266 over the 800-LOC file-size policy ceiling). The existing rationale comment argued against splitting the `spawn_*` cluster because doing so would fragment the parent-task lifetime relationships, and that argument still holds. But the orchestrator's *setup* and *terminal* phases have no such coupling and can move out cleanly.

## Extractions

* `crates/uffs-daemon/src/log_init.rs` (105 LOC, new): `init_tracing` + `default_log_file`. No collision with the `tracing` crate because the module is named `log_init`. `pub use log_init::init_tracing;` re-export preserves the public API exactly (`uffs_daemon::init_tracing` still works for the daemon binary + the embedded `uffs daemon run` subcommand).
* `crates/uffs-daemon/src/startup.rs` (238 LOC, new): pre-spawn helpers:
  - `install_catastrophe_panic_hook`
  - `log_daemon_starting` + `emit_daemon_starting_event`
  - `load_daemon_config`
  - `bootstrap_lifecycle_manager`
  - `gather_mft_files` + `drive_letter_matches` + `resolve_drive_list` (windows + non-windows variants)
  - `validate_data_sources`

  All `pub(crate)`.
  `drive_letter_matches` is `pub(crate)` (was private) because the existing regression-pin test in `crates/uffs-daemon/src/tests.rs` exercises it directly; the contract pin was preserved by updating the import to `super::startup::drive_letter_matches;`.
* `crates/uffs-daemon/src/shutdown.rs` (84 LOC, new): `await_shutdown_then_force_exit` + `force_exit_with_watchdog`. All `pub(crate)`.

## lib.rs after

* **689 LOC** (was 1066): under the 800-LOC ceiling without an exception.
* Keeps the `spawn_*` cluster cohesive: `spawn_load_task` / `spawn_ipc_servers` / `spawn_stats_heartbeat` / `spawn_idle_demote_controller` / `spawn_journal_loops_for_warm_shards` / `spawn_pressure_subscriber`.
* Keeps `DaemonConfig` (public struct) and `run_daemon` (public orchestrator) for binary + embedded use.
* Removed the obsolete `Exception: file_size_policy allows this file to exceed 800 LOC` rationale comment from the crate-root rustdoc.

## file_size_exceptions.txt

Removed the now-obsolete `crates/uffs-daemon/src/lib.rs|PERMANENT: …` entry. Local `bash scripts/ci/check_file_size_policy.sh` passes clean.

## Rule-1 adherence

* Zero `#[allow(...)]` introductions.
* No suppression hacks, no skipped tests.
* No public API change: `uffs_daemon::{init_tracing, run_daemon, DaemonConfig}` resolve to the same items as before via the `pub use log_init::init_tracing;` re-export.
* `cargo fmt --all`: clean.
* `cargo clippy --workspace --all-targets -- -D warnings`: clean.
* `cargo test -p uffs-daemon --lib`: **298 passed / 0 failed**.
* `bash scripts/ci/check_file_size_policy.sh`: clean.

Refs #302 (Phase 10 umbrella).
* docs(concurrency): close Phase 10 gaps (§0 model + §6 registry + named clippy entries + audit-script speedup)

Closes the 4 gaps surfaced by the Phase 10 fidelity audit against the playbook §1142-1146 pass criteria + plan §2 acceptance criteria:

## AC3: `§0 The model at a glance` (was missing)

Adds the playbook's headline pass criterion ("concurrency model can be explained on one page") to `concurrency_policy.md`:

* **Daemon task graph**: Mermaid diagram with 11 nodes covering the 6 top-level `spawn_*` constructors + 2 subsystem-internal spawns (per-shard journal loop + `RegistryPatchSink`) + per-connection `handle_connection` + writer/notification sub-tasks.
* **Shard-state lifecycle table**: 6 states (`Unknown`, `Cold`, `Parked`, `Warm`, `Hot`, `Evicting`) with bloom/trie/body presence + entry triggers; legal transitions cross-link `ShardState::can_transition_to`. Two demote drivers explained: idle-TTL (`spawn_idle_demote_controller`, 30 s cadence) + memory-pressure cascade (`spawn_pressure_subscriber`, no-op on Mac/Linux).
* **IPC-request lifecycle**: 4 numbered steps: accept → per-connection task → reader/writer/notifier → per-RPC timeout (search 30 s, drive-load `IndexManager::DRIVE_LOAD_TIMEOUT`, refresh fire-and-forget).
* **Shutdown sequence**: 5 numbered steps: signal source → IPC drain → load drain (3 s timeout) → PID + socket cleanup → force-exit watchdog (5 s) + `process::exit(0)`.

Word count: ~520 prose words + 1 Mermaid + 1 sub-table, under the 600-word budget per plan §3 risk register.

## AC2b + AC6: `§6 Spawn-site registry` (was missing)

Adds the workspace-tracked enumeration of all 18 prod `tokio::spawn(` sites with the four required facets (owner / shutdown / errors / cancel). This closes plan AC#2's "full inventory in `concurrency_policy.md §"Spawn-site registry"`" + AC#6's "all 7 required sections".
The table is keyed by group (A = top-level, B = subsystem long-lived, C = per-connection, D = sub-task, E = one-shot, F = runtime-cleanup / external) and references the source-of-truth hand-audit at `docs/dev/baseline/2026-05-19/phase_10_task_ownership_inventory.md` (local-only) for per-site nuance. Adding a new prod spawn site now requires a matching row in the same PR: the audit-script `§2` count is the gate; new spawn with no registry row fails review.

Renumbering: §6 (was Verification) → §7, §7 → §8, §8 → §9 to preserve narrative flow (policy → posture → registry → verification → anti-patterns → audit trail).

## G1: Named `await_holding_*` clippy entries

Pins all 3 lints from clippy's `suspicious` group at deny in the workspace `Cargo.toml`:

```toml
await_holding_lock = "deny"
await_holding_refcell_ref = "deny"
await_holding_invalid_type = "deny"
```

Effectively no behavior change: they were at warn-by-default, promoted to error via the workspace's CI `-D warnings` flag. The named entries make the policy doc's enforcement claim literal and survive any future tightening of the `--deny warnings` shape. An inline comment block at the entries documents the rationale + cross-links `concurrency_policy.md §1`.

Also fixes a leaked absolute path in the policy doc (line 36 was `@/Users/rnio/Private/Github/UltraFastFileSearch/Cargo.toml`, now just `Cargo.toml`).

## AC5: Audit script `< 5 s` (was 6.2 s)

Adds `bulk_per_crate` helper to `scripts/dev/concurrency_audit.sh` that runs **one** `rg` invocation per pattern across all of `crates/` and buckets per-file counts into per-crate totals via `awk` + a bash nameref (`local -n`). Replaces the prior N-crates × M-patterns nested loop (~126 rg invocations for §1 alone, ~1.5 s of process-spawn overhead).

Wall-time impact: **6.2 s → 4.5 s** (verified locally). Output is byte-identical (modulo the `Captured:` ISO timestamp).
Requires bash 4.3+ for nameref. Script shebang is `#!/usr/bin/env bash`; macOS `brew bash` ships 5.x; CI runners are Ubuntu 22.04+ with bash 5.x by default.

## Audit trail (§9) update

Updates `concurrency_policy.md §9` to mention 10g's bonus deliverables (daemon `lib.rs` decomposition, §0 model, §6 registry, named clippy entries) + adds 10h row pointing at the local-only final report.

## Final report (G3, local-only)

Adds `docs/dev/baseline/2026-05-20/phase_10_final_report.md`, which mirrors the `phase_9_final_report.md` pattern. ~290 LOC covering:

* §0 Executive summary
* §1 Per-dimension outcome table (5 dimensions + cross-cutting cancellation infra + Arc<Mutex<…>> nesting depth)
* §2 Concrete change summary (10 numbered items across the 6 PRs)
* §3 Acceptance scorecard (8 of 8 ACs + 3 of 3 playbook P-criteria green)
* §4 Test coverage audit (G2 verdict: no real gap; existing 298 unit + 4 integration + 13 journal-loop tests cover every spawn-site cancellation path; soak/stress deferred to Phase 12 per plan §0.3)
* §5 Risk-register outcomes (6 of 6 mitigations consumed without incident)
* §6 Cross-references
* §7 Verification at closing SHA
* §8 Decisions log (10 entries 10a-10h)
* §9 Phase 10 closure with #302 close note

Local-only (gitignored under `docs/dev/baseline/`); mirrors the Phase 0-9 final-report cadence.

## Rule-1 adherence

* Zero `#[allow(...)]` introductions.
* No suppression hacks, no skipped tests, no lint disables: the 3 new clippy entries are *deny*, not allow.
* No public API change.
* `cargo fmt --all`: clean.
* `cargo clippy --workspace --all-targets -- -D warnings`: clean (now also enforcing the 3 named `await_holding_*` lints).
* `cargo test -p uffs-daemon --lib`: 298 passed / 0 failed.
* `RUSTDOCFLAGS="-D warnings -D rustdoc::broken-intra-doc-links" cargo doc --workspace --no-deps`: clean.
* `bash scripts/dev/concurrency_audit.sh`: runs in 4.5 s, output byte-identical to pre-optimization.

Refs #302 (Phase 10 umbrella).
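The one-search-per-pattern bucketing described for the `bulk_per_crate` helper can be sketched roughly as below. This is an assumption-laden illustration rather than the helper from the script: the argument order, the internal nameref name `_out`, and the use of `grep -rc` in place of `rg` are all choices made here so the example runs without ripgrep. Note that `grep -c` counts matching lines, not individual matches.

```shell
#!/usr/bin/env bash
# Sketch only (bash 4.3+ for `local -n`): one recursive search per pattern,
# per-file counts bucketed into per-crate totals via awk and a nameref.
# Hypothetical usage: bulk_per_crate PATTERN ROOT OUT_ARRAY
# ROOT must not end in a slash; OUT_ARRAY must not be named _out.
bulk_per_crate() {
  local pattern=$1 root=$2
  local -n _out=$3              # nameref to the associative array in the caller
  local crate count
  while IFS=$'\t' read -r crate count; do
    _out[$crate]=$count
  done < <(
    grep -rcE --include='*.rs' -- "$pattern" "$root" |
      awk -F: -v root="$root" '
        $NF > 0 {
          path = substr($0, length(root) + 2)   # strip the ROOT/ prefix
          split(path, parts, "/")               # parts[1] is the crate directory
          sum[parts[1]] += $NF
        }
        END { for (c in sum) printf "%s\t%d\n", c, sum[c] }'
  )
}

# Demo workspace: crate alpha has three spawn lines in two files, beta has none.
demo=$(mktemp -d)
mkdir -p "$demo/crates/alpha/src" "$demo/crates/beta/src"
printf 'tokio::spawn(a);\ntokio::spawn(b);\n' > "$demo/crates/alpha/src/lib.rs"
printf 'tokio::spawn(c);\n' > "$demo/crates/alpha/src/io.rs"
printf 'fn main() {}\n' > "$demo/crates/beta/src/lib.rs"

declare -A spawn_counts=()
bulk_per_crate 'tokio::spawn\(' "$demo/crates" spawn_counts
echo "alpha=${spawn_counts[alpha]:-0} beta=${spawn_counts[beta]:-0}"
rm -rf "$demo"
```

The saving comes from spawning one search process per pattern instead of one per crate per pattern; the nameref lets the helper fill an array owned by the caller without a subshell.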

githubrobbi merged commit 2144a58 into main May 20, 2026
19 checks passed

githubrobbi deleted the chore/concurrency-audit-phase-10a branch May 20, 2026 01:10

This was referenced May 20, 2026

fix(concurrency): snapshot lock state before awaits in status RPC + MCP dispatch #304

Merged

docs(concurrency): add concurrency_policy.md + decompose daemon lib.rs (Phase 10g) #308

Merged

githubrobbi mentioned this pull request May 21, 2026

[playbook-phase-10] Async, concurrency, and shared state discipline #197

Closed

6 tasks

githubrobbi merged 1 commit into main from chore/concurrency-audit-phase-10a
