feat(orchestrator): RAII WorktreeGuard for cancellation-safe compound review (Layer 1) Refs #1569 (Gitea)#874
Merged
Merged
Conversation
added 2 commits
May 17, 2026 14:55
…d review Refs #1569
Layer 1 of the worktree-lifecycle epic (Gitea #1567). Owns three
property regressions surfaced by the bigbox storm:
1. WorktreeGuard now optionally invokes `git worktree remove --force`
on Drop (new `for_managed` constructor). The synchronous std
Command is intentional -- Drop cannot be async, and the git CLI
call is sub-second. Falls back to filesystem removal on non-zero
exit or when the git binary is not invokable.
2. `WorktreeManager::create_worktree` now returns the guard instead
of a bare PathBuf. The per-agent caller path
(`worktree_guard::WorktreeGuard::new`) is unchanged.
3. `CompoundReviewWorkflow::run` declares `guard` BEFORE `tasks:
JoinSet`, so locals drop in reverse: JoinSet drops first (aborts
every wrapping task), Child::Drop with `kill_on_drop(true)` kills
each subprocess synchronously, THEN the guard drops, reconciling
the `<repo>/.git/worktrees/<name>` admin entry. The explicit
`worktree_manager.remove_worktree(...)` cleanup at end-of-run is
removed; the guard owns it.
4. `kill_on_drop(true)` is now set on `Command::new(cli_tool)` in
`run_single_agent`. Without this, JoinSet abort orphans the
subprocess against `Child::Drop`.
A `WORKTREE_REVIEW_PREFIX = "review-"` constant is added at module
scope of `scope.rs` as the single source of truth referenced by
Layer 2 (`sweep_stale`) and Layer 3 (`adf-cleanup.sh`).
Tests:
- Three new unit tests on the managed guard
(`test_managed_guard_invokes_git_remove`,
`test_managed_guard_fallback_on_git_failure`,
`test_managed_guard_keep_disarms`).
- Existing scope::tests updated to .keep() guards where they need
to inspect or remove via the manager.
- New integration test `tests/compound_cancellation_test.rs` with
two property tests:
* `test_cancellation_leaves_no_worktree` captures sleep-agent
PIDs BEFORE abort (so cwd readlinks resolve the live path, not
`<path> (deleted)`), then asserts every PID is gone within 2 s
along with the worktree dir and .git/worktrees admin entry.
Empirically verified that without `kill_on_drop` this test
fails with "agent subprocess(es) survived cancellation".
* `test_storm_cancellation_leaves_no_worktree` spawns two
parallel `workflow.run` calls and aborts both. (Deviation from
the brief, which specified replaying the Layer 0 cursor bug
by calling `check_cron_schedules` twice. The Layer 0 fix is
landed in main; the storm property checked here is the same
in spirit: storm-shaped cancellation must drain to zero.)
A new feature flag `test-helpers` exposes
`scope::test_support::setup_git_repo` to the integration test;
this is gated so the helper does not appear in the production
build. tempfile becomes an optional dep behind the same gate.
The integration test itself is `#![cfg(all(unix, feature =
"test-helpers"))]` so workspace-wide `cargo check --all-targets`
(without features) still compiles.
Deviations from the design:
- Tests use `&Path` for the cli_tool, written into a TempDir-scoped
shell wrapper that ignores its args and execs `/bin/sleep 999`.
The brief specified `/bin/sleep 30`, but sleep itself rejects
non-numeric trailing args from the dispatcher, so a wrapper is
required to keep the subprocess alive long enough to validate
cancellation.
- PID capture uses pre-abort `/proc/<pid>/cwd` snapshots rather
than oneshot-channel-based capture (which would have required
refactoring `run_single_agent` from `Command::output()` to
`spawn() + wait_with_output()` solely for testability). Approach
(b) from the brief's risks list.
…efs #1569 Add explicit `[[test]] required-features = ["test-helpers"]` so that `cargo test --all-features` picks up `compound_cancellation_test` even though the source file is also `cfg`-gated for defensive double-protection. This is a follow-on to 6ba6331 and does not change behaviour: - Default `cargo test -p terraphim_orchestrator` still excludes the integration test (same as before). - `cargo test -p terraphim_orchestrator --features test-helpers` still runs it (same as before). - New: `cargo test -p terraphim_orchestrator --all-features` now also runs it. Note: the repo's CI command in `.github/workflows/ci-pr.yml` is `cargo nextest run --workspace --profile ci --lib --bins --features zlob`. Integration tests under `tests/` are not executed by that gate regardless of feature flags. The cancellation test is therefore a developer-run regression gate today; promoting it into CI is a separate ticket.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Layer 1 of the ADF worktree-lifecycle hardening epic (Gitea #1567). The primary technical fix: convert the compound-review worktree-cleanup line at
compound.rs:367-370(which leaks on future cancellation) into RAII viaWorktreeGuard. Addskill_on_drop(true)at the agent-subprocesstokio::process::Commandand refactors the swarm spawn loop fromtokio::spawn+mpsctotokio::task::JoinSetso dropping the parent future actually kills the subprocesses.Key design points (per
docs/design/adf-worktree-lifecycle-design.md§4.2):WorktreeGuard::for_managed(repo_path, worktree_path)constructor;Droprunsgit -C <repo> worktree remove --force <path>synchronously withstd::fs::remove_dir_allfallback. The existingWorktreeGuard::newfilesystem-only path is preserved for the per-agent caller atlib.rs:5392-5438.WorktreeManager::create_worktreenow returnsResult<WorktreeGuard, _>.pub const WORKTREE_REVIEW_PREFIX: &str = "review-";-- single source of truth Layer 2 will consume.let _guard = ...declared FIRST so it drops LAST;let mut tasks: JoinSet<_>declared SECOND so it drops FIRST. Inverting recreates the bigbox storm race. Documented in-code via a multi-line comment.cmd.kill_on_drop(true)added to thetokio::process::Command::new(cli_tool)builder.Test plan
cargo test -p terraphim_orchestrator --features test-helpers --test compound_cancellation_test-- 2 tests pass:test_cancellation_leaves_no_worktreeandtest_storm_cancellation_leaves_no_worktree. Discriminating-verified by togglingkill_on_drop(true)off -- test correctly fails withagent subprocess(es) survived cancellation.cargo test -p terraphim_orchestrator --features test-helpers-- 684 lib tests + all integration suites pass.cargo clippy -p terraphim_orchestrator --lib --tests --features test-helpers -- -D warnings-- clean.Design deviations (documented in commit body)
setup_git_repoexposed via a newtest-helpersfeature flag (gatingtempfileas optional dep and thescope::test_supportmodule) instead ofpub(crate). Integration tests live in a separate crate sopub(crate)is insufficient. Cost: rust-analyzer needscargo.features = ["test-helpers"]enabled to compile the test file cleanly; documented for follow-uprust-analyzer.toml./bin/sleep 999(the dispatcher appends positional args which baresleeprejects as non-numeric)./proc/<pid>/cwdsymlinks BEFORE abort (so readlinks resolve the live path, not<path> (deleted)); the post-abort check asserts/proc/<pid>/disappears.Refs terraphim/terraphim-ai#1569 (Gitea)
Refs terraphim/terraphim-ai#1567 (Gitea epic)