fix: run the memory watchdog for every command; plug the connect() adapter leak #274

Merged
UtkarshBhardwaj007 merged 1 commit into main from fix/memory-watchdog-default-adapter-leak on Jun 4, 2026

@UtkarshBhardwaj007
Member

Incident

Three playground processes grew to 40+ GB each (45.8 / 42.1 / 41.7 GB in Activity Monitor) and swapped the laptop to a standstill; they had to be force-quit. The commands had looked finished but never returned to the shell.

Root cause

Two layers:

  1. connect() leaked its session-probe adapter. It creates a TerminalAdapter (statement-store WebSocket + subscriptions) to check for an existing session. On the existing-session path it returned plain address data without destroying the adapter — and since the adapter isn't part of the result, no caller could. The probe-failure path leaked it too, and on the QR path nobody owned login.adapter after the init TUI exited. getSessionSigner() and findSession() already destroy their probe adapters on early-return paths; connect() was the one that forgot.

  2. init had no memory watchdog. The leaked subscription machinery is exactly what can enter the documented polkadot-api microtask-flood state (see process-guard.ts), where the event loop is starved: signal handlers, hardExit timers, and index.ts's final process.exit() all stop firing. Ctrl+C and tab-close are dead; the frozen TUI frame looks "done". The worker-thread watchdog is the only guard that survives that state — and it only ran for deploy/mod/contract, not init/decentralise/build/update/logout.
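
The adapter lifecycle described in item 1 can be sketched roughly as follows. This is an illustration only: `ProbeAdapter`, `findExistingSession`, `ConnectResult`, and `connectSketch` are hypothetical names invented here, not the actual shapes in src/utils/auth.ts.

```typescript
// Hypothetical shapes for illustration; the real TerminalAdapter and
// connect() in src/utils/auth.ts will differ.
interface ProbeAdapter {
  findExistingSession(): Promise<string | null>; // assumed probe call
  destroy(): Promise<void>;
}

type ConnectResult =
  | { kind: "existing"; address: string } // plain data, no adapter inside
  | { kind: "login"; adapter: ProbeAdapter }; // ownership moves to the caller

async function connectSketch(
  createAdapter: () => ProbeAdapter,
): Promise<ConnectResult> {
  const adapter = createAdapter();
  let address: string | null;
  try {
    address = await adapter.findExistingSession();
  } catch (err) {
    // Probe-failure path: no caller can reach the adapter, so release it here.
    void adapter.destroy().catch(() => {});
    throw err;
  }
  if (address !== null) {
    // Existing-session path: the result carries no adapter, so it must be
    // destroyed before returning (fire-and-forget, errors swallowed).
    void adapter.destroy().catch(() => {});
    return { kind: "existing", address };
  }
  // QR path: hand the live adapter to the caller, who destroys it later.
  return { kind: "login", adapter };
}
```

The point of the sketch is that every exit from the function either destroys the adapter or returns it, so no path leaves a live adapter unreachable.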

Verified the watchdog itself works in compiled binaries: a probe SEA binary was SIGKILLed at 4.04 GB RSS, and the installed pg mod emits [mem +Ns] samples under DOT_MEMORY_TRACE=1.
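
As a rough sketch of the check such a watchdog performs on each sample: the names `WatchdogDeps` and `makeWatchdogTick`, the `>=` comparison, the binary reading of "4 GB", and everything in the trace line after the `[mem +Ns]` prefix are assumptions made for illustration. In the real tool this logic runs on a worker thread, which is what lets it keep firing while the main event loop is starved.

```typescript
// Hypothetical sketch; the real watchdog's wiring is not shown in this PR.
const LIMIT_BYTES = 4 * 1024 ** 3; // "4 GB" read as 4 GiB (assumption)

interface WatchdogDeps {
  readRss(): number; // e.g. process.memoryUsage().rss
  kill(): void; // e.g. process.kill(process.pid, "SIGKILL")
  trace?(line: string): void; // only supplied when DOT_MEMORY_TRACE=1
}

function makeWatchdogTick(
  deps: WatchdogDeps,
  startedAtMs: number,
  limitBytes: number = LIMIT_BYTES,
) {
  // Returns true when the limit was hit and kill() was invoked.
  return (nowMs: number): boolean => {
    const rss = deps.readRss();
    const seconds = Math.round((nowMs - startedAtMs) / 1000);
    const megabytes = (rss / 1024 ** 2).toFixed(0);
    // Only the "[mem +Ns]" prefix comes from the description; the rest of
    // the line format is invented for this sketch.
    deps.trace?.(`[mem +${seconds}s] rss=${megabytes} MB`);
    if (rss >= limitBytes) {
      deps.kill();
      return true;
    }
    return false;
  };
}
```

Injecting `readRss` and `kill` keeps the threshold logic testable without spawning a worker or killing a real process.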

Changes

  • src/utils/auth.ts — connect() destroys the probe adapter on the existing-session and probe-failure paths (fire-and-forget .destroy().catch(() => {}), same Bun-SEA rationale as getSessionSigner()); QR path still transfers ownership via LoginHandle
  • src/commands/init/index.ts — init owns login.adapter and destroys it in the finally after the TUI exits
  • src/cli-runtime.ts — watchdog option defaults to true for every command; explicit opt-out preserved
  • src/utils/auth.connect.test.ts — new file pinning the adapter lifecycle contract (written failing-first: 2 of 3 cases failed pre-fix)
  • src/cli-runtime.test.ts — default-on + opt-out regression tests
  • CLAUDE.md — watchdog invariant updated to the new default
  • changeset (patch)
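
The default-on behaviour with an explicit opt-out can be expressed with a nullish-coalescing default. A minimal sketch, assuming a hypothetical `RuntimeOptions` shape and `watchdogEnabled` helper (neither name is taken from src/cli-runtime.ts):

```typescript
// Hypothetical option shape; the real cli-runtime options will differ.
interface RuntimeOptions {
  watchdog?: boolean;
}

function watchdogEnabled(options: RuntimeOptions = {}): boolean {
  // Missing or undefined means "on"; only an explicit false opts out.
  return options.watchdog ?? true;
}
```

Using `??` rather than `||` matters here: `false || true` would evaluate to `true` and silently discard the opt-out.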

Verification

  • pnpm format:check, pnpm lint:license, pnpm test (584/584), pnpm build all pass
  • Rebuilt binary: playground init now emits watchdog samples under DOT_MEMORY_TRACE=1 (previously impossible — no watchdog) and still exits on its own after "setup complete"
  • Audited all createAdapter() call sites: every path now destroys or transfers exactly once; waitForLogin contains no destroy call, so no double-destroy
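
The "destroy or transfer exactly once" rule on the QR path can be sketched as the owner releasing the adapter in a `finally`. The names `Destroyable`, `LoginHandleSketch`, and `runInitSketch` are hypothetical, and the fire-and-forget destroy simply mirrors the description of connect(); whether init awaits the call is not stated in the PR.

```typescript
// Hypothetical shapes; the real LoginHandle and init command will differ.
interface Destroyable {
  destroy(): Promise<void>;
}

interface LoginHandleSketch {
  adapter: Destroyable;
}

async function runInitSketch(
  login: LoginHandleSketch,
  runTui: () => Promise<void>,
): Promise<void> {
  try {
    await runTui();
  } finally {
    // init owns the adapter once connect() has handed it over, so it is
    // released whether the TUI resolved or threw. Assumption: fire-and-forget
    // with errors swallowed, as described for connect().
    void login.adapter.destroy().catch(() => {});
  }
}
```

Because the login wait itself never calls destroy, the single call in `finally` is the only release on this path, which is what rules out a double-destroy.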

Caveat

The flood was not reproduced live, so its exact upstream allocation site is still unknown (idle init/mod stayed flat at ~300 MB over 30+ minute probes). The defense is structural: if the state recurs, the watchdog kills the process at 4 GB with a message pointing at DOT_MEMORY_TRACE=1, whose samples should pin down the site.

…apter leak

Three playground processes grew past 40 GB each and froze the machine.
Two causes, both fixed:

1. connect() leaked its session-probe TerminalAdapter on the
   existing-session and probe-failure paths, and nobody owned the QR-path
   adapter after the init TUI exited. The leaked statement-store WebSocket
   + subscriptions keep the event loop alive and are the machinery that
   can enter the documented polkadot-api microtask-flood state.

2. The 4 GB memory watchdog only ran for deploy/mod/contract. The
   worker-thread watchdog is the only guard that survives event-loop
   starvation (signal handlers, hardExit timers, and index.ts's final
   process.exit all stop firing), so it now defaults ON for every
   command, with explicit opt-out preserved.

New regression coverage: src/utils/auth.connect.test.ts pins the adapter
lifecycle contract (destroy on existing/throw paths, ownership transfer
on the QR path); cli-runtime.test.ts pins the watchdog default.
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Dev build ready — try this branch:

curl -fsSL https://raw.githubusercontent.com/paritytech/playground-cli/main/install.sh | VERSION=dev/fix/memory-watchdog-default-adapter-leak bash

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

E2E Test Pass · ✅ PASS

Tag: e2e-ci-pr · Branch: fix/memory-watchdog-default-adapter-leak · Commit: d770e19 · Run logs

| Cell | Result | Time |
| --- | --- | --- |
| pr-deploy-cdm | ✅ PASS | 3m30s |
| pr-deploy-foundry | ✅ PASS | 0m50s |
| pr-install | ✅ PASS | 0m51s |
| pr-deploy-frontend | ✅ PASS | 8m33s |
| pr-init-session | ✅ PASS | 1m56s |
| pr-mod | ✅ PASS | 1m35s |
| pr-preflight | ✅ PASS | 3m54s |
| ${{ matrix.cell }} | ⏭️ SKIP | 0m00s |
| ${{ matrix.cell }} | ⏭️ SKIP | 0m00s |

Sentry traces: view spans for this run

@UtkarshBhardwaj007 UtkarshBhardwaj007 merged commit db18b8a into main Jun 4, 2026
20 checks passed
@UtkarshBhardwaj007 UtkarshBhardwaj007 deleted the fix/memory-watchdog-default-adapter-leak branch June 4, 2026 09:32