fix(mcp): Harden lifecycle teardown and observability #274
cameroncooke wants to merge 4 commits into main from
Conversation
Attach MCP shutdown handlers before async startup, add broken-pipe and crash handling, and explicitly stop the Xcode watcher during teardown. This closes lifecycle gaps that could leave MCP processes running after clients disappear and adds concrete shutdown reasons for investigation. Add lifecycle Sentry metrics, richer session diagnostics, and a repro script so process age, peer counts, memory, and active runtime state are visible when this issue recurs in the wild. Fixes #273 Co-Authored-By: Codex <noreply@openai.com>
When the MCP parent process exits abruptly with in-flight requests, shutdown can block indefinitely waiting for server.close(). This keeps npm exec wrapper processes alive and allows the child node process to climb in RSS. Add a transport-disconnect fast-exit path that detaches lifecycle handlers, races server.close() against a short timeout, and then exits immediately. This preserves normal shutdown behavior for other reasons and keeps Sentry enabled. Also add focused unit tests for the bounded close helper outcomes. Fixes #273
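The bounded close described above can be sketched as a race between `server.close()` and a short timer. This is an illustrative sketch, not the PR's actual helper; the name `closeWithTimeout` and the `Closeable` shape are assumptions.

```typescript
// Hypothetical sketch of a bounded close: resolve when close() finishes
// or when the timeout elapses, whichever comes first, so a hung
// server.close() can never block the fast-exit path.
type Closeable = { close(): Promise<void> };

async function closeWithTimeout(
  server: Closeable,
  timeoutMs: number,
): Promise<"closed" | "timed-out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  try {
    return await Promise.race([
      server.close().then(() => "closed" as const),
      timeout,
    ]);
  } finally {
    if (timer !== undefined) clearTimeout(timer); // avoid keeping the loop alive
  }
}
```

Clearing the timer in `finally` matters: an outstanding timeout would otherwise keep the event loop alive for its full duration even after a fast close.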
```typescript
if (isTransportDisconnectReason(reason)) {
  lifecycle.detachProcessHandlers();
  await __closeServerForFastExitForTests(closeableServer);
  process.exit(exitCode);
```
Bug: For transport disconnects, the server's new fast-exit path calls process.exit() before flushAndCloseSentry(), risking the loss of diagnostic logs that have not yet been sent.
Severity: MEDIUM
Suggested Fix
To ensure diagnostic data is preserved, consider replacing the immediate process.exit() with a mechanism that waits for the Sentry transport to flush. One option is to await flushAndCloseSentry() before exiting. If the current fast-exit is a deliberate trade-off to prevent hanging processes, this change should be carefully evaluated against that risk.
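One way to reconcile the fast exit with telemetry delivery is to bound the flush itself. A minimal sketch, assuming a Sentry-style `flush(timeoutMs)` that resolves `false` on timeout; `boundedFlush` is a hypothetical helper, not code from this PR:

```typescript
// Hypothetical: wrap the telemetry flush so neither a hung transport
// nor a flush error can block the fast-exit path.
async function boundedFlush(
  flush: (timeoutMs: number) => Promise<boolean>,
  timeoutMs = 2000,
): Promise<boolean> {
  try {
    // Sentry-style flush APIs take their own timeout and resolve
    // false if the queue could not be drained in time.
    return await flush(timeoutMs);
  } catch {
    return false; // never let a flush failure delay shutdown
  }
}
```

The fast-exit path could then `await boundedFlush(...)` immediately before `process.exit(exitCode)`, keeping the exit prompt while giving queued events a bounded window to leave the process.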
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.
Location: src/server/start-mcp-server.ts#L101-L104
Potential issue: In the `onShutdown` handler, a new "fast-exit" path was added for transport disconnect events like `stdin-end` and `stdin-close`. This path calls `process.exit()` at line 104. Consequently, the Sentry flushing logic at line 175 (`flushAndCloseSentry()`) is never reached for these common shutdown scenarios. While lifecycle data is captured in a log message with a `{ sentry: true }` flag on line 97, the process may terminate before this log is successfully transmitted to Sentry, leading to a potential loss of critical diagnostic data intended to be captured by this pull request.
Update @modelcontextprotocol/sdk from ^1.25.1 to ^1.27.1 so local and harness testing run on the latest SDK behavior. Add /repros to .gitignore to keep lifecycle repro artifacts out of source control during repeated leak investigations. Refs #273 Co-Authored-By: Codex <noreply@openai.com>
```typescript
const handleUncaughtException = (error: unknown): void => {
  void coordinator.shutdown('uncaught-exception', error);
};
```
Bug: Process event handlers like handleUncaughtException call coordinator.shutdown() without await, causing the process to exit before async cleanup and telemetry flushing completes.
Severity: HIGH
Suggested Fix
Modify the process event handlers to ensure the process does not exit until the asynchronous coordinator.shutdown() promise resolves. This may involve using a synchronous IPC channel to signal completion to a parent process or using a library that facilitates synchronous waiting on async operations in exit handlers, if possible. The goal is to block the handler's return until cleanup is finished.
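A common Node.js pattern for this, sketched below under the assumption that registering an `uncaughtException` handler already suppresses Node's default immediate exit: keep the listener synchronous, but defer `process.exit()` until the async shutdown promise settles. `makeCrashListener` and its injectable `exit` parameter are illustrative, not the PR's actual API.

```typescript
// Hypothetical: build a crash listener that triggers async shutdown
// and only exits once that shutdown promise has settled. The exit
// function is injectable so the listener can be unit-tested.
function makeCrashListener(
  shutdown: (reason: string, error?: unknown) => Promise<void>,
  exit: (code: number) => void = process.exit,
): (error: unknown) => void {
  return (error: unknown) => {
    shutdown("uncaught-exception", error)
      .catch(() => {
        // Swallow shutdown errors: the original crash is what matters.
      })
      .finally(() => exit(1)); // exit only after cleanup has settled
  };
}

// Usage: process.on("uncaughtException", makeCrashListener(coordinator.shutdown));
```

Note that this relies on the presence of the handler keeping the process alive; for `SIGTERM`/`SIGINT` the same deferral works because installing a signal listener replaces Node's default terminate-immediately behavior.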
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/server/mcp-lifecycle.ts#L340-L342
Potential issue: Process event handlers for `uncaughtException`, `unhandledRejection`, `SIGTERM`, and `SIGINT` are synchronous and call the `async` function `coordinator.shutdown()` in a fire-and-forget manner. According to Node.js behavior, the process will exit immediately after these synchronous handlers complete, without waiting for any asynchronous operations they initiate. This creates a race condition where the process terminates before the critical asynchronous shutdown logic, including `flushAndCloseSentry()`, can finish. As a result, crash telemetry, logs, and other cleanup operations will be lost, defeating the purpose of adding remote diagnostics.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```typescript
  lifecycle.detachProcessHandlers();
  await __closeServerForFastExitForTests(closeableServer);
  process.exit(exitCode);
}
```
Fast-exit path skips Sentry flush, losing telemetry
Medium Severity
The transport-disconnect fast-exit path calls process.exit() without calling flushAndCloseSentry() or recordMcpLifecycleMetric(). Since process.exit() doesn't trigger async flush handlers, the lifecycle telemetry added by this PR to diagnose orphaned processes won't reach Sentry for the most common shutdown path (client closing stdin).


Harden MCP teardown and add lifecycle observability for process-leak diagnosis.
Issue #273 reports orphaned MCP processes and very high memory growth after client shutdown in some client paths. This PR makes teardown deterministic by wiring shutdown handling earlier in startup, forcing bounded shutdown on stdio disconnect, and explicitly stopping lifecycle watchers during teardown instead of relying on process exit timing.
I also added lifecycle diagnostics so recurrence can be identified remotely: process age, RSS, and peer MCP-process counts are now recorded in MCP lifecycle telemetry and exposed in session diagnostics. This gives us concrete signals in Sentry when shutdown stalls or abnormal process growth appears.
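The diagnostics described above (process age, RSS) map directly onto Node's built-in process APIs. A minimal sketch; the `LifecycleSnapshot` field names are assumptions rather than the PR's actual telemetry schema, and peer-process counting is platform-specific so it is elided here:

```typescript
// Illustrative lifecycle snapshot built from Node.js built-ins.
// Field names are hypothetical, not the PR's real schema.
interface LifecycleSnapshot {
  processAgeSeconds: number; // how long this MCP process has been alive
  rssBytes: number;          // resident set size, to spot memory growth
  pid: number;
}

function captureLifecycleSnapshot(): LifecycleSnapshot {
  return {
    processAgeSeconds: process.uptime(),
    rssBytes: process.memoryUsage().rss,
    pid: process.pid,
  };
}
```

Attaching a snapshot like this to each lifecycle event is cheap and gives Sentry enough context to distinguish a freshly spawned process from a long-lived orphan with climbing RSS.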
I considered leaving shutdown fully graceful-only, but that keeps the failure mode where teardown can stall indefinitely under client disconnect races. The bounded shutdown path is intentionally narrow and only used for disconnect/crash-style termination paths.
This branch also upgrades @modelcontextprotocol/sdk to ^1.27.1 so validation runs on the latest SDK behavior.

Fixes #273