Skip to content

fix(ssh): fix reconnect issues and node binary path resolution#2504

Merged
juliusmarminge merged 7 commits intomainfrom
t3code/11970999
May 8, 2026
Merged

fix(ssh): fix reconnect issues and node binary path resolution#2504
juliusmarminge merged 7 commits intomainfrom
t3code/11970999

Conversation

@juliusmarminge
Copy link
Copy Markdown
Member

@juliusmarminge juliusmarminge commented May 4, 2026

Summary

Fixes SSH launcher failure modes reported from remote Linux hosts:

  • bootstrap now recovers Node in non-interactive sh -s SSH sessions before running remote Node helper scripts or the T3 runner
  • Node discovery handles common install locations and version managers, including Volta, asdf, mise, fnm, nodenv, and nvm
  • the standalone SSH launcher accepts the required Node engine range as an option; the desktop app passes apps/server/package.json's engines.node range and the selected remote Node is validated before launch
  • reconnect after desktop app updates no longer reuses a launcher-managed remote server as if it were external; it stops stale managed processes and clears SSH state before relaunching
  • launch and pairing output parsing now tolerates shell startup noise before the final JSON payload
  • REMOTE.md now documents SSH launch troubleshooting, Node requirements, version-manager resolution, and reconnect cleanup behavior

Closes #2534

Root Cause

The remote bootstrap ran under non-interactive sh, so hosts with Node exposed only through shell setup from a version manager could fail with node: command not found during port picking/readiness. Even when a Node binary was found, the launcher did not check that it satisfied the server package's runtime engine range before starting helper scripts and the server. Separately, the shared remote runtime file could make a launcher-managed process look like an external server before runner-change restart logic ran, leaving users stuck after updates unless they manually killed remote t3code state.

Validation

  • bun fmt
  • bun lint (passes with existing warnings)
  • bun typecheck
  • bun run --filter @t3tools/ssh test

Note

Medium Risk
Medium risk: changes the SSH bootstrap/launch scripts and remote server reuse logic, which can affect connectivity and remote process lifecycle across diverse host environments. Also replaces version-comparison utilities used for provider/CLI gating, which could alter upgrade/advisory behavior if edge cases differ.

Overview
Hardens desktop-managed SSH launch/reconnect on remote hosts. The SSH bootstrap now tries to locate/activate a compatible remote node in non-interactive sh sessions (common paths + Volta/asdf/mise/fnm/nodenv/nvm), and can enforce a required Node engine range before running helper scripts or starting t3.

Launch/pairing output decoding is made resilient to shell startup noise by extracting the JSON object from stdout before decoding. Reconnect behavior is adjusted to avoid treating stale launcher-managed servers as “external” by stopping them and clearing/refreshing saved PID/port state.

Separately, a shared @t3tools/shared/semver module is introduced (with tests) and replaces the server’s custom cliVersion comparator; extractJsonObject is moved into @t3tools/shared/schemaJson and reused by text-generation and SSH parsing. REMOTE.md gains detailed SSH launch troubleshooting and Node requirements.

Reviewed by Cursor Bugbot for commit c4be90c. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Fix SSH reconnect issues and Node binary path resolution for remote launch

  • Remote launch and runner scripts now bootstrap Node onto PATH for non-interactive shells by probing common install locations and initializing version managers (volta, asdf, mise, fnm, nodenv, nvm) via REMOTE_NODE_ENV_SCRIPT in tunnel.ts
  • Adds optional Node engine range enforcement: a self-contained satisfiesSemverRange implementation (valid plain JS for remote execution) is stringified and sent to the remote host to validate the Node version before proceeding
  • SSH launch and pairing token output parsing now tolerates shell startup noise by extracting JSON from mixed stdout via a new decodeRemoteJsonOutput helper
  • Consolidates semver utilities into a new @t3tools/shared/semver package, replacing local compareCliVersions in ClaudeProvider, OpenCodeProvider, and providerMaintenance
  • Moves extractJsonObject to @t3tools/shared/schemaJson, removing duplication from TextGenerationUtils

Macroscope summarized c4be90c.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d1656d40-cd49-4d1b-9d28-920181e5a609

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch t3code/11970999

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added size:M 30-99 changed lines (additions + deletions). vouch:trusted PR author is trusted by repo permissions or the VOUCHED list. labels May 4, 2026
@juliusmarminge juliusmarminge changed the title [codex] fix SSH launcher reconnects fix(ssh): fix reconnect issues May 4, 2026
@github-actions github-actions Bot added size:L 100-499 changed lines (additions + deletions). and removed size:M 30-99 changed lines (additions + deletions). labels May 4, 2026
@juliusmarminge juliusmarminge changed the title fix(ssh): fix reconnect issues fix(ssh): fix reconnect issues and node binary path resolution May 4, 2026
Comment thread packages/ssh/src/tunnel.ts Outdated
Comment thread packages/ssh/src/tunnel.ts
Jgratton24 added a commit to Jgratton24/t3code that referenced this pull request May 6, 2026
Pre-flight now reads `apps/server`'s `engines.node` and rejects WSL
distros whose Node version is too old, naming the actual version and
the required range. Without this we'd let an outdated Node fall through
to node-gyp/runtime errors that don't mention version at all.

The semver-range check is ported from the JS used by the SSH launcher's
remote node check in PR pingdotgg#2504. When that PR merges (or this one does),
whichever lands second extracts the helper into `packages/shared` so
both call sites can drop their copy.
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: codex <codex@users.noreply.github.com>
- Parse remote launch and pairing JSON from the last JSON suffix
- Add regression coverage for pretty-printed pairing output after shell noise
@macroscopeapp
Copy link
Copy Markdown
Contributor

macroscopeapp Bot commented May 8, 2026

Approvability

Verdict: Needs human review

This PR introduces substantial SSH launch behavior changes including node version manager detection and reconnect handling. Three unresolved review comments identify potential bugs: JSON extraction may fail with certain shell output, misleading error messages, and risk of killing external servers. These substantive issues warrant human review.

You can customize Macroscope's approvability policy. Learn more.

- Move version comparison and JSON extraction into `@t3tools/shared`
- Reuse shared semver checks in provider and SSH maintenance code
- Add tests for semver, JSON extraction, and tunnel script generation
@github-actions github-actions Bot added size:XL 500-999 changed lines (additions + deletions). and removed size:L 100-499 changed lines (additions + deletions). labels May 8, 2026
return yield* Effect.fail(error);
}),
),
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON extraction finds first brace, not last payload

Medium Severity

decodeRemoteJsonOutput falls back to extractJsonObject which finds the first { in stdout, not the last JSON payload. If shell startup noise from version managers contains any { character (e.g., nvm: {default} or asdf: update {notice}), the extraction grabs the wrong substring, decoding fails, and the valid JSON payload later in the output is never attempted — causing a spurious SSH launch or pairing failure.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 394dc29. Configure here.

if ! ensure_remote_node_path; then
printf 'Remote host is missing node on PATH. Install Node or configure a supported version manager for non-interactive shells.\\n' >&2
exit 1
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading error when node version fails range check

Low Severity

When ensure_remote_node_path fails because the discovered node doesn't satisfy the engine range (not because node is missing), the launch script prints "Remote host is missing node on PATH" which contradicts the earlier stderr message from remote_node_satisfies_engine saying the node version doesn't satisfy the required range. Users troubleshooting SSH launch failures would see conflicting guidance.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 394dc29. Configure here.

- Keep the default remote port available when clearing a managed tunnel
- Update restart logic to rewrite port and managed markers instead of removing them
- Adjust script test coverage for the new state handling
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Caret operator incorrect for semver major version zero
    • Added special-case handling for the caret operator when major is zero: pins minor when major=0 and minor!=0, and pins patch when both major=0 and minor=0, matching the semver specification.

Create PR

Or push these changes by commenting:

@cursor push 6d301562d5
Preview (6d301562d5)
diff --git a/packages/shared/src/semver.ts b/packages/shared/src/semver.ts
--- a/packages/shared/src/semver.ts
+++ b/packages/shared/src/semver.ts
@@ -188,7 +188,20 @@
         const operator = match[1] || "=";
         switch (operator) {
           case "^":
-            return version.major === target.major && compared >= 0;
+            if (target.major !== 0) {
+              return version.major === target.major && compared >= 0;
+            }
+            if (target.minor !== 0) {
+              return (
+                version.major === target.major && version.minor === target.minor && compared >= 0
+              );
+            }
+            return (
+              version.major === target.major &&
+              version.minor === target.minor &&
+              version.patch === target.patch &&
+              compared >= 0
+            );
           case ">=":
             return compared >= 0;
           case ">":

You can send follow-ups to the cloud agent here.

Comment thread packages/shared/src/semver.ts Outdated
Comment thread packages/ssh/src/tunnel.ts
- Correct caret range handling for 0.x and 0.0.x versions
- Reject non-positive runtime PIDs in the SSH tunnel script
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Fallback PID may kill external server unintentionally
    • Removed the ${REMOTE_PID:-$DEFAULT_RUNTIME_PID} fallback so PID_TO_STOP is set to $REMOTE_PID directly, preventing an empty REMOTE_PID from resolving to the external server's PID.

Create PR

Or push these changes by commenting:

@cursor push a2607dec13
Preview (a2607dec13)
diff --git a/packages/ssh/src/tunnel.ts b/packages/ssh/src/tunnel.ts
--- a/packages/ssh/src/tunnel.ts
+++ b/packages/ssh/src/tunnel.ts
@@ -546,7 +546,7 @@
   REMOTE_PORT="$DEFAULT_REMOTE_PORT"
   if wait_ready "@@T3_REUSE_READY_TIMEOUT_MS@@"; then
     if [ "$REMOTE_MANAGED" = "managed" ]; then
-      PID_TO_STOP="\${REMOTE_PID:-$DEFAULT_RUNTIME_PID}"
+      PID_TO_STOP="$REMOTE_PID"
       if [ -n "$PID_TO_STOP" ] && kill -0 "$PID_TO_STOP" 2>/dev/null; then
         kill "$PID_TO_STOP" 2>/dev/null || true
         wait_for_pid_exit "$PID_TO_STOP"

You can send follow-ups to the cloud agent here.

Reviewed by Cursor Bugbot for commit c4be90c. Configure here.

kill "$REMOTE_PID" 2>/dev/null || true
wait_for_pid_exit "$REMOTE_PID"
if [ "$REMOTE_MANAGED" = "managed" ]; then
PID_TO_STOP="\${REMOTE_PID:-$DEFAULT_RUNTIME_PID}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fallback PID may kill external server unintentionally

Medium Severity

When REMOTE_MANAGED is "managed" but REMOTE_PID is empty (e.g., PID file was deleted or corrupted), PID_TO_STOP falls back to DEFAULT_RUNTIME_PID via "${REMOTE_PID:-$DEFAULT_RUNTIME_PID}". DEFAULT_RUNTIME_PID comes from server-runtime.json and may represent an externally started server. Killing it defeats the purpose of adopting an external server, as the subsequent wait_ready on that port will fail, forcing an unnecessary full relaunch. The fallback should only kill the process we actually launched (i.e., only when REMOTE_PID is non-empty).

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c4be90c. Configure here.

@juliusmarminge juliusmarminge merged commit 932df4e into main May 8, 2026
13 checks passed
@juliusmarminge juliusmarminge deleted the t3code/11970999 branch May 8, 2026 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL 500-999 changed lines (additions + deletions). vouch:trusted PR author is trusted by repo permissions or the VOUCHED list.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Adding Ubuntu/WSL2 SSH environment fails with "sh: 43: node: not found"

1 participant