fix(gemini): graceful fallback for stale Gemini session IDs on resume #812

Open
bourgois wants to merge 4 commits into siteboon:main from bourgois:fix/gemini-stale-session-resume

Conversation

@bourgois bourgois commented May 30, 2026

Summary

Fixes the upstream defect where Gemini CLI assigns a new internal session ID on each run, causing all subsequent resume attempts to fail with exit code 42 and "Invalid session identifier".

Chat path (server/gemini-cli.js)

  • Always refresh cliSessionId in the onInit handler instead of only writing it once, so the stored value stays current after every run
  • On exit code 42 + "Invalid session identifier" in stderr: clear the stale cliSessionId, suppress the buffered error messages, and retry once with --resume latest
  • Suppress the complete WebSocket event before the retry so the client doesn't receive a spurious exitCode 42 completion followed by a second stream
  • Buffer "invalid session identifier" stderr lines instead of forwarding immediately — discarded silently on a successful retry, flushed to the client only if the retry also fails
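
The chat-path behaviour above can be sketched roughly as follows. This is a minimal illustration, not the code from server/gemini-cli.js: the helper names `shouldRetryStaleResume` and `runWithStaleResumeRetry`, the injected `runOnce` function, and the shape of its result are all assumptions. Only exit code 42, the "Invalid session identifier" text, and `--resume latest` come from the description.

```javascript
// Hypothetical helpers for illustration; names and shapes are assumptions.
const STALE_SESSION_EXIT_CODE = 42;
const STALE_SESSION_MARKER = 'invalid session identifier';

// Retry only when all four conditions hold, so a genuine exit 42 with a
// different cause, or a second failure, never loops.
function shouldRetryStaleResume({ exitCode, stderrText, wasResuming, alreadyRetried }) {
  return (
    exitCode === STALE_SESSION_EXIT_CODE &&
    wasResuming &&
    !alreadyRetried &&
    stderrText.toLowerCase().includes(STALE_SESSION_MARKER)
  );
}

// runOnce(args) is an injected async function (assumed) that runs the CLI
// and resolves to { exitCode, stderrText }.
async function runWithStaleResumeRetry(runOnce, session) {
  const firstArgs = session.cliSessionId ? ['--resume', session.cliSessionId] : [];
  const first = await runOnce(firstArgs);

  const retry = shouldRetryStaleResume({
    exitCode: first.exitCode,
    stderrText: first.stderrText,
    wasResuming: firstArgs.length > 0,
    alreadyRetried: false,
  });
  if (!retry) return { ...first, retried: false };

  session.cliSessionId = null; // clear the stale ID before retrying
  const second = await runOnce(['--resume', 'latest']);
  return { ...second, retried: true }; // at most one retry, whatever the result
}
```

Keeping the decision in a pure function makes the "no retry loop" item of the test plan easy to check in isolation.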

Terminal/PTY path (server/modules/websocket/services/shell-websocket.service.ts)

  • The shell path built gemini --resume "<id>" with no fallback, leaving the terminal stuck on a stale ID
  • Add a three-level fallback: --resume <id> → --resume latest → fresh session, with a PowerShell-compatible variant on Windows (mirroring how the Claude provider already handles this)
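
As a rough sketch of the three-level chain, the shell command could be assembled like this. The function name `buildGeminiResumeCommand` and the single-quote escaping are assumptions made for the example; the actual code in shell-websocket.service.ts may quote and structure the command differently.

```javascript
// Hypothetical helper; the name and quoting strategy are assumptions.
function buildGeminiResumeCommand(resumeId, platform = process.platform) {
  // Nothing to resume: start a fresh session.
  if (!resumeId) return 'gemini';

  if (platform === 'win32') {
    // PowerShell: escape single quotes by doubling them, then check
    // $LASTEXITCODE after each attempt.
    const id = resumeId.replace(/'/g, "''");
    return (
      `gemini --resume '${id}'; ` +
      'if ($LASTEXITCODE -ne 0) { gemini --resume latest; ' +
      'if ($LASTEXITCODE -ne 0) { gemini } }'
    );
  }

  // POSIX shells: close the quote, emit an escaped quote, reopen the quote.
  const id = resumeId.replace(/'/g, `'\\''`);
  return `gemini --resume '${id}' || gemini --resume latest || gemini`;
}
```

One trade-off of an exit-code chain: any non-zero exit from the first command triggers the next attempt, not only exit 42, so a session that ends with an unrelated error would also fall through to `--resume latest`.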

Test plan

  • Start a Gemini session, send a message, close and reopen — chat should resume without any visible error
  • Open the same session in the terminal tab — should recover via --resume latest fallback instead of dying with exit 42
  • Verify that after a successful resume the stored cliSessionId is updated (future restarts should also work)
  • Confirm a genuine exit-code-42 (non-session reason) still surfaces an error to the UI and does not trigger a retry loop
  • Verify on Windows that the PowerShell fallback chain behaves correctly

Summary by CodeRabbit

  • Bug Fixes
    • Improved session resilience: CLI now detects invalid/stale sessions and automatically retries resuming to avoid repeated failures.
    • Single automatic stale-resume retry introduced to recover from transient resume errors.
    • Added fallback resume behavior: if resuming a stored session fails, the system falls back to resuming the latest session, then starts fresh.
    • More reliable session tracking: stored session identifiers are refreshed on init to reduce future resume errors.
    • Reduced noisy error emissions during automatic retry attempts.

Review Change Stack

Contributor

coderabbitai Bot commented May 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6477bee1-ea80-4c99-bf9a-c0b38eaef5e2

📥 Commits

Reviewing files that changed from the base of the PR and between a1226d2 and 392aeb3.

📒 Files selected for processing (2)
  • server/gemini-cli.js
  • server/modules/websocket/services/shell-websocket.service.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/modules/websocket/services/shell-websocket.service.ts
  • server/gemini-cli.js

📝 Walkthrough

Detect/stash "invalid session identifier" stderr, refresh stored cliSessionId from init events, add _resumeLatest mode to spawn calls, suppress completion while deciding retry, and implement a one-time retry on exit code 42 that clears the stale id and retries using --resume latest. Provider command now falls back to --resume latest.

Changes

Stale Resume Session Handling

| Layer / File(s) | Summary |
| --- | --- |
| Stale session detection infrastructure (server/gemini-cli.js) | Adds stderrHasInvalidSession internal flag, buffers invalid-session stderr, and defers flushing until retry decision; adjusts process-close behavior to suppress complete when a retry will occur. |
| Resume argument and provider fallback (server/gemini-cli.js, server/modules/websocket/services/shell-websocket.service.ts) | Resume argument construction supports _resumeLatest (--resume latest) and provider command now tries --resume <resumeId> then falls back to --resume latest (PowerShell exit-code checks on Windows, ` |
| Session ID refresh and automatic retry (server/gemini-cli.js) | Expanded JSDoc for spawnGemini; onInit always overwrites stored cliSessionId with canonical event.session_id and persists it. On exit code 42 with prior invalid-session stderr (and not already retried), clears stored cliSessionId, saves session, and reinvokes spawnGemini once with _retried:true and _resumeLatest:true. |

sequenceDiagram
  participant Client
  participant spawnGemini
  participant sessionManager
  participant GeminiCLI
  Client->>spawnGemini: start request (may include sessionId)
  spawnGemini->>sessionManager: read stored cliSessionId / resumeId
  spawnGemini->>GeminiCLI: spawn with --resume <cliSessionId> or --resume latest
  GeminiCLI-->>spawnGemini: stderr "invalid session identifier"
  spawnGemini->>spawnGemini: set stderrHasInvalidSession and buffer stderr
  alt exit code 42 and stderrHasInvalidSession and not _retried
    spawnGemini->>sessionManager: clear stored cliSessionId
    spawnGemini->>spawnGemini: retry with _retried:true and _resumeLatest:true
    spawnGemini->>GeminiCLI: spawn with --resume latest
    GeminiCLI-->>spawnGemini: successful resume
    spawnGemini-->>Client: complete
  else no retry
    spawnGemini-->>Client: flush buffered stderr and complete
  end
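
The two smaller pieces in the walkthrough, resume-argument construction and the session ID refresh on init, might look roughly like this. `buildResumeArgs`, `refreshCliSessionId`, and the `persist` callback are hypothetical names for this sketch; `_resumeLatest`, `cliSessionId`, and `event.session_id` are the identifiers named in the walkthrough.

```javascript
// Hypothetical helpers; only _resumeLatest, cliSessionId and
// event.session_id are taken from the walkthrough.
function buildResumeArgs({ cliSessionId = null, _resumeLatest = false } = {}) {
  // The retry path takes precedence over any stored ID.
  if (_resumeLatest) return ['--resume', 'latest'];
  if (cliSessionId) return ['--resume', cliSessionId];
  return []; // no stored ID: start a fresh session
}

// Always overwrite the stored ID with the one reported by the init event,
// then persist it, so the next resume uses the current value.
function refreshCliSessionId(session, event, persist) {
  if (!event || !event.session_id) return false;
  session.cliSessionId = event.session_id;
  persist(session);
  return true;
}
```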

Suggested reviewers

  • viper151

🐰 I sniffed the stale old thread,
Found IDs that made the CLI dread,
Cleared the stale, tried latest next,
A hop, a retry—session fixed!
Fresh resumes bloom from a bunny's tread.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding graceful fallback handling for stale Gemini session IDs during resume operations, which is the core objective addressed by modifications to both gemini-cli.js and shell-websocket.service.ts. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
server/gemini-cli.js (1)

473-478: 💤 Low value

Consider suppressing the error message when retry is planned.

The "invalid session identifier" error is sent to the client (line 478) before the retry occurs. This could confuse users who see an error followed by a successful stream. If the intent is a seamless retry, consider gating the ws.send on line 478 similarly to how the 'complete' event is suppressed.

♻️ Optional: suppress error message before retry
             if (errorMsg.toLowerCase().includes('invalid session identifier')) {
                 stderrHasInvalidSession = true;
+                // Don't surface this error to the client; we'll retry with --resume latest
+                return;
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/gemini-cli.js` around lines 473 - 478, The code sends an error message
for "invalid session identifier" immediately via
ws.send(createNormalizedMessage(...)) even when a retry is planned; update the
logic around the error handling so that when stderrHasInvalidSession is set
(i.e., when errorMsg.toLowerCase().includes('invalid session identifier')), you
suppress or skip the ws.send call for that specific error path (use the same
gating logic used for the 'complete' suppression) and only send an error to the
client if no retry will occur; check getSessionId()/capturedSessionId/sessionId
to build the socketSessionId as before but early-return or skip ws.send when a
retry is intended.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6f73275e-ca14-4504-8218-fa11ad19afe1

📥 Commits

Reviewing files that changed from the base of the PR and between 38bf21d and 0d0955e.

📒 Files selected for processing (1)
  • server/gemini-cli.js

bourgois added 4 commits May 30, 2026 16:09
Gemini CLI assigns a new internal session ID on each run, causing the
stored cliSessionId to go stale. Subsequent --resume <id> calls fail
with exit code 42 and "Invalid session identifier" in stderr.

Three changes:
- onInit now always refreshes cliSessionId instead of only writing it
  once, so the stored value stays current after every run
- When exit code 42 + "Invalid session identifier" detected and we
  were attempting a resume, clear the stale cliSessionId and retry
  once with --resume latest (Gemini's built-in newest-file shortcut)
- Suppress the 'complete' WebSocket event before the retry so the
  client does not receive a spurious exitCode 42 completion followed
  by a second stream from the successful retry
…al resume

The shell/PTY path built `gemini --resume "<id>"` with no fallback, so a
stale cliSessionId caused exit 42 and left the terminal stuck. Add a
bash-OR chain (and PowerShell equivalent on Windows) that falls through to
`--resume latest` then a fresh session, mirroring how the Claude provider
already handles this.
…cover it

Buffer 'invalid session identifier' stderr messages instead of forwarding
them to the client immediately. On a successful retry the buffer is
discarded silently; only if the retry itself fails (or no retry is
possible) are the messages flushed so the client sees the real error.
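
The buffering described in this commit message can be illustrated with a small sketch. The class name `DeferredStderr` and the `send` callback are assumptions for the example; in the PR the forwarding goes through the WebSocket, which is not modelled here.

```javascript
// Hypothetical class; `send` stands in for whatever forwards an error
// line to the client.
class DeferredStderr {
  constructor(send) {
    this.send = send;
    this.held = [];
    this.sawInvalidSession = false;
  }

  // Hold back stale-session errors; forward everything else immediately.
  push(line) {
    if (line.toLowerCase().includes('invalid session identifier')) {
      this.sawInvalidSession = true;
      this.held.push(line);
      return;
    }
    this.send(line);
  }

  // Retry succeeded: the held messages are dropped without being sent.
  discard() {
    this.held = [];
  }

  // Retry failed or was not possible: the client sees the real error.
  flush() {
    for (const line of this.held) this.send(line);
    this.held = [];
  }
}
```

Compared with the early `return` suggested in the review nitpick, holding the lines keeps the original error available if the retry does not recover.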

bourgois force-pushed the fix/gemini-stale-session-resume branch from a1226d2 to 392aeb3 on May 30, 2026 14:09
