Skip to content

fix(client): harden HTTP retry and add 401-triggered token refresh#151

Draft
tim-thacker-nullify wants to merge 1 commit into
mainfrom
fix/http-client-reliability
Draft

fix(client): harden HTTP retry and add 401-triggered token refresh#151
tim-thacker-nullify wants to merge 1 commit into
mainfrom
fix/http-client-reliability

Conversation

@tim-thacker-nullify
Copy link
Copy Markdown
Member

Claude

Why

The HTTP plumbing the CLI and the long-running MCP server depend on had reliability gaps and zero test coverage.

What

retry.go

  • Method-aware retries. 5xx is now retried only for idempotent methods (GET/HEAD/PUT/DELETE/OPTIONS/TRACE). A POST/PATCH that may have committed server-side before erroring is no longer blindly replayed — previously a duplicate-scan / duplicate-autofix hazard. 429 is still retried for any method (the server rejected it without processing).
  • Honor Retry-After (seconds or HTTP-date), capped at maxDelay, instead of only client-side exponential backoff.
  • Drain the response body before retrying so the connection can be reused; switch to math/rand/v2.

refreshing_transport.go

  • React to 401. The cached token can go invalid before its 5-minute TTL (revocation, server session kill, clock skew); previously every request kept sending the dead token for up to 5 minutes. Now a 401 forces a refresh and retries once. The forced refresh is storm-safe (concurrent 401s collapse to a single refresh) and the request body is buffered so it replays correctly.

Tests — first coverage for internal/client: retry status/method gating, Retry-After parsing + cap, body replay, context cancellation, and the 401 refresh-and-retry / storm-safety / TTL paths.

Test plan

  • go build ./... ✅, go vet ./... ✅, go test ./... ✅ (13 new tests in internal/client).

🤖 Generated with Claude Code

The HTTP plumbing the CLI (and long-running MCP server) depends on had two
reliability gaps and no test coverage.

retry.go:
- Make retries method-aware. 5xx is now retried only for idempotent methods
  (GET/HEAD/PUT/DELETE/OPTIONS/TRACE); a POST/PATCH that may have committed
  server-side before erroring is no longer blindly replayed (was a
  duplicate-scan / duplicate-autofix hazard). 429 is still retried for any
  method since the server rejected it without processing.
- Honor the server's Retry-After header (seconds or HTTP-date), capped at
  maxDelay, instead of only client-side exponential backoff.
- Drain the response body before retrying so the connection can be reused.
- Switch to math/rand/v2.

refreshing_transport.go:
- React to 401. The cached token can become invalid before its 5-minute TTL
  (revocation, server session kill, clock skew); previously every request kept
  sending the dead token for up to 5 minutes. Now a 401 forces a token refresh
  and retries the request once. The forced refresh is storm-safe: concurrent
  401s collapse to a single refresh, and the request body is buffered so it
  replays correctly.

Adds the first tests for internal/client: retry status/method gating,
Retry-After parsing + cap, body replay, context cancellation, and the 401
refresh-and-retry / storm-safety / TTL paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tim-thacker-nullify tim-thacker-nullify added the patch Patch version updates (fixes) label May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Patch version updates (fixes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant