Skip to content

fix(signin): run /oauth/authorization before firing webchannel events#20543

Draft
dschom wants to merge 1 commit intomainfrom
worktree-fix-flaky-vpn-test
Draft

fix(signin): run /oauth/authorization before firing webchannel events#20543
dschom wants to merge 1 commit intomainfrom
worktree-fix-flaky-vpn-test

Conversation

@dschom
Copy link
Copy Markdown
Contributor

@dschom dschom commented May 8, 2026

Because

  • The vpn integration › authorization flow - user already signed into Firefox Playwright test was flaking with a 401 on the second /v1/oauth/authorization call, blocking CI.
  • Confirmed via the failing CI trace's response body — the auth-server returned errno 110 "Invalid authentication token: Missing authentication" from tokenNotFoundError() in packages/fxa-auth-server/lib/routes/auth-schemes/hawk-fxa-token.js. The session was valid for /v1/account/profile, /v1/session/status, and /v1/oauth/token immediately before, but couldn't be found 15 ms later for /v1/oauth/authorization — same Hawk id, no /v1/session/destroy in between.
  • Root cause: firefox.fxaLogin (and fxaOAuthLogin) dispatch WebChannelMessageToChrome synchronously, but the browser handles those events on its own event loop. The browser's fxaccounts:login handler can mutate session state, racing with an in-flight /v1/oauth/authorization HTTP request. The vpn integration test reproduced it most reliably because it does two consecutive OAuth flows in the same browser context (Sync, then VPN cached signin).

This pull request

  • Reorders handleNavigation in packages/fxa-settings/src/pages/Signin/utils.ts so getOAuthNavigationTarget (the /v1/oauth/authorization HTTP call) runs first, obtaining the OAuth code while the session is still untouched. firefox.fxaLogin and firefox.fxaOAuthLogin fire after the HTTP response returns. The Login → OAuthLogin order is preserved (still required for Desktop OAuth per FXA-10388).
  • Adds a v1-account regression test in packages/functional-tests/tests/misc/vpnIntegration.spec.ts that builds an auth client with target.createAuthClient(1) and runs the same Sync→VPN flow, exercising the v1→v2 upgrade path on first sign-in. This is the test pattern the original flake hit.
  • Companion PR chore(functional-tests): default test accounts to v2 key stretching #20556 is independent test-infrastructure work (defaulting testAccountTracker accounts to v2 key stretching). Either PR can land first.

Issue that this pull request solves

Closes: FXA-13687

Checklist

Put an x in the boxes that apply

  • My commit is GPG signed.
  • If applicable, I have modified or added tests which pass locally.
  • I have added necessary documentation (if appropriate).
  • I have verified that my changes render correctly in RTL (if appropriate).
  • I have manually reviewed all AI generated code.

How to review (Optional)

  • Key files/areas to focus on:
    • packages/fxa-settings/src/pages/Signin/utils.ts — the reorder is small but load-bearing; the new comment block above the await getOAuthNavigationTarget(...) call documents why the order matters so a future reader doesn't undo it.
    • packages/functional-tests/tests/misc/vpnIntegration.spec.ts — the new regression test mirrors the existing test but explicitly forces a v1 client so it stays as a regression signal even after chore(functional-tests): default test accounts to v2 key stretching #20556's v2 default lands.
  • Suggested review order: utils.ts first (the actual fix), then the new test.
  • Risky or complex parts: handleNavigation is shared by every signin path (Sync, Relay, SmartWindow, VPN, OAuthWeb, plain OAuth). The reorder doesn't change WHICH events fire — only their relative timing vs the HTTP call. Existing unit tests (packages/fxa-settings/src/pages/Signin/utils.test.ts) assert that fxaLogin and fxaOAuthLogin are called with the right args; they don't assert call order, so they remain valid.

Other information (Optional)

The flake was originally suspected to be caused by the v1→v2 key-stretching upgrade running during the first sign-in (since password/change/finish updates accounts.verifierSetAt). PR #20556 was opened on that hypothesis but didn't fix CI on its own — the failure pattern was identical with and without the upgrade flow. Capturing the actual 401 response body from the failing trace narrowed the cause down to a webchannel race rather than anything verifierSetAt-related.

@dschom dschom requested a review from a team as a code owner May 8, 2026 00:01
@dschom dschom marked this pull request as draft May 8, 2026 01:11
Because:
* In handleNavigation's verified-OAuth path, sendFxaLogin (the
  fxaccounts:login webchannel event) was dispatched synchronously
  BEFORE awaiting getOAuthNavigationTarget (the /v1/oauth/authorization
  HTTP call). The browser's async handler for that webchannel message
  can mutate session state, racing with the in-flight HTTP request.
* This produced intermittent 401 "Token not found" failures on the
  HTTP call — most reproducibly when the same browser context goes
  through two consecutive OAuth flows (e.g. the vpn integration test
  signing into Sync then VPN). The first /oauth/authorization
  succeeded; the second one would 401 even though the session
  remained valid for /account/profile, /session/status, and
  /oauth/token in between.

This commit:
* Reorders handleNavigation so getOAuthNavigationTarget runs first,
  obtaining the OAuth code while the session is still untouched, and
  THEN fires fxaLogin and fxaOAuthLogin. The Login → OAuthLogin order
  is preserved (still required for Desktop OAuth per FXA-10388).
* Adds a v1-account regression test alongside the existing vpn
  integration test, using target.createAuthClient(1) so the v1→v2
  upgrade path is exercised on the first sign-in even after
  testAccountTracker defaults to v2 (separate PR).

closes FXA-13687

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dschom dschom force-pushed the worktree-fix-flaky-vpn-test branch from 204d9de to 76e1354 Compare May 9, 2026 00:13
@dschom dschom changed the title fix(functional-tests): default test accounts to v2 key stretching fix(signin): run /oauth/authorization before firing webchannel events May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant