
feat(auth): Backfill accountAuthorizations with refresh tokens #20513

Draft
nshirley wants to merge 1 commit into main from worktree-FXA-12932
@nshirley nshirley commented May 4, 2026

Because:

  • We want to be able to backfill refreshTokens into the
    accountAuthorizations table

This commit:

  • Adds a backfill script to walk the refreshTokens table, inserting a
    new accountAuthorizations row for the most recent token/scope/service
    combo
  • Adds unit tests for the backfill script

Closes: FXA-12932

Checklist

Put an x in the boxes that apply

  • My commit is GPG signed.
  • If applicable, I have modified or added tests which pass locally.
  • I have added necessary documentation (if appropriate).
  • I have verified that my changes render correctly in RTL (if appropriate).
  • I have manually reviewed all AI generated code.


I'll remove this; it's just a way to keep track of the various test cases from the local seed data. Might be worth keeping, though 🤷

* Tokens are SHA256 hashes so they distribute uniformly, making byte-boundary
* splits an effective way to parallelize without hotspots.
*/
function workerCursors(

We might not need the multi-worker functionality here. It's nice to have, but it might require more setup within the k8s cluster. Whoever reviews this, let me know your thoughts!
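
For reviewers weighing the multi-worker option, here is a rough sketch of what a byte-boundary split over uniformly distributed SHA-256 tokens could look like. It is not the PR's actual `workerCursors` code: `splitKeyspace`, `CursorRange`, and the choice to split on the first byte only are assumptions made for illustration.

```typescript
// Hypothetical sketch; not the implementation of workerCursors in this PR.
interface CursorRange {
  start: Uint8Array; // inclusive lower bound, 32 bytes like a SHA-256 hash
  end: Uint8Array | null; // exclusive upper bound; null means "to the end of the keyspace"
}

// Splits the 32-byte keyspace into contiguous ranges on the first byte.
// Because hashes are uniformly distributed, each range should hold
// roughly the same number of rows.
function splitKeyspace(workerCount: number): CursorRange[] {
  if (!Number.isInteger(workerCount) || workerCount < 1 || workerCount > 256) {
    throw new RangeError('workerCount must be an integer between 1 and 256');
  }

  // Boundary i is a 32-byte value whose first byte is floor(i * 256 / workerCount)
  // and whose remaining bytes are zero.
  const boundary = (i: number): Uint8Array => {
    const bytes = new Uint8Array(32);
    bytes[0] = Math.floor((i * 256) / workerCount);
    return bytes;
  };

  const ranges: CursorRange[] = [];
  for (let i = 0; i < workerCount; i++) {
    ranges.push({
      start: boundary(i),
      end: i === workerCount - 1 ? null : boundary(i + 1),
    });
  }
  return ranges;
}
```

Each worker would then scan only the tokens in its own `[start, end)` range, so no two workers touch the same rows. With a single worker the function returns one range covering the whole table, which keeps the non-parallel path identical.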

);
}

function writeCheckpoint(

This is a nice-to-have, but if the container is shut down then the file is lost. A better option would be storing it in MySQL, but it's not strictly necessary.

Luckily, the inserts are idempotent, so there's no risk in starting from the beginning again if the script crashes. It just means we have to scan the table again.
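
To make the trade-off concrete, here is a minimal sketch of a file-based checkpoint that falls back to a full rescan when the file is missing or unreadable. The `Checkpoint` shape, the function names, and the default path are all hypothetical; the PR's `writeCheckpoint` may look quite different.

```typescript
import { existsSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical checkpoint shape: where the scan got to, and how far along it is.
interface Checkpoint {
  lastTokenHex: string; // hex of the last token id processed
  rowsScanned: number;
}

// Hypothetical default location. A container-local path like this one
// disappears when the container is shut down.
const DEFAULT_PATH = join(tmpdir(), 'backfill-checkpoint.json');

function saveCheckpoint(cp: Checkpoint, path: string = DEFAULT_PATH): void {
  // Write to a temp file first, then rename, so a crash mid-write
  // never leaves a half-written checkpoint behind.
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(cp));
  renameSync(tmp, path);
}

function loadCheckpoint(path: string = DEFAULT_PATH): Checkpoint | null {
  if (!existsSync(path)) return null; // no checkpoint: scan from the start
  try {
    return JSON.parse(readFileSync(path, 'utf8')) as Checkpoint;
  } catch {
    return null; // unreadable checkpoint: also scan from the start
  }
}
```

Returning `null` on any failure is safe here only because the inserts are idempotent: the worst case is a repeated table scan, not duplicated or corrupted rows.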

);
}

async function batchUpsert(

I can probably reuse the upsert functions from #20500 here. No sense in reinventing the wheel; this was just a placeholder while working on the backfill.


So, circling back to this: it's probably not necessary to use the underlying db/mysql functions added as part of the work to create the table.

A few reasons for this:

  • The db layer is responsible for app behavior, with one interesting caveat: it uses INSERT IGNORE, so the first authz grant written is kept and treated as the 'oldest'. That is correct as far as the table and business logic go, but it doesn't help us with the backfill. When backfilling, we may find a token for a user who granted access in 2025, then later in processing find a token granted in 2024 (same scopes, etc.). We want the oldest to be inserted, which is why we use ON DUPLICATE KEY UPDATE.
  • The other problem: the db layer does a single insert, and we want to backfill millions of records, so we do bulk writes with INSERT ... VALUES (), (). Switching to single inserts would significantly increase run time.
  • Lastly, we could create a function in the MySQL db layer, something like _bulkUpsertAccountAuthorizationsForBackfill, but that puts a backfill-only function next to 'production' code, and we probably don't want that. (Though maybe there's an argument for adding it so RPs can quickly de-authorize multiple users?)
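
To illustrate the first two points, here is one way a multi-row upsert that keeps the oldest grant could be built. This is a sketch under assumptions, not the PR's `batchUpsert`: the column names (`uid`, `scope`, `service`, `authorizedAt`), the `AuthorizationRow` type, and the use of `LEAST` to keep the earlier timestamp are all hypothetical, since the real schema and statement are not shown here.

```typescript
// Hypothetical row shape; the real accountAuthorizations columns may differ.
interface AuthorizationRow {
  uid: string;
  scope: string;
  service: string;
  authorizedAt: number; // epoch milliseconds of the grant
}

interface BulkStatement {
  sql: string;
  params: Array<string | number>;
}

// Builds one INSERT ... VALUES (), () statement for a whole batch.
// On a key collision the stored timestamp is replaced only when the
// incoming one is earlier, so scan order does not matter.
function buildBulkUpsert(rows: AuthorizationRow[]): BulkStatement {
  if (rows.length === 0) {
    throw new Error('rows must not be empty');
  }

  const placeholders: string[] = [];
  const params: Array<string | number> = [];
  for (const row of rows) {
    placeholders.push('(?, ?, ?, ?)');
    params.push(row.uid, row.scope, row.service, row.authorizedAt);
  }

  // VALUES(col) is deprecated in newer MySQL 8.0 releases in favor of
  // row aliases, but it is still accepted.
  const sql =
    'INSERT INTO accountAuthorizations (uid, scope, service, authorizedAt) ' +
    `VALUES ${placeholders.join(', ')} ` +
    'ON DUPLICATE KEY UPDATE authorizedAt = LEAST(authorizedAt, VALUES(authorizedAt))';

  return { sql, params };
}
```

By contrast, `INSERT IGNORE` would keep whichever row happened to be written first, which is why it suits the app's write path but not a backfill that encounters tokens in arbitrary order. Running the same batch twice leaves the table unchanged, which matches the idempotency the script relies on.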

@nshirley nshirley changed the title from "Worktree fxa 12932" to "feat(auth): Backfill accountAuthorizations with refresh tokens" May 4, 2026
@nshirley nshirley force-pushed the worktree-FXA-12932 branch from dc59c8a to 9f1efe8 Compare May 5, 2026 15:03
@nshirley nshirley force-pushed the worktree-FXA-12932 branch from 9f1efe8 to c8a9234 Compare May 5, 2026 20:55