feat(auth): Backfill accountAuthorizations with refresh tokens#20513
feat(auth): Backfill accountAuthorizations with refresh tokens#20513
Conversation
There was a problem hiding this comment.
I'll remove this, it's just a way to keep track of the various test cases from seed data locally. Might be worth keeping though 🤷
| * Tokens are SHA256 hashes so they distribute uniformly, making byte-boundary | ||
| * splits an effective way to parallelize without hotspots. | ||
| */ | ||
| function workerCursors( |
There was a problem hiding this comment.
We might not need the multiple worker functionality with this. It's nice to have, but might require more setup within the k8 cluster - whoever reviews this, let me know your thoughts!
| ); | ||
| } | ||
|
|
||
| function writeCheckpoint( |
There was a problem hiding this comment.
This is a nice to have, but if the container is shutdown then the file is lost. A better option would be storing it in mysql, but it's not super necessary.
Luckily, the inserts are idempotent, so there's no risk in start at the beginning again if the script crashes. It just means we have to scan the table again
| ); | ||
| } | ||
|
|
||
| async function batchUpsert( |
There was a problem hiding this comment.
I can probably use the work from #20500 to use the functions there for upsert. No sense in reinventing the wheel, this was just place holder while working on the backfill
There was a problem hiding this comment.
So, circling back to this, it's probably not necessary to use the underlying db/mysql functions added as part of the work to create the table.
Few reasons for this:
- The db layer is responsible for app behavior, which has an interesting caveat that it uses an
INSERT IGNOREsuch that the first authz grant becomes the 'oldest'. This is correct as far as the table / business logic goes, but doesn't help us with the backfill. When backfilling, we may find a token for a user who granted access in 2025, then later in processing we find a token granted (same scopes, etc.) for 2024. We want the oldest to be inserted, hence why we useON DUPLICATE KEY UPDATE - The other problem: the db layer does a single insert, and we're wanting to backfill millions of records so we do bulk updates with
INSERT ... VALUES (), (). If we switch to using the single insert we significantly increase run time - Lastly, we could create a function in mysql db layer like
_bulkUpsertAccountAuthorizationsForBackfillbut this then adds a backfill only function next to 'production' code and we probably don't want that. (though, maybe there's an argument for making this for RP's to be able to quickly de-authorize multiple users?)
dc59c8a to
9f1efe8
Compare
Because: - We want to be able to backfill refreshTokens into the accountAuthorizations table This commit: - Adds a backfill script to walk the refreshTokens table, inserting a new accountAuthoriztions for the most recent token/scope/service combo - Adds unit tests for the backfill script Closes: FXA-12932
9f1efe8 to
c8a9234
Compare
Because:
accountAuthorizations table
This commit:
new accountAuthoriztions for the most recent token/scope/service
combo
Closes: FXA-12932
Checklist
Put an
xin the boxes that applyHow to review (Optional)
Screenshots (Optional)
Please attach the screenshots of the changes made in case of change in user interface.
Other information (Optional)
Any other information that is important to this pull request.