Skip to content

OUT-3761: should_retry column for terminal sync log failures#255

Merged
SandipBajracharya merged 3 commits into
masterfrom
OUT-3761
May 21, 2026
Merged

OUT-3761: should_retry column for terminal sync log failures#255
SandipBajracharya merged 3 commits into
masterfrom
OUT-3761

Conversation

@SandipBajracharya
Copy link
Copy Markdown
Collaborator

Summary

  • Replaces the deletedAt = now() hack on ACCOUNT-category failures with an explicit should_retry boolean. The hack broke uq_qb_sync_logs_oneshot_active (filtered WHERE deleted_at IS NULL) and let every webhook retry re-claim, producing duplicate FAILED rows + duplicate downstream notifications.
  • Terminal failures now keep deletedAt = null so the partial unique index keeps deduping. AUTH stays terminal (shouldRetry = false, revived on OAuth reconnect via includeNoRetry: true). ACCOUNT stays retryable so the next cron sweep self-heals once the customer renews their QBO subscription (MAX_ATTEMPTS = 25 bounds the waste).
  • getAllFailedLogsForWorkspace's includeDeleted parameter is replaced with includeNoRetry; soft-deleted rows are now permanently excluded from the retry sweep.

Ticket

https://linear.app/assemblycom/issue/OUT-3761/duplicate-quickbooks-notification-are-dispatched-for-sync-failed

Test plan

  • Unit: getShouldRetryForCategory (9 cases covering all FailedRecordCategoryType branches + the undefined path).
  • Integration: ACCOUNT failure (accountFailureClaimDedup.test.ts) — one FAILED row with shouldRetry = true, deletedAt = null, repeat webhook delivery returns claimed: false.
  • Integration: AUTH failure (authFailureClaimDedup.test.ts) — one FAILED row with shouldRetry = false, deletedAt = null, repeat webhook delivery returns claimed: false.
  • Integration: getAllFailedLogsForWorkspace(includeNoRetry) (retrySweepIncludeNoRetry.test.ts) — default skips terminal rows; true includes them.
  • Full suite: yarn test — 36 files / 269 tests passing.
  • Lint + prettier + tsc on touched files: clean.

🤖 Generated with Claude Code

…ailures

ACCOUNT/AUTH webhook failures used to set deletedAt=now() on the live claim
row, which pulled it out of uq_qb_sync_logs_oneshot_active (WHERE deleted_at
IS NULL) and let subsequent webhook deliveries re-claim, re-fail, and
duplicate FAILED rows + downstream notifications.

Replace the deletedAt-at-insert hack with an explicit should_retry boolean
column. Terminal failures keep deletedAt=null so the partial unique index
still dedupes repeat claims; the cron retry sweep filters shouldRetry=true
to skip terminal categories. AUTH is terminal (revived only on OAuth
reconnect via includeNoRetry); ACCOUNT stays retryable so the next cron
sweep self-heals once the customer renews their QBO subscription.

Also replaces getAllFailedLogsForWorkspace's includeDeleted parameter with
includeNoRetry; soft-deleted rows are now permanently excluded from the
retry sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 21, 2026

OUT-3761

@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quickbooks-sync Building Building May 21, 2026 8:58am
quickbooks-sync (dev) Ready Ready Preview, Comment May 21, 2026 8:58am

Request Review

@SandipBajracharya SandipBajracharya changed the title feat(OUT-3761): should_retry column for terminal sync log failures OUT-3761: should_retry column for terminal sync log failures May 21, 2026
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR replaces the deletedAt = now() hack on ACCOUNT-category failures with an explicit should_retry boolean column. The hack was breaking uq_qb_sync_logs_oneshot_active (filtered WHERE deleted_at IS NULL), allowing every webhook retry to re-claim and produce duplicate FAILED rows and duplicate downstream notifications.

  • Migration: Adds should_retry boolean DEFAULT true NOT NULL to qb_sync_logs; backfills existing rows as retryable without touching deleted_at.
  • Retry semantics: AUTH failures are now terminal (shouldRetry=false, revived on OAuth reconnect via includeNoRetry: true); ACCOUNT failures remain retryable (shouldRetry=true) so the cron sweep self-heals once a QBO subscription is restored; MAX_ATTEMPTS=25 bounds the retry waste for all categories.
  • getAllFailedLogsForWorkspace: isNull(deletedAt) is now unconditional (soft-deleted rows permanently excluded from the retry sweep); includeNoRetry: true is used only on the OAuth reconnect path to pull in terminal AUTH rows.

Confidence Score: 4/5

Safe to merge; fixes a real deduplication bug without breaking existing retry or notification flows

The core fix is sound: keeping deleted_at = null on all failures preserves the partial unique index slot and eliminates the duplicate-row/duplicate-notification problem. The migration default of true cleanly backfills existing rows. The only concern is that VALIDATION errors (bad payload data that typically won't self-heal) are treated identically to transient failures like RATE_LIMIT, burning up to 25 retry attempts before giving up.

src/utils/synclog.ts — the getShouldRetryForCategory function; the VALIDATION branch deserves a second look to confirm the intent of retrying bad-payload failures up to MAX_ATTEMPTS times.

Important Files Changed

Filename Overview
src/utils/synclog.ts Replaces getDeletedAtForAuthAccountCategoryLog with getShouldRetryForCategory; only AUTH returns false — VALIDATION and other non-self-healing categories are also marked retryable (up to MAX_ATTEMPTS)
src/app/api/quickbooks/syncLog/syncLog.service.ts Replaces includeDeleted with includeNoRetry; isNull(deletedAt) now unconditional (soft-deleted rows permanently excluded); shouldRetry=true filter added for default sweep path
src/app/api/quickbooks/webhook/webhook.service.ts Replaces deletedAt hack with shouldRetry at all failure sites; also moves validateAccessToken earlier in the payment-succeeded handler as an incidental ordering fix
src/app/api/quickbooks/auth/auth.service.ts Passes includeNoRetry: true on OAuth reconnect so terminal AUTH rows (shouldRetry=false) are swept into the reconnect retry
src/db/migrations/20260520113723_add_should_retry_to_qb_sync_logs.sql Adds should_retry boolean DEFAULT true NOT NULL; default correctly backfills existing rows as retryable
src/db/schema/qbSyncLogs.ts Adds shouldRetry: t.boolean('should_retry').default(true).notNull() to the Drizzle table definition; included in auto-generated insert/update schemas
src/app/api/quickbooks/sync/sync.service.ts Parameter renamed to includeNoRetry, passed through to getAllFailedLogsForWorkspace; failure-update site switches from deletedAt to shouldRetry
src/app/api/quickbooks/payment/payment.service.ts Replaces deletedAt with shouldRetry in the failure-log payload; private helper type updated accordingly

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Webhook arrives] --> B[claimWebhookEvent\nINSERT ON CONFLICT DO NOTHING]
    B -->|claimed=false\nrow already exists| C[Return 200 — deduplicated]
    B -->|claimed=true\nPENDING row inserted| D[Execute sync handler]
    D -->|success| E[Update row to SUCCESS\ndeletedAt=null]
    D -->|error| F[getShouldRetryForCategory]
    F -->|AUTH category\nrefresh token dead| G[shouldRetry=false\ndeletedAt=null\nUnique index still holds slot]
    F -->|ACCOUNT / RATE_LIMIT / OTHERS / etc.| H[shouldRetry=true\ndeletedAt=null\nUnique index still holds slot]
    G --> I[Normal cron sweep\nexcludes AUTH row]
    H --> J[Normal cron sweep\nretries row up to MAX_ATTEMPTS=25]
    I --> K[OAuth reconnect\nsyncFailedRecords includeNoRetry=true\nIncludes AUTH rows]
    K --> D
Loading

Reviews (1): Last reviewed commit: "feat(OUT-3761): add should_retry column ..." | Re-trigger Greptile

Comment thread src/utils/synclog.ts
Comment thread src/app/api/quickbooks/webhook/webhook.service.ts
ACCOUNT failures are now retryable (kept in qb_sync_logs with
shouldRetry=true); with MAX_ATTEMPTS=25 a 3h cadence burned the attempt
counter in ~75h (~3 days), often before a customer renewed their QBO
subscription. 12h × 25 = ~300h (~12.5 days) of retry headroom, which
matches the realistic recovery window without raising MAX_ATTEMPTS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@priosshrsth priosshrsth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SandipBajracharya Minor nitpick reagrding parameter name. Otherwise the PR looks good to me.

Comment thread src/app/api/quickbooks/syncLog/syncLog.service.ts Outdated
Comment thread src/app/api/quickbooks/syncLog/syncLog.service.ts
Comment thread src/app/api/quickbooks/auth/auth.service.ts
@SandipBajracharya SandipBajracharya merged commit c8be984 into master May 21, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants