Skip to content

Fix #89: bump DB pool and add auth-path in-process caches#90

Merged
max-tet merged 3 commits into
mainfrom
clayde/issue-89
May 22, 2026
Merged

Fix #89: bump DB pool and add auth-path in-process caches#90
max-tet merged 3 commits into
mainfrom
clayde/issue-89

Conversation

@ClaydeCode
Copy link
Copy Markdown
Contributor

Summary

Closes #89

Parallel request bursts (e.g. Actual Budget loading ~30 SQLite migration files simultaneously) exhaust the 4-connection default pool in 2–3 in-flight forwardAuth calls. Remaining calls block for 30 s then 500. This PR applies all four fixes described in the issue:

  • Pool size: AsyncConnectionPool(max_size=20, timeout=10) — headroom for realistic bursts; timeout=10 fails fast instead of stacking 30 s waits.
  • Identity cache: _get_identity() in auth.py caches the default SafeIdentity in-process. Invalidated via a new on_identity_update signal fired from every identity-mutation path (identity.make_default, identity.enrich_identity_from_profile, PUT /protected/identities).
  • App lookup cache: _find_app() caches the InstalledApp by name. Invalidated on the existing on_apps_update signal (already fires on install, uninstall, start, stop, shutdown).
  • ensure_app_is_running DB read removed: docker_start_app now accepts an optional status param. ensure_app_is_running passes the cached app's status, skipping the extra get_by_name call.

Steady-state result: /internal/auth makes zero DB queries for the common path.

Recommended reading order

  1. shard_core/database/connection.py — pool parameters
  2. shard_core/util/signals.py — new on_identity_update signal
  3. shard_core/service/identity.py + shard_core/web/protected/identities.py — signal fire sites
  4. shard_core/web/internal/auth.py — caches + invalidation handlers
  5. shard_core/service/app_tools.py — optional status param on docker_start_app
  6. shard_core/service/app_lifecycle.py — pass cached status

Test plan

  • Hard-refresh Actual Budget (or any private-access app) after clearing browser cache — all ~30 parallel asset requests should return 200 without any 500s
  • Install/uninstall an app, verify subsequent auth requests still resolve correctly (cache invalidated)
  • Update default identity via PUT /protected/identities, verify headers still reflect new identity on next auth request
  • GET /internal/auth for unknown app still returns 404
  • Existing test suite passes (requires Docker+Postgres environment)

🤖 Generated with Claude Code

- shard_core/database/connection.py: max_size=20, timeout=10 — prevents pool
  exhaustion when Traefik forwardAuth receives request bursts
- shard_core/web/internal/auth.py: cache _get_identity and _find_app results
  in-process; invalidate on on_identity_update / on_apps_update signals
- shard_core/util/signals.py: add on_identity_update signal
- shard_core/service/identity.py + web/protected/identities.py: fire
  on_identity_update after any identity mutation
- shard_core/service/app_tools.py: docker_start_app accepts optional status to
  skip the per-call DB read when status is already known
- shard_core/service/app_lifecycle.py: pass cached app.status to docker_start_app

Steady-state /internal/auth now makes zero DB queries for the common path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two test failures from the cache layer:

- test_auth.py::test_headers failed because the module-level _identity_cache
  survives across tests. init_default_identity creates a new identity per
  test (fresh DB) but never fired on_identity_update, so the cache held the
  previous test's identity. Fire the signal after creating the initial
  default identity.

- test_app_last_access.py::test_app_last_access_is_debounced failed because
  the cached InstalledApp's last_access field is never refreshed. The
  debounce check in update_last_access reads app.last_access from the cached
  object, which stays at the value captured when the cache was populated, so
  the debounce window never short-circuits. Mutate app.last_access = now
  after the DB update so the cached object reflects reality.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread shard_core/util/signals.py
Comment thread shard_core/service/app_lifecycle.py Outdated
…status param

- Move on_identity_update firing to service layer exclusively: add
  update_identity() to service/identity.py; web/protected/identities.py
  now calls the service function instead of firing the signal directly.
  The DB layer stays dependency-free; the signal has a single fire-site.

- Revert docker_start_app status parameter: the function always reads
  fresh status from DB (throttled to once per 5 s), eliminating any
  stale-cache concern. The throttle is still required to prevent
  excessive docker-compose subprocess calls from burst auth traffic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ClaydeCode
Copy link
Copy Markdown
Contributor Author

Pushed fixes for both review comments.

Signal layer (signals.py:9)

You're right that having the signal fire from two layers (service and web) was inconsistent. I've added update_identity(id, update_data) to service/identity.py — it does the DB write and fires on_identity_update. The web handler (identities.py:put_identity) now calls this instead of touching the DB directly. enrich_identity_from_profile likewise uses update_identity now. Every on_identity_update fire is in the service layer; the DB layer stays signal-free.

Stale cache / throttle (app_lifecycle.py:33)

Reverted the status parameter from docker_start_app. The function always reads fresh status from DB, so there is no "cache says RUNNING but container is down" discrepancy — both the cache and DB would agree, since they're only out of sync during external Docker events, and in those cases docker_start_app still calls docker-compose up -d (RUNNING is intentionally in the allowed-start list to handle crashed containers).

The throttle is still needed and I've kept it: it prevents docker-compose from being spawned on every auth request even though the DB read is now the only per-5s cost. Without it, 30 parallel migration fetches would spawn 30 concurrent docker-compose up -d subprocesses. The cache's job is to eliminate the two _find_app/_get_identity DB connections per auth call; docker_start_app's single throttled read is an acceptable residual.

@max-tet max-tet self-requested a review May 22, 2026 19:29
@max-tet max-tet merged commit a816e03 into main May 22, 2026
6 checks passed
@max-tet max-tet deleted the clayde/issue-89 branch May 22, 2026 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DB connection pool too small — forwardAuth bursts cause batch 500s

2 participants