Skip to content

feat(webapp): consolidate auth path + add comprehensive auth tests#3499

Open
matt-aitken wants to merge 86 commits intomainfrom
rbac-packages
Open

feat(webapp): consolidate auth path + add comprehensive auth tests#3499
matt-aitken wants to merge 86 commits intomainfrom
rbac-packages

Conversation

@matt-aitken
Copy link
Copy Markdown
Member

Summary

Consolidates the webapp's authentication and authorization into a small set of route helpers, replacing the ad-hoc requireUser / requireUserId / authenticatedEnvironmentForAuthentication calls scattered across routes. Same security model, but the per-request flow (authenticate → authorize → load) now lives in one place per route family.

Adds a comprehensive end-to-end auth test suite that didn't exist before — 162 tests covering API key, PAT and JWT auth across the public API surface, plus dashboard session auth for admin pages.

Changes

Dashboard auth (started, partial rollout)

Admin and settings pages migrated to a unified loader/action helper that authenticates the session, runs an authorization check, and exposes the result to the route. Other dashboard routes still on the old pattern; remaining migration tracked separately.

Migrated routes:

  • admin.* (14 admin / back-office / feature-flags / LLM-models / notifications / orgs / concurrency pages)
  • _app.orgs.$organizationSlug.settings.team
  • _app.orgs.$organizationSlug.settings.roles

API / realtime / engine auth (complete for the migrated families)

71 routes migrated to a unified apiBuilder that centralizes Bearer / PAT / Public-JWT authentication and applies the per-route authorization check before the handler runs. Includes:

  • api.v1.* and api.v2.* and api.v3.* — tasks, runs, batches, queues, prompts, deployments, query, sessions, waitpoints, packets, workers, idempotency keys
  • realtime.v1.* — runs, batches, sessions, streams
  • engine.v1.* — dev / worker-action protocols

Side effect: action aliases preserved historic JWT scope semantics where the new model is stricter (e.g. a write:tasks JWT now also satisfies trigger / batchTrigger / update actions on the same resource — matched at the auth boundary, not in the route handler).

Auth test suite (new — *.e2e.full.test.ts)

162 e2e tests run against a real spawned webapp + Postgres (no mocks). Coverage matrix:

  • API key auth — read / write / trigger / batchTrigger / deploy actions across runs, batches, deployments, prompts, queues, query, sessions, input-streams, waitpoints, tasks, idempotency keys; multi-key resources (a run carries batch / tag / task identifiers — auth must accept any matching scope)
  • Personal Access Token auth — comprehensive matrix: scope match, scope mismatch, missing scope, expired token, malformed token
  • Public JWT auth — sub-vs-URL environment resolution, expired JWTs, signature verification, scope checking, otu (one-time-use) token semantics, branch-environment signing-key fallback
  • Dashboard session auth — admin-only pages reject non-admins; per-action gating
  • Cross-cutting edge cases — revoked API key grace window, JWT cross-environment isolation, MissingResource branch behaviour

Test plan

  • pnpm run typecheck --filter webapp clean
  • pnpm exec vitest run --config apps/webapp/vitest.e2e.full.config.ts — 162/162 pass
  • Spot-check an authed API endpoint with a valid + invalid API key against a local stack
  • Spot-check the migrated admin pages render and gate non-admins

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: d7055f2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages
Name Type
@trigger.dev/core Patch
@trigger.dev/plugins Patch
@trigger.dev/build Patch
trigger.dev Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@trigger.dev/rbac Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new RBAC system and plugin surface: a @trigger.dev/rbac package, RBAC contracts in packages/plugins, a lazy plugin loader and fallback controller, and a server-side rbac service. Introduces dashboard/api route builder abstractions that centralize authentication/authorization, updates many API/dashboard routes to use typed { type, id } resources (removing prior superScopes), and adds everyResource/authorization aliasing. Webapp changes include role UI (organization Roles page, role picker, token role assignment), invite RBAC handling, schema/migration for OrgMemberInvite.rbacRoleId, project-created environment provisioning refactor, extensive e2e RBAC tests and shared test-server infrastructure, and a CI workflow for full E2E auth runs.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch rbac-packages

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

matt-aitken added a commit that referenced this pull request May 4, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread .github/workflows/e2e-webapp-auth-full.yml Fixed
Comment thread .github/workflows/e2e-webapp-auth-full.yml Fixed
Comment thread .github/workflows/e2e-webapp-auth-full.yml Fixed
Comment thread .github/workflows/e2e-webapp-auth-full.yml Fixed
matt-aitken added a commit that referenced this pull request May 6, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
@matt-aitken matt-aitken marked this pull request as ready for review May 6, 2026 15:49
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

View 11 additional findings in Devin Review.

Open in Devin Review

matt-aitken added 14 commits May 6, 2026 20:53
…Client from public interface

- Replace buildBearerAbility/buildSessionAbility with authenticateBearer/authenticateSession
- Add RbacEnvironment, RbacUser, BearerAuthResult, SessionAuthResult types
- Remove PrismaClient from @trigger.dev/plugins interface (no Prisma crossing repo boundary)
- Remove @trigger.dev/database dependency and api-extractor from plugins package
- Switch plugins build to tsup --dts, delete api-extractor.json and tsconfig.dts.json
- OSS fallback imports PrismaClient from @trigger.dev/database directly
- OSS loader passes helpers-only to enterprise plugin, (prisma, helpers) to fallback
- Add rbac.server.ts singleton to webapp
- PoC: migrate admin.concurrency route to rbac.authenticateSession + canSuper()
Adds a `forceFallback` option to the RBAC loader so tests (and any other
consumer that sets RBAC_FORCE_FALLBACK=1) pin authentication to the OSS
fallback regardless of whether the enterprise plugin is installed.

- internal-packages/rbac: LazyController and RoleBaseAccess.create now accept
  RbacCreateOptions.forceFallback. When true, load() skips the dynamic import
  of @triggerdotdev/plugins/rbac and constructs RoleBaseAccessFallback
  directly.
- apps/webapp env: new RBAC_FORCE_FALLBACK BoolEnv, threaded into
  app/services/rbac.server.ts.
- testcontainers webapp: set RBAC_FORCE_FALLBACK=1 so e2e tests exercise the
  fallback deterministically.
- api-auth.e2e.test.ts: covers the fallback wiring end-to-end via the existing
  /admin/concurrency route, which already uses rbac.authenticateSession +
  ability.canSuper().
Close the coverage gap identified in the TRI-8716 audit before TRI-8719
swaps apiBuilder.server.ts to rbac.authenticateBearer. All new tests run
against the legacy authenticateApiRequestWithFailure /
authenticateApiRequestWithPersonalAccessToken path and must stay green
after the migration.

- Action requests (createActionApiRoute) via POST /api/v1/idempotencyKeys/:key/reset:
  * Valid private API key → passes auth (400 on zod body validation, not 401/403).
  * Missing Authorization → 401.
  * Invalid API key → 401.
- JWT on the same action route (allowJWT: true, superScopes write:runs, admin):
  * JWT with matching scope → passes auth.
  * JWT with read-only scope → 403.
- Personal access tokens (createLoaderPATApiRoute) via GET /api/v1/projects/:ref/runs:
  * Missing Authorization → 401.
  * API key (tr_dev_*) on PAT-only route → 401.
  * Wrong-prefix or malformed PAT → 401.
  * Well-formed but unknown PAT → 401.
  * Revoked PAT → 401.
  * Valid PAT on unknown project → 404 (auth passes).
- Edge case: valid API key whose project.deletedAt is set → 401.

Also fix the TRI-8715 redirect assertion: the webapp sends clients to
/login?redirectTo=... so compare by pathname rather than exact string.

New helper test/helpers/seedTestPAT.ts seeds a User + PersonalAccessToken
row using the same hashing/encryption scheme the webapp uses (shared
test ENCRYPTION_KEY), so the webapp subprocess can authenticate against
the seeded token.

otu and realtime.skipColumns propagation are deferred: they're only
observable via real trigger / realtime-stream flows, which need
workers/streams enabled and are out of scope for a coverage PR. The
migration preserves those fields via BearerAuthResult.jwt; dedicated
coverage can ride with TRI-8719 if needed.
Close the resource-scoped JWT coverage gap before TRI-8719 swaps
apiBuilder to rbac.authenticateBearer. Target:
POST /api/v1/waitpoints/tokens/:waitpointFriendlyId/complete — allowJWT,
resource: { waitpoints: params.waitpointFriendlyId }, superScopes:
[write:waitpoints, admin].

New helper test/helpers/seedTestWaitpoint.ts seeds a Waitpoint in
COMPLETED status so the handler short-circuits once auth passes, keeping
the 200 assertion independent of run-engine workers.

7 new tests exercise the legacy checkAuthorization scope algebra that
the migration must preserve:

- scope matches exact resource id → 200
- scope targets a different id of the same type → 403
- type-level scope (no id) grants all resources of that type → 200
- read-only scope on a write route → 403
- scope targets a different resource type → 403
- admin super-scope → 200 (legacy super-scope listing)
- unrelated type scope with no super-scope match → 403

Without these, the only JWT coverage was coarse type-level allow/deny
against routes whose resource callbacks returned () => 1 or () => ({}),
leaving resource-id matching entirely untested end-to-end.
Lock in the legacy checkAuthorization behaviours that TRI-8719 must
preserve once it swaps in rbac.authenticateBearer + ability.can.

Three tests in a new describe block 'JWT bearer auth — behaviours to
preserve through TRI-8719':

- Custom action route: type-level write:tasks JWT on
  POST /api/v1/tasks/:taskId/trigger (action: trigger) → auth passes
  today via exact superScope match. Must keep passing after TRI-8719
  via the ACTION_ALIASES map (trigger ← write).
- Multi-key resource: read:tags:<tag> JWT on /api/v1/runs/:runId/trace
  where the seeded run has that tag → auth passes today because
  legacy checks each resource key. Must keep passing after TRI-8719
  via ability.can's array-resource form.
- Multi-key resource: read:batch:<friendlyId> JWT on
  /api/v1/runs/:runId/trace where the seeded run is in that batch →
  same rationale as the tags case.

Dropped the planned empty-resource test: researching it surfaced that
legacy checkAuthorization denies empty-resource requests BEFORE the
super-scope check runs, so api.v1.batches.ts and idempotencyKeys reset
currently reject all JWTs despite allowJWT: true. TRI-8719's plan
(adding explicit { type: 'runs' }) is an intentional improvement, not
a preservation — documented in the TRI-8719 description comment.

New helper test/helpers/seedTestRun.ts seeds a minimal TaskRun (and,
optionally, an associated BatchTaskRun) that ApiRetrieveRunPresenter's
findRun can resolve for multi-key resource tests. The tests only
assert 'auth passes' (!== 401, !== 403) — the handler's downstream
behaviour (which may fail in a worker-less test env) isn't relevant
to the auth-layer contract.
Foundational changes before swapping apiBuilder to rbac.authenticateBearer.
No behaviour change yet — apiBuilder is still on the legacy path.

Array resources:
- @trigger.dev/plugins RbacAbility.can now accepts RbacResource | RbacResource[].
  Array form means 'grant access if any element passes', preserving the
  legacy checkAuthorization multi-key semantic once TRI-8719 completes.
- internal-packages/rbac ability.ts: permissive/super/deny pass through
  unchanged; buildJwtAbility iterates the array and short-circuits on
  first match.

Action alias wrapper (internal-packages/rbac/src/index.ts):
- ACTION_ALIASES map + withActionAliases function. Wraps an underlying
  RbacAbility so that can(action, resource) retries with alias actions
  when the direct check fails. Currently: trigger, batchTrigger, update
  are all satisfied by a scope whose action is write — matching legacy
  superScope behaviour for route.action values that don't align with
  scope prefixes.
- LazyController wraps the ability it gets from authenticateBearer /
  authenticateSession. authenticateAuthorize* stop delegating to the
  underlying's own Authorize methods (that would bypass the wrapper)
  and instead do the inline ability.can check against the wrapped
  ability.

The enterprise plugin (TRI-8720) does not need to know about aliases —
the wrapper applies uniformly regardless of which ability came back.

Tests:
- ability.test.ts: +4 tests for array resource form (31 total in file).
- loader.test.ts: +11 tests for withActionAliases (direct match, alias
  retry for trigger/batchTrigger/update, id-scoped retry, admin passes,
  array form retry, canSuper delegation).
- Unit suite: 31 tests, all passing.
- Webapp typecheck: clean.
…I-8719 Phase B)

Swap all three apiBuilder call sites (loader, action, multi-method) from
authenticateApiRequestWithFailure + checkAuthorization to a single RBAC
plugin bridge. 30 route files migrated in lockstep — drop the
authorization.superScopes option, convert resource callbacks to return
RbacResource or RbacResource[] in the new shape.

Infrastructure:

- apiBuilder: new authenticateRequestForApiBuilder helper funnels all
  three builders through rbac.authenticateBearer and follow-up
  findEnvironmentById to rebuild the legacy ApiAuthenticationResultSuccess
  shape handlers still expect (no handler-facing changes).
- @internal/rbac: re-export RbacAbility, RbacResource from
  @trigger.dev/plugins so webapp only depends on @trigger.dev/rbac.

Route-file changes (Risk mitigations from the ticket):

- Custom actions (trigger, batchTrigger, update) unchanged at the route
  level — the ACTION_ALIASES wrapper from Phase A handles them
  transparently.
- Multi-key runs routes (api.v3.runs.$runId, realtime.v1.runs.$runId,
  realtime.v1.streams.$runId.$streamId, api.v1.runs.$runId.events /
  .spans.$spanId / .trace, realtime.v1.streams.$runId.input.$streamId
  second block, plus the batch-array routes) now return
  RbacResource[] — any resource match grants access. Undefined batch
  ids are filtered out to avoid accidentally matching a type-level
  read:batch scope with no id.
- Empty-resource routes (api.v1.batches, api.v1.idempotencyKeys.$key.reset)
  now return { type: 'runs' } — JWTs with read:runs / write:runs start
  working where they were previously denied by the legacy
  empty-resource short-circuit. Intentional improvement, strict
  superset of today's accept set.
- Search-params routes (realtime.v1.runs, api.v1.runs) return an array
  with a collection-level { type: 'runs' } plus any filtered tag/task
  entries so JWTs that work today continue to work.

Verification:

- pnpm run typecheck --filter webapp: clean.
- pnpm run test --filter @internal/rbac: 31 unit tests pass (wrapper +
  array-resource semantics).
- E2E suite (test/api-auth.e2e.test.ts): all 31 tests pass on the new
  code path — the three pre-migration 'behaviours to preserve' tests
  (type-level write:tasks triggers a task, read:tags:<tag> reaches a
  run with that tag, read:batch:<id> reaches a run in that batch) are
  still green post-swap.

Packaging:

- .changeset/rbac-plugin-array-resources.md: minor bump for
  @trigger.dev/plugins (array-resource overload on RbacAbility.can).
- .server-changes/rbac-apibuilder-migration.md: webapp-only note.
Add a session-auth route builder analogous to apiBuilder.server.ts that
routes dashboard auth through rbac.authenticateSession and runs the
ability check (canSuper or can) before the handler runs. Routes that
only need a logged-in user (no authorisation) keep using requireUser /
requireUserId — the builder is opt-in for routes with explicit auth.

Builder shape:

  dashboardLoader({ authorization: { requireSuper: true } }, async ({ user, ability }) => ...)
  dashboardLoader({ authorization: { action, resource } }, ...)
  dashboardAction(...)

Auth failure throws a redirect Response so the success-path return type
stays narrow (useTypedLoaderData<typeof loader>() picks up the handler's
TypedJsonResponse). Optional context callback feeds organizationId /
projectId to authenticateSession when needed (enterprise-only — fallback
ignores context today).

Migrated 14 platform admin routes from
`requireUser` + `if (!user.admin)` to dashboardLoader / dashboardAction
with requireSuper: true:

  admin.tsx
  admin._index.tsx
  admin.concurrency.tsx
  admin.feature-flags.tsx
  admin.notifications.tsx
  admin.orgs.tsx
  admin.data-stores.tsx
  admin.back-office.tsx
  admin.back-office._index.tsx
  admin.back-office.orgs.$orgId.tsx
  admin.llm-models._index.tsx
  admin.llm-models.$modelId.tsx
  admin.llm-models.new.tsx
  admin.llm-models.missing._index.tsx
  admin.llm-models.missing.$model.tsx

Routes that have admin-only sub-features (e.g. show-extra-fields-if-admin
on otherwise public routes) stay on requireUser. Migration of those is a
separate concern — they don't gate access on admin, they just branch
display.

Behavioural change: action handlers that previously threw
`new Response('Unauthorized', { status: 403 })` on non-admins now redirect
to / along with the loader. Uniform behaviour, but XHR fetchers that
expected a 403 status would now follow the redirect instead. The admin
pages migrated here don't appear to have XHR fetchers that depend on the
403, but worth flagging.

Verification:
- pnpm run typecheck --filter webapp: clean.
- pnpm run test --filter @internal/rbac: 31 unit tests pass.
- E2E suite: all 31 tests pass — including the
  /admin/concurrency redirect test (now exercising the new builder).
matt-aitken and others added 27 commits May 6, 2026 20:53
The OSS no longer needs to know individual role names. systemRoles(orgId)
returns a plugin-owned, ordered SystemRole[] (id, name, description,
available) — the cloud plugin owns the canonical order, the descriptions,
and the per-org plan-tier 'available' flag. Hidden roles (Member in v1)
are filtered out entirely.

OSS callers iterate the array and use array index for the level ladder;
no role-name strings except for the legacy OrgMember.role enum mapping
shim, which is now isolated to one filter in member.server.ts.
…ker tightening

#1 Batch trigger AND semantics (security): `api.v[12].tasks.batch` now uses
`everyResource(...)` so a JWT scoped to taskA can no longer submit a batch
that also includes taskB / taskC. Added an `everyResource` helper to
`apiBuilder` (Symbol-marked wrapper that flips `ability.can` to `every`).
Multi-key OR semantics still apply for single-resource arrays (a run carries
multiple identifiers). Updated the e2e test to assert AND behaviour.

#3 Realtime stream resource (correctness): `findResource` for
`realtime.v1.streams.$runId.$streamId` now selects `taskIdentifier`,
`runTags`, and `realtimeStreamsVersion` — fields the auth resource
builder + handler read but findResource was returning undefined for.

#4 projectCreated optional chaining (crash bug): added the missing
`?.` between v3Subscription and plan so a missing subscription no longer
throws and aborts project creation.

#5 RBAC plugin loader logging: distinguish "plugin itself missing" from
"plugin found but a transitive dep failed to resolve" by inspecting the
ERR_MODULE_NOT_FOUND error message for the plugin's own module specifier.
The transitive-dep case now logs at error level (matches the comment's
stated behaviour). Removed the orphan log line that contradicted it.

#6 account.tokens picker source mismatch: the picker now sources roles
from the same plan-tier-filtered list (`systemRoles().filter(available)`)
as the default-role calculation. Added server-side roleId revalidation
in the create action so a hand-crafted POST can't bind a PAT to an
unavailable role.
Six per-feature RBAC changesets were accumulating across the branch.
The cumulative effect at release is purely additive — new methods on
the controller contract, new accepted shapes on existing methods —
so a single patch changeset captures it cleanly without the per-step
narrative bleeding into the release notes.
Mirror the changeset consolidation: collapse five per-feature RBAC
server-change files into one OSS-friendly entry covering the plugin
system + auth/authz consolidation work.
UserRole is the source of truth for cloud; OSS doesn't need the legacy
enum to carry the level. Drops the role-name lookup from member.server,
which was the last place OSS had to know about specific role names.
Previously the picker filtered to roles strictly below the inviter,
which meant Owners couldn't invite other Owners. Switch to at-or-below
so peer-level invites (Owner inviting Owner, Admin inviting Admin)
work — matches the existing user-facing copy and matches how most
team-management UIs behave.
Adds a batch variant of getUserRole that returns a Map<userId, Role | null>
in one round-trip. Used by TeamPresenter to drop the N+1 it had been
documenting as a future optimisation. Org-scoped only — project-scoped
reads still go through getUserRole (less common, not worth complicating
the API for).
Adds an optional `group` field to Permission so the plugin labels and
orders permission sections rather than the OSS hardcoding a name → group
map. Section order on the Roles page now follows the order permissions
appear in `allPermissions()` — first group seen, first rendered. Drops
the previous PERMISSION_GROUP_BY_NAME / GROUP_ORDER constants from the
webapp.
…mantics

The 6 session routes merged via PR #3417 were authored against the
pre-RBAC apiBuilder API: `authorization.resource` returned shapes like
`{ sessions: 'abc' }`, with a parallel `superScopes: [...]` whitelist
for broad-scope bypass. Post-TRI-8719, that shape doesn't typecheck and
`superScopes` is dead code.

Convert each resource callback to the canonical `{ type, id? }` shape.
For the two routes whose resource type is `tasks` but whose old
superScopes included `<action>:sessions` (list and create), use a
multi-key array `[{ type: 'tasks', id }, { type: 'sessions' }]` so a
JWT scoped `<action>:sessions` (no id) still passes — preserving the
exact allow-set the old superScopes mechanism granted. `*:all` and
`admin*` were already handled by the JWT ability's wildcard branches.

Drop the now-dead `superScopes` field from all 9 entries.

Adds e2e coverage in `auth-api.e2e.full.test.ts` (34 new tests, ~9
sub-describes) that locks in: per-task narrowing, `<action>:sessions`
type-only equivalence to the old superScope, `*:all` and `admin*`
bypass, wrong-action / wrong-id rejection. Plus a new
`seedTestApiSession` helper for inserting Session rows via Prisma —
distinct from the existing `seedTestSession` (cookie-session helper
for dashboard tests).
Adds RoleBaseAccessController.authenticatePat — PATs identify the user;
the effective ability is min(user's role in target org, max-role cap).
The user's actual membership is the floor (auto-narrows on demotion or
removal); the cap is set at PAT creation as a deliberate ceiling.

OSS-side:
- @trigger.dev/plugins gains the PatAuthResult type + authenticatePat
  on the controller interface.
- Fallback validates the PAT (prefix, hashed lookup, revoked check) and
  returns a permissive ability — preserves the pre-RBAC behaviour where
  PATs were pure user-identity tokens. Self-hosters see no change.
- LazyController delegates with the existing withActionAliases wrapper.

apiBuilder:
- createLoaderPATApiRoute accepts an optional context callback to
  derive { organizationId?, projectId? } and an optional authorization
  block. When either is declared, rbac.authenticatePat runs and the
  ability flows into the handler. Routes that don't opt in stay on the
  legacy permissive path.
- api.v1.projects.$projectRef.runs.ts opts in: context resolves
  projectRef -> organizationId, authorization is read on type runs.

UI:
- account.tokens picker reframed as 'Maximum role' with a hint
  explaining the cap-vs-floor model. Underlying TokenRole storage
  unchanged; semantic flip from 'bound role' to 'max role cap'.

The role chosen at PAT creation now actually constrains the token
(previously TokenRole was written but never read at request time).
The workflow pinned pnpm 10.23.0; root package.json declares 10.33.2.
pnpm/action-setup v4+ now rejects the conflict (root + workflow are
both 'specified versions' as far as the action is concerned), failing
the job before it can run any tests. Bump to v5.0.0 of the action and
match the 10.33.2 pin already used by unit-tests-webapp.yml and
release.yml.
The webapp unit-test config excluded *.e2e.test.ts (smoke matrix) but
not *.e2e.full.test.ts. The full auth suite needs a globalSetup that
spawns a webapp + Postgres container, which only the dedicated
vitest.e2e.full.config.ts provides. CI's unit-test shards were picking
up the e2e-full files via the include glob and failing immediately
with 'globalSetup didn't provide baseUrl/databaseUrl'.
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
Locks in the everyResource(...) wrap added in 5547e66. Two new tests:
JWT scoped to one of multiple detected tables → 403; JWT scoped to all
detected tables → auth passes. Mirrors the every-task lock-in for the
batch trigger routes.
…-resource sites

Bare RbacResource[] in `authorization.resource` is now a type error.
Multi-resource auth must wrap with one of:

  - anyResource(...): succeed if any element passes (the existing default;
    used when one record carries multiple identifiers — runs by friendlyId
    / batch / tags / task — so a JWT scoped to any one grants access)
  - everyResource(...): succeed only if every element passes (existing
    helper; used by batch operations and the multi-table query route)

The OR-loophole class of bug CodeRabbit caught on api.v1.query — a JWT
scoped to one of N detected tables was authorized for the whole multi-
table query — was patchable per-route with everyResource. The Symbol
marker stayed invisible: future authors would still default to bare
arrays. Tightening AuthResource flushed out 11 routes that were silently
on the OR path; each is wrapped explicitly now.

Also inline the unrelated private `anyResource` helper in
internal-packages/rbac/src/ability.ts so the public name is unambiguous.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 internal-packages/rbac/src/ability.ts (severity: 🔴 silent privilege
   escalation): buildJwtAbility was treating any scope starting with
   `admin:` as a universal wildcard. Pre-RBAC, the legacy
   checkAuthorization string-matched superScopes — `admin:sessions` only
   granted access to routes that explicitly listed it. After the JWT-
   ability split, the same scope was returning true for every action on
   every resource. Restrict the bypass to bare `admin` (no second
   segment); `admin:<type>` now flows through normal matching as
   action="admin" against resources of that type. Adds 2 regression
   tests in ability.test.ts.

#2 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (status
   discard): authenticateRequestForApiBuilder hardcoded `status: 401`
   even though BearerAuthResult.status is `401 | 403`. A plugin
   returning 403 (e.g. suspended account, IP block) would silently get
   downgraded to 401 — semantically wrong (401 = "who are you?", 403 =
   "you're not allowed") and confusing for client retry logic. Plumb
   result.status through.

#3 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts
   (everyResource([]) vacuous truth): [].every() returns true, so
   everyResource([]) was passing auth for any token. Not exploitable
   today (Zod rejects empty bodies before auth), but the auth layer
   should never grant on empty input. Same defensive guard added to
   anyResource() for symmetry — only PERMISSIVE_ABILITY would have
   granted there, but the pattern shouldn't depend on the ability's
   choice.

#4 internal-packages/rbac/src/fallback.ts (PREVIEW env regression): the
   fallback's authenticateBearer looked up environments by apiKey only,
   skipping the branch-aware resolution that findEnvironmentByApiKey
   does for PREVIEW envs. Self-hosters using preview/branch envs would
   either fail or operate against the parent env. Mirror the legacy
   path: read x-trigger-branch, include matching child env, and pivot
   the resolved env to the child (apiKey/orgMember/organization/project
   inherited from parent). sanitizeBranchName inlined here because
   internal-packages can't import webapp code; comment notes the
   duplication.

All four flagged by Devin's PR review. Cloud plugin's buildJwtAbility
gets the same #1 fix in a sibling commit on this PR's cloud branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…core/v3/utils/gitBranch

Both helpers were originally in apps/webapp/app/v3/gitBranch.ts.
internal-packages/rbac needed sanitizeBranchName for the PREVIEW-env
branch resolution added in 8246234, and copy-pasting the function into
the fallback was a smell. Moves the canonical home into core (no
internal-package can import webapp code), and updates the four webapp
call sites + the rbac fallback to import from
@trigger.dev/core/v3/utils/gitBranch.

`sanitizeBranchName`'s input type is widened slightly to
`string | null | undefined` so callers passing `Headers.get(...)` (which
returns `string | null`) don't need a `?? undefined` workaround. The
existing webapp callers all pass `string | undefined`; the new union is
backwards-compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…minate redundant findEnvironmentById

Plugins now return the full env shape they fetched. The apiBuilder bridge passes it straight through to handlers — no follow-up findEnvironmentById round-trip.

@trigger.dev/core/v3/auth/environment (new) defines the slim AuthenticatedEnvironment structurally, independent of @trigger.dev/database. @trigger.dev/plugins re-exports it plus sanitizeBranchName so cloud's workspace-linked plugin imports from one surface.

Internal-packages/rbac/src/fallback.ts drops toRbacEnvironment stripping. apiBuilder.server.ts bridge drops findEnvironmentById. Model finders coerce to slim shape. Various downstream signature updates accept slim AE.

Net: single DB call per API request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 new potential issues.

View 16 additional findings in Devin Review.

Open in Devin Review

paused: env.paused,
shortcode: env.shortcode,
maximumConcurrencyLimit: env.maximumConcurrencyLimit,
concurrencyLimitBurstFactor: env.concurrencyLimitBurstFactor,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 toAuthenticated() claims to coerce Prisma Decimal but passes it through unchanged

The comment on toAuthenticated() says it "converts concurrencyLimitBurstFactor from Prisma's Decimal to a plain number" but the implementation just passes the value through unchanged (concurrencyLimitBurstFactor: env.concurrencyLimitBurstFactor). The AuthenticatedEnvironment type now accepts number | DecimalLike, so it compiles, but at runtime the value is still a Prisma Decimal object — not a plain number. Any new downstream consumer that trusts the comment and performs arithmetic (e.g. env.concurrencyLimitBurstFactor * 2) will get NaN because Prisma's Decimal doesn't support JS arithmetic operators. The EnvironmentQueuePresenter was updated in this PR to handle both types (line 50-53), which proves the issue is real. The fix would be to add .toNumber() coercion here (or, if the pass-through is intentional, fix the misleading comment).

Suggested change
concurrencyLimitBurstFactor: env.concurrencyLimitBurstFactor,
concurrencyLimitBurstFactor: typeof env.concurrencyLimitBurstFactor === "number" ? env.concurrencyLimitBurstFactor : env.concurrencyLimitBurstFactor.toNumber(),
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +370 to +372
function toAuthenticatedEnvironment(env: RbacEnvironment): RbacEnvironment {
return env;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Fallback toAuthenticatedEnvironment is identity function — Prisma Decimal not coerced

The fallback's toAuthenticatedEnvironment in fallback.ts:370 is documented as coercing Prisma's Decimal to number but is just an identity pass-through. This is the same issue as in runtimeEnvironment.server.ts but on the RBAC fallback auth path. Every API request authenticated via the fallback (i.e. all OSS deployments) returns an AuthenticatedEnvironment with a Prisma Decimal for concurrencyLimitBurstFactor. Since the AuthenticatedEnvironment type now says number | DecimalLike, new code written to this contract might assume it's a number and fail with NaN on arithmetic.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +14 to +17
export const rbac = plugin.create(
prisma,
{ getSessionUserId },
{ forceFallback: env.RBAC_FORCE_FALLBACK }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 RBAC fallback reads from primary DB instead of read replica for every API auth call

The old findEnvironmentByApiKey (used by authenticateApiRequestWithFailure) used $replica for its Prisma reads, spreading auth-path load across replicas. The new RoleBaseAccessFallbackController.authenticateBearer at fallback.ts:115 uses this.prisma which is the primary write client (injected from rbac.server.ts:14 via import { prisma } from "~/db.server"). Since bearer auth runs on every API request, this shifts all auth read traffic from the replica to the primary. For high-traffic deployments this is a meaningful load increase on the primary database.

Relevant code in rbac.server.ts
import { prisma } from "~/db.server";
export const rbac = plugin.create(prisma, ...);

The prisma import is the primary client. The fallback should use $replica for reads.

Prompt for agents
The RBAC fallback controller receives a single PrismaClient and uses it for all reads (environment lookups, user lookups, PAT lookups). The old auth path used $replica (the read replica client) for these high-frequency reads. The fix should either:

1. Pass both prisma and $replica to the fallback (e.g. via a { primary, replica } object) and use replica for reads in authenticateBearer/authenticateSession/authenticatePat, OR
2. Import $replica in rbac.server.ts and pass it as the client (but this won't work for writes like setUserRole)

The cleanest approach is option 1: extend the fallback constructor to accept { primary: PrismaClient, replica: PrismaClient } and use replica in the read-only authenticate* methods. The RoleBaseAccessFallback class in internal-packages/rbac/src/fallback.ts and the rbac.server.ts initialization in apps/webapp/app/services/rbac.server.ts both need updating.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +115 to +118
let env = await this.prisma.runtimeEnvironment.findFirst({
where: { apiKey: rawToken },
include,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Fallback uses primary DB instead of $replica for auth queries

The RoleBaseAccessFallbackController in internal-packages/rbac/src/fallback.ts uses this.prisma (the primary Prisma client passed from apps/webapp/app/services/rbac.server.ts:14) for all authentication queries. The old code path (findEnvironmentByApiKey in apps/webapp/app/models/runtimeEnvironment.server.ts:28) used $replica. This means every API request's auth query now hits the primary database instead of the read replica. For high-traffic deployments this shifts significant read load from the replica to the primary. Not a correctness bug, but a performance regression worth noting.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

zizmor flagged on PR #3499:
- L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the
  rest of the repo).
- L74 credential persistence → add `persist-credentials: false`. This
  job doesn't push, so leaving GITHUB_TOKEN in .git/config has no
  upside and a leak path through any subsequent step.
- L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node
  @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows;
  this job runs on ubuntu-latest, no buildjet runner needed).
- L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants