Skip to content

fix: correct AB test execution role IAM policy and promote stability#1126

Merged
jariy17 merged 1 commit intopreviewfrom
fix/ab-test-role-evaluator-permission-preview
May 6, 2026
Merged

fix: correct AB test execution role IAM policy and promote stability#1126
jariy17 merged 1 commit intopreviewfrom
fix/ab-test-role-evaluator-permission-preview

Conversation

@jariy17
Copy link
Copy Markdown
Contributor

@jariy17 jariy17 commented May 5, 2026

Cherry-pick of #1120 onto preview.

Summary

  • IAM policy — Aligns the auto-created AB test execution role with the official docs, fixing a 400 "Access denied when validating evaluator" error on deploy
  • Promote stabilitypromote ab-test waits for executionStatus === 'RUNNING' before stopping, avoiding 409 errors when promote runs immediately after resume
  • DescribeLogGroups — Split into its own statement with Resource: "*" (AWS requirement)
  • E2E coverage — Added AB test reaches RUNNING status after deploy to ab-test-target-based.test.ts
  • Status test — Fixed incorrect http-gateway resourceType assertion
  • Local e2e script — Added scripts/run-e2e-local.sh

See #1120 for full details.

…1120)

* fix: add GetEvaluator permission to AB test execution role

The AB test API validates evaluator ARNs server-side using the execution
role. The auto-created role policy was missing bedrock-agentcore:GetEvaluator,
causing a 400 "Access denied when validating evaluator" error on every deploy
that included a target-based AB test with a custom evaluator.

* fix: correct status assertion in target-based AB test e2e

HTTP gateways are not surfaced as top-level resources in agentcore status —
they are only used internally to build AB test invocation URLs. The test was
asserting on resourceType 'http-gateway' which never appears; fix it to
assert on the 'ab-test' resource instead.

* fix: add invocationUrl to status resource type cast in e2e test

The resource cast was missing invocationUrl, causing a TypeScript error
when asserting the AB test's gateway invocation URL was present.

* fix: improve promote error message and add local e2e run script

- Include stdout in promote failure message so the JSON error is visible
- Add scripts/run-e2e-local.sh to replicate the GitHub Actions e2e workflow locally

* fix: retry promote stop on 409 UPDATING status

When promote is called immediately after resume, the AB test may still be
in UPDATING state and reject the STOPPED transition with a 409. Retry up
to 6 times with a 10s delay to wait for the transition to complete.

* fix: wait for AB test to leave UPDATING state before promoting

Poll getABTest until executionStatus is no longer UPDATING before
attempting to stop. The service rejects updates with 409 while a
state transition is in progress (e.g. after resume).

* fix: wait for RUNNING before stopping AB test in promote

Poll getABTest until executionStatus === 'RUNNING' before issuing the
STOPPED transition. The service 409s if the test is still UPDATING
(e.g. transitioning from a prior resume).

* fix: resolve evaluator ARNs transitively from AB test online eval configs

Previously GetEvaluator was scoped to all project evaluators. Now it's
scoped to only the evaluators referenced by the specific online eval
configs this AB test uses, handling all three cases:
- Custom evaluators: looked up from deployedResources.evaluators
- Builtin.* evaluators: ARN constructed from the builtin ID
- External ARN references: used as-is

* fix: align AB test execution role policy with official docs

Replace the hand-rolled per-resource policy with the canonical policy
from https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/ab-testing-prereqs.html:

- Trust policy: add SourceAccount + SourceArn conditions
- Permissions: single AgentCoreResources statement scoped to
  arn:aws:bedrock-agentcore:*:${accountId}:* with ResourceAccount condition
- Add missing actions: GetGatewayTarget, ListGatewayTargets,
  ListConfigurationBundleVersions
- CloudWatch logs scoped to account via PrincipalAccount pattern
- Remove per-evaluator/per-resource ARN tracking (no longer needed)

* test: update IAM role unit tests to match new policy structure

* fix: throw error if AB test never reaches RUNNING before promote

* test: add unit tests for promote polling logic and extract to promote-utils

Extract waitForRunningThenStop into promote-utils.ts so it can be unit
tested without pulling in React/ink. Add 4 tests covering: immediate
RUNNING, polling until RUNNING, timeout throws, and error message content.

* fix: split DescribeLogGroups into wildcard resource statement

* test: verify AB test reaches RUNNING status after deploy in both e2e suites

* fix: remove console.log statements that leaked ARNs and fix bundle ARN/version format in config-bundle e2e test

* test: remove second deploy and RUNNING check from config-bundle e2e (follow-up with real bundles)

---------

Co-authored-by: Local E2E <ci@local>
@jariy17 jariy17 requested a review from a team May 5, 2026 23:04
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label May 5, 2026
@agentcore-cli-automation
Copy link
Copy Markdown

Reviewed the cherry-pick. All the serious issues I'd raise have already been flagged in the existing discussion:

  • The Builtin/external evaluator ARN concern was raised on fix: correct AB test execution role IAM policy and promote stability #1120 and addressed by moving to the docs-aligned policy template.
  • The gateway scope loosening was raised and justified by the author against the AWS docs.
  • The TUI promote race condition (same updateABTest({ executionStatus: 'STOPPED' }) call at ABTestDetailScreen.tsx:341 without a waitForRunningThenStop) was already raised by @notgitika on this PR and is still awaiting a response/fix.

The cherry-pick onto preview itself looks clean — the diff matches what landed on main via #1120, and the new waitForRunningThenStop utility + tests are correct. LGTM aside from the already-flagged TUI question.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label May 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 43.6% 9964 / 22852
🔵 Statements 42.89% 10589 / 24686
🔵 Functions 40.63% 1684 / 4144
🔵 Branches 40.28% 6430 / 15962
Generated in workflow #2463 for commit f8e8cdf by the Vitest Coverage Report Action

Copy link
Copy Markdown
Contributor

@notgitika notgitika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jariy17 jariy17 merged commit 62419a6 into preview May 6, 2026
17 checks passed
@jariy17 jariy17 deleted the fix/ab-test-role-evaluator-permission-preview branch May 6, 2026 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants