acceptance: shard acceptance tests across 4 parallel CI jobs #5435

Draft
denik wants to merge 8 commits into main from denik/test-sharding

@denik denik commented Jun 4, 2026

Summary

  • Adds SHARD_INDEX / SHARD_TOTAL env-var support to getTests() in acceptance/acceptance_test.go: each CI job runs only its slice of the 854 acceptance tests (~214 each). The list is already sorted alphabetically, so the modulo split is deterministic and stable across runs.
  • Adds shard_index: [0, 1, 2, 3] to the CI matrix in push.yml, passing SHARD_INDEX and SHARD_TOTAL: 4 to the test step.
  • Job count: 6 → 24 normally; 2 → 8 in the merge queue (linux-only). The test-result aggregator and testmask gating require no changes — GitHub Actions waits for all matrix combinations of a job automatically.

Local runs are unaffected: without SHARD_TOTAL set (or with SHARD_TOTAL=1), getTests() returns the full list as before.

Test plan

  • Verify CI jobs are named correctly (shard 0 .. shard 3) and each runs ~214 tests
  • Verify test-result waits for all 24 (or 8 in merge queue) jobs before passing

This pull request was AI-assisted by Isaac.

Adds SHARD_INDEX / SHARD_TOTAL env-var support to getTests() so each CI
job runs only its share of the 854 acceptance tests (~214 each). The
sorted test list ensures the split is deterministic and stable.

The CI matrix gains a shard_index dimension [0,1,2,3], turning 6 test
jobs into 24 (and 2 → 8 in the merge queue). The test-result aggregator
and testmask gating are unaffected — GitHub Actions waits for all matrix
combinations automatically.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:43 — with GitHub Actions Inactive
github-actions Bot commented Jun 4, 2026

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in .github/workflows/, acceptance/

Eligible reviewers: @andrewnester, @anton-107, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@denik denik marked this pull request as draft June 4, 2026 09:45
The sharded acceptance matrix re-ran the full unit suite on all 24 jobs
(4 shards x 3 OS x 2 deployment). All of them also tried to save the Go
cache under the same key on main; only the first writer wins, so the
remaining save attempts were wasted time.

Split into:
- test-unit: one job per OS (no deployment/shard dimension), runs
  `task test-unit`, and is the sole writer of the shared "test" cache.
- test (acc): runs `task test-acc` only, restores the "test" cache
  (save-cache=false so the many shard/deployment instances don't
  collide on the key).

setup-build-environment gains a save-cache input (default true, so the
test-exp-* / test-pipelines jobs with unique keys keep saving as before)
that gates the on-main cache save.

test-result now also waits on test-unit.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:55 — with GitHub Actions Inactive
TestInprocessMode calls testAccept with a specific singleTest
("selftest/basic"). The shard filter lived in getTests(), so it ran
before singleTest selection and could strip the requested test out of
the shard, failing with "did not match any tests" on every shard that
didn't own selftest/basic.

Move the shard filter into a shardTests helper applied in testAccept
only when singleTest == "", leaving named-test selection unsharded.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 10:10 — with GitHub Actions Inactive
eng-dev-ecosystem-bot commented Jun 4, 2026

Commit: e6a073e

Run: 26952689987

@denik denik temporarily deployed to test-trigger-is June 4, 2026 11:08 — with GitHub Actions Inactive
A static cross-product matrix forces one shard count for every
(os, engine). Windows is the long pole (TASK_CONCURRENCY=1 serializes
within a job) while the direct engine is fast, so a uniform count
over- or under-shards most combinations.

Generate the acceptance shard matrix in testmask as an explicit
include-list and consume it via fromJSON. Shard counts:

  windows/terraform: 8   windows/direct: 8
  linux/terraform:   4   linux/direct:   2
  macos/terraform:   4   macos/direct:   2

merge_group still runs Linux only (4 + 2 = 6 jobs). PR/push runs
8 + 8 + 4 + 2 + 4 + 2 = 28 acc jobs.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 12:48 — with GitHub Actions Inactive
The gotestsum JSON upload is debug-only timing telemetry, yet a transient
GitHub artifact-service error during finalization failed an otherwise-
passing windows test-acc shard. Mark both upload steps continue-on-error
so infra hiccups on a debug artifact never block the merge.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 5, 2026 14:57 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:27 — with GitHub Actions Inactive
@denik denik deployed to test-trigger-is June 5, 2026 15:56 — with GitHub Actions Active
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:56 — with GitHub Actions Inactive

2 participants