Skip to content

[DRAFT] [Do not merge] Experiment with sharded SQL E2E PR tests#5584

Draft
mikaelweave wants to merge 23 commits into
mainfrom
mikaelweave/e2e-parallelization
Draft

[DRAFT] [Do not merge] Experiment with sharded SQL E2E PR tests#5584
mikaelweave wants to merge 23 commits into
mainfrom
mikaelweave/e2e-parallelization

Conversation

@mikaelweave
Copy link
Copy Markdown
Contributor

@mikaelweave mikaelweave commented May 25, 2026

Summary

  • Adds opt-in runShardedE2eJobs and runShardedIntegrationTests modes to the SQL test job template.
  • Enables sharded SQL E2E and SQL integration PR validation for Stu3, R4, and R5 SQL stages.
  • Splits SQL E2E into normal parallel shards plus a terminal RequiresIsolation lane for async/global-state-sensitive tests.
  • Splits SQL integration tests into General, SmartSearch, Search, Reindex, and SqlSpecific shards to measure whether the slow R4 SQL integration job can be reduced.
  • Adds temporary [SmartSearchTiming] diagnostics to the SMART integration tests to capture setup, upsert, and per-search timings.
  • Reuses SMART integration fixture/data setup once per FHIR version + data store so SmartOnFhir tests avoid reseeding the full dataset for every test method.
  • Marks the 80K surrogate-id import stress test as ImportLongRunning so PR SQL E2E skips it while CI SQL E2E still runs it via explicit CI filters.

Experiment notes

This PR measures whether PR SQL validation wall-clock time improves when broad SQL test buckets are split by category/class filters, while keeping sensitive async-operation tests isolated rather than increasing in-process test parallelization.

E2E iteration findings

  • Iteration 1 confirmed Azure Pipelines expanded the expected E2E shard jobs for all SQL versions.
  • CustomSearch initially selected zero tests because custom search parameter tests are compiled as Search + IndexAndReindex, not CustomSearch; later iterations route those tests with an explicit SQL class filter and excludes them from Reindex.
  • BulkDelete initially failed with concurrent 409 Conflict responses.
  • Assembly-level E2E serialization alone did not remove BulkDelete conflicts.
  • Moving BulkDelete out of the parallel shard set and running it after all other SQL E2E shards plus Reindex and BulkUpdate eliminated observed BulkDelete 409 Conflict failures for Stu3 and R4. R5 BulkDelete was blocked by an unrelated-looking R5 General 401 Unauthorized failure.
  • Build 49548 showed R4 SQL Import taking 27m 38s of test time. The runner progress log showed GivenIncrementalLoad_80KSurrogateIds_BadRequestIsReturned active for roughly 23m, so that test is now tagged ImportLongRunning and excluded from the default PR SQL E2E filter.
  • Build 49548 showed R4 SQL Reindex taking 20m 47s of test-task time because the first attempt failed 4 tests, then retry passed. Failures were search-parameter/reindex global-state sensitive: three jobs failed with Unable to sync search parameter cache, and cleanup logged a 409 conflict deleting a SearchParameter while a reindex job was ongoing.
  • Build 49554/49562 showed Reindex passing cleanly after being moved later, which supports isolating async/global-state-sensitive E2E tests.

RequiresIsolation E2E experiment

  • Added Categories.RequiresIsolation as an orthogonal scheduling trait for SQL E2E tests that mutate shared async/global state.
  • Tagged SQL E2E BulkDeleteTests, BulkUpdateTests, and ReindexTests with RequiresIsolation.
  • Standard sharded SQL E2E jobs now exclude Category=RequiresIsolation through mainCategoryFilter.
  • The PR pipeline now runs one terminal sqlE2eTests_RequiresIsolation lane after all parallel E2E shards.
  • Local R4 E2E discovery: terminal isolation filter selects 71 SQL tests (BulkUpdateTests=33, BulkDeleteTests=16, ReindexTests=22); all standard shards report isolationLeakCount=0.

SQL integration iteration

  • Build 49526 showed R4 SQL integration taking ~42m total, with the test step taking ~35m including retry. R4 Cosmos integration took ~30m total.
  • The long-pole evidence points to test execution, not report publishing; report/coverage aggregation was seconds.
  • SmartSearch and SQL-specific tests are likely major contributors, with SmartSearch visible near the 20-25m mark in the R4 SQL log.
  • SQL integration is split into five shards: General, SmartSearch, Search, Reindex, and SqlSpecific.
  • Local discovery validated each shard for Stu3/R4/R5: 4 + 63 + 55 + 41 + 197 = 360 SQL-discoverable integration tests per version.

SMART timing diagnostics and fixture reuse

  • Initial local R4 SQL SmartSearch replay showed InitializeAsync ran for every SMART test and repeatedly recreated the SQL fixture plus reseeded all SMART bundles/resources.
  • SMART setup now uses a shared static context keyed by FHIR version + data store. The custom xUnit argument framework creates synthetic fixture instances, so the cached context is disposed at process exit rather than per class fixture instance.
  • Mutation tests now use unique per-test resource IDs. Update tests first create the unique resource, then update that same ID, so they still exercise update behavior without mutating seeded shared resources.
  • Local R4 SQL SmartSearch with pipeline-style coverage dropped from about 180s before fixture reuse to about 60-67s after fixture reuse.

Validation

  • git diff --check
  • dotnet build src\Microsoft.Health.Fhir.Tests.Common\Microsoft.Health.Fhir.Tests.Common.csproj -f net9.0 --configuration Release --no-restore
  • dotnet build test\Microsoft.Health.Fhir.R4.Tests.E2E\Microsoft.Health.Fhir.R4.Tests.E2E.csproj -f net9.0 --configuration Release --no-restore /p:NETCoreSdkVersion=9.0.313
  • dotnet build test\Microsoft.Health.Fhir.R4.Tests.Integration\Microsoft.Health.Fhir.R4.Tests.Integration.csproj -f net9.0
  • dotnet build test\Microsoft.Health.Fhir.Stu3.Tests.Integration\Microsoft.Health.Fhir.Stu3.Tests.Integration.csproj -f net9.0
  • dotnet build test\Microsoft.Health.Fhir.R5.Tests.Integration\Microsoft.Health.Fhir.R5.Tests.Integration.csproj -f net9.0
  • dotnet <integration dll> --list-tests --filter ... for all SQL integration shard filters across Stu3/R4/R5.
  • dotnet <R4 integration dll> --list-tests --filter "FullyQualifiedName!~CosmosDb&Category=SmartOnFhir" found 63 SQL SmartSearch tests.
  • dotnet test test\Microsoft.Health.Fhir.R4.Tests.Integration\Microsoft.Health.Fhir.R4.Tests.Integration.csproj --configuration Release -f net9.0 -- --filter "FullyQualifiedName!~CosmosDb&Category=SmartOnFhir" --retry-failed-tests 3 --coverage --coverage-output-format cobertura --coverage-settings "<repo>\CodeCoverage.Mtp.settings.xml" --report-trx --output Detailed passed 63/63 in 60.1s test duration, 83.5s including build.
  • R4 E2E discovery: FullyQualifiedName~SqlServer&Category=Import&Category!=ImportLongRunning does not list GivenIncrementalLoad_80KSurrogateIds_BadRequestIsReturned; FullyQualifiedName~SqlServer&Category=Import&Category=ImportLongRunning selects it.

mikaelweave and others added 9 commits May 25, 2026 06:52
Split SQL E2E PR validation into opt-in category shards while keeping the shared template backward-compatible by default. Reindex and bulk update remain isolated jobs for the experiment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route custom search parameter tests through an explicit SQL class filter so the shard is not empty, exclude those tests from the reindex shard, and serialize BulkDelete tests within an xUnit collection after the first experiment exposed concurrent 409 conflicts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add assembly-level xUnit collection behavior to match the E2E runner configuration intent and test whether BulkDelete conflicts are caused by in-job parallel execution under MTP.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move BulkDelete out of the parallel shard set and run it after the other SQL E2E shards plus Reindex and BulkUpdate to test whether its 409 conflicts are caused by concurrent async operation jobs in the same environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an opt-in SQL integration sharding experiment for PR SQL stages. The shards split SQL integration tests into General, SmartSearch, Search, Reindex, and SqlSpecific buckets while preserving default behavior for other callers.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Log SMART integration fixture setup, resource upsert, and search-service timings so the PR pipeline can identify whether the slow SmartSearch shard is dominated by setup or specific query patterns.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Seed SMART integration data once per version and data store so SmartOnFhir tests avoid rebuilding the expensive storage fixture for every test method. Use unique IDs for mutation tests while preserving update semantics, and dispose cached fixtures when the test process exits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mark the 80K surrogate-id import stress test as ImportLongRunning and exclude that category from the default SQL E2E PR filter. CI explicitly passes its main category filter, so the long import coverage remains in CI while PR validation avoids the 20+ minute stress case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the sharded SQL E2E Reindex job wait for the parallel E2E shards and BulkUpdate before it starts. BulkDelete remains terminal after Reindex, preserving its existing isolation while letting the next PR run test whether Reindex flakiness is caused by concurrent shard activity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

mikaelweave and others added 3 commits May 27, 2026 17:25
Add a RequiresIsolation test category and tag SQL E2E classes that exercise global async operations: BulkDelete, BulkUpdate, and Reindex. Standard sharded E2E jobs exclude RequiresIsolation while a single terminal RequiresIsolation lane runs after the parallel shards.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only materialize the SmartSearch SQL integration shard for R4 and R4B so Stu3 and R5 do not run all-skipped SMART tests and fail with zero executed tests. The RequiresIsolation E2E lane remains dependent only on E2E shard jobs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split AAD cleanup/setup into a separate PR stage so application deployment no longer waits on AAD preparation. Test stages still depend on setupAad, and KeyVault NSP association waits for setupAad before using the test auth key vault. Also run the RequiresIsolation E2E lane in parallel for the next perf experiment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.32%. Comparing base (fdf6900) to head (cf00ab0).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5584      +/-   ##
==========================================
+ Coverage   77.28%   77.32%   +0.03%     
==========================================
  Files         995      996       +1     
  Lines       36467    36539      +72     
  Branches     5520     5539      +19     
==========================================
+ Hits        28184    28252      +68     
+ Misses       6924     6908      -16     
- Partials     1359     1379      +20     

see 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mikaelweave and others added 3 commits May 27, 2026 20:58
Add integration-test xUnit runner settings that enable class-level collection parallelization, collapse PR SQL integration tests back to one job per version, and publish class-duration reports from SQL and Cosmos integration TRX output.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoid publishing a second root runner config by writing the integration xUnit runner settings into extracted integration test binaries in the pipeline. Keep the class-duration reporting and single-job integration experiment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a pipeline-controlled E2E xUnit runner override and class-duration reporting for E2E jobs. PR SQL and Cosmos test stages opt into class-level test collection parallelization while CI retains the serial default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave
Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines failed to run 1 pipeline(s).

mikaelweave and others added 2 commits May 28, 2026 06:34
Use a double-quoted artifact name so the replace expression's single-quoted arguments do not break YAML parsing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the RequiresIsolation experiment and return SQL E2E isolation to feature-category jobs: BulkUpdate, Reindex, and terminal BulkDelete. Re-enable PR SQL integration category sharding while keeping the E2E class-parallel experiment pipeline-controlled.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mikaelweave mikaelweave changed the title Experiment with sharded SQL E2E PR tests [DRAFT] [Do not merge] Experiment with sharded SQL E2E PR tests May 28, 2026
mikaelweave and others added 6 commits May 28, 2026 10:25
Reintroduce RequiresIsolation for BulkDelete, BulkUpdate, and Reindex E2E tests. Regular SQL and Cosmos E2E lanes exclude those tests when class-level parallelism is enabled, and isolated lanes run after the broad E2E lanes to avoid async operation conflicts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Set succeededOrFailed conditions on SQL and Cosmos RequiresIsolation E2E jobs so they still run after dependent parallel E2E shards fail, giving complete isolation-lane signal in PR experiments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add optional Cosmos integration and E2E category sharding to match the SQL experiment. PR Cosmos stages now run integration and E2E category shards in parallel, keep SMART integration gated to R4/R4B, and run BulkDelete/Reindex in a RequiresIsolation E2E lane after the broad shards.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each SQL and Cosmos integration job (monolithic and sharded) previously
rebuilt the integration test projects via a redundant 'Build Integration
Test Projects' step, then ran 'dotnet test --no-build' against the
source-workspace build - ignoring the already-published artifacts
extracted into $(Agent.TempDirectory)/IntegrationTests (and the
xunit.runner.json class-parallelism config written into them).

Remove the per-job rebuild and instead run the extracted, pre-built test
DLL directly (mirroring e2e-tests.yml), passing the coverage and TRX
arguments. This eliminates duplicate compilation across parallel shards
and makes the class-parallelism config actually apply.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CosmosSpecific shard filtered only on FullyQualifiedName~CosmosDb.
That substring matches both Cosmos-only test classes and the "(CosmosDb)"
data-store suffix that the custom xUnit framework appends to every
DataStore.All test. As a result, Reindex (IndexAndReindex), Smart
(SmartOnFhir) and Search/SearchParameterStatus (Search) Cosmos variants
ran in BOTH their dedicated shard and CosmosSpecific - executing twice
and inflating CosmosSpecific.

Add the same category exclusions the SQL SqlSpecific shard already uses
so each categorized test runs only in its dedicated shard. CosmosSpecific
remains the catch-all for the remaining Cosmos variants (Export,
DataSourceValidation, uncategorized persistence tests, Cosmos-only
classes).

The Search and Reindex shards are never gated, so those categories stay
covered for every version; SmartOnFhir has no integration tests for
Stu3/R5 (only the R4/R4B-gated SmartSearch shard), so no tests are
dropped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Security analysis only consumes the deploy and nuget artifacts produced by
BuildArtifacts and runs its own RoslynAnalyzers build; it never uses unit-test
output. Removing the BuildUnitTests dependency lets the stage start as soon as
BuildArtifacts completes (about 14 minutes earlier in build 49614) and prevents
it from becoming the pipeline tail as the test stages get faster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants