[DRAFT] [Do not merge] Experiment with sharded SQL E2E PR tests#5584
Draft
mikaelweave wants to merge 23 commits into
Draft
[DRAFT] [Do not merge] Experiment with sharded SQL E2E PR tests#5584mikaelweave wants to merge 23 commits into
mikaelweave wants to merge 23 commits into
Conversation
Split SQL E2E PR validation into opt-in category shards while keeping the shared template backward-compatible by default. Reindex and bulk update remain isolated jobs for the experiment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Route custom search parameter tests through an explicit SQL class filter so the shard is not empty, exclude those tests from the reindex shard, and serialize BulkDelete tests within an xUnit collection after the first experiment exposed concurrent 409 conflicts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add assembly-level xUnit collection behavior to match the E2E runner configuration intent and test whether BulkDelete conflicts are caused by in-job parallel execution under MTP. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move BulkDelete out of the parallel shard set and run it after the other SQL E2E shards plus Reindex and BulkUpdate to test whether its 409 conflicts are caused by concurrent async operation jobs in the same environment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add an opt-in SQL integration sharding experiment for PR SQL stages. The shards split SQL integration tests into General, SmartSearch, Search, Reindex, and SqlSpecific buckets while preserving default behavior for other callers. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Log SMART integration fixture setup, resource upsert, and search-service timings so the PR pipeline can identify whether the slow SmartSearch shard is dominated by setup or specific query patterns. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Seed SMART integration data once per version and data store so SmartOnFhir tests avoid rebuilding the expensive storage fixture for every test method. Use unique IDs for mutation tests while preserving update semantics, and dispose cached fixtures when the test process exits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mark the 80K surrogate-id import stress test as ImportLongRunning and exclude that category from the default SQL E2E PR filter. CI explicitly passes its main category filter, so the long import coverage remains in CI while PR validation avoids the 20+ minute stress case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make the sharded SQL E2E Reindex job wait for the parallel E2E shards and BulkUpdate before it starts. BulkDelete remains terminal after Reindex, preserving its existing isolation while letting the next PR run test whether Reindex flakiness is caused by concurrent shard activity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Author
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Add a RequiresIsolation test category and tag SQL E2E classes that exercise global async operations: BulkDelete, BulkUpdate, and Reindex. Standard sharded E2E jobs exclude RequiresIsolation while a single terminal RequiresIsolation lane runs after the parallel shards. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Only materialize the SmartSearch SQL integration shard for R4 and R4B so Stu3 and R5 do not run all-skipped SMART tests and fail with zero executed tests. The RequiresIsolation E2E lane remains dependent only on E2E shard jobs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split AAD cleanup/setup into a separate PR stage so application deployment no longer waits on AAD preparation. Test stages still depend on setupAad, and KeyVault NSP association waits for setupAad before using the test auth key vault. Also run the RequiresIsolation E2E lane in parallel for the next perf experiment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5584 +/- ##
==========================================
+ Coverage 77.28% 77.32% +0.03%
==========================================
Files 995 996 +1
Lines 36467 36539 +72
Branches 5520 5539 +19
==========================================
+ Hits 28184 28252 +68
+ Misses 6924 6908 -16
- Partials 1359 1379 +20 🚀 New features to boost your workflow:
|
Add integration-test xUnit runner settings that enable class-level collection parallelization, collapse PR SQL integration tests back to one job per version, and publish class-duration reports from SQL and Cosmos integration TRX output. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoid publishing a second root runner config by writing the integration xUnit runner settings into extracted integration test binaries in the pipeline. Keep the class-duration reporting and single-job integration experiment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a pipeline-controlled E2E xUnit runner override and class-duration reporting for E2E jobs. PR SQL and Cosmos test stages opt into class-level test collection parallelization while CI retains the serial default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
Author
|
/azp run |
|
Azure Pipelines failed to run 1 pipeline(s). |
Use a double-quoted artifact name so the replace expression's single-quoted arguments do not break YAML parsing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the RequiresIsolation experiment and return SQL E2E isolation to feature-category jobs: BulkUpdate, Reindex, and terminal BulkDelete. Re-enable PR SQL integration category sharding while keeping the E2E class-parallel experiment pipeline-controlled. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reintroduce RequiresIsolation for BulkDelete, BulkUpdate, and Reindex E2E tests. Regular SQL and Cosmos E2E lanes exclude those tests when class-level parallelism is enabled, and isolated lanes run after the broad E2E lanes to avoid async operation conflicts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Set succeededOrFailed conditions on SQL and Cosmos RequiresIsolation E2E jobs so they still run after dependent parallel E2E shards fail, giving complete isolation-lane signal in PR experiments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add optional Cosmos integration and E2E category sharding to match the SQL experiment. PR Cosmos stages now run integration and E2E category shards in parallel, keep SMART integration gated to R4/R4B, and run BulkDelete/Reindex in a RequiresIsolation E2E lane after the broad shards. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each SQL and Cosmos integration job (monolithic and sharded) previously rebuilt the integration test projects via a redundant 'Build Integration Test Projects' step, then ran 'dotnet test --no-build' against the source-workspace build - ignoring the already-published artifacts extracted into $(Agent.TempDirectory)/IntegrationTests (and the xunit.runner.json class-parallelism config written into them). Remove the per-job rebuild and instead run the extracted, pre-built test DLL directly (mirroring e2e-tests.yml), passing the coverage and TRX arguments. This eliminates duplicate compilation across parallel shards and makes the class-parallelism config actually apply. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The CosmosSpecific shard filtered only on FullyQualifiedName~CosmosDb. That substring matches both Cosmos-only test classes and the "(CosmosDb)" data-store suffix that the custom xUnit framework appends to every DataStore.All test. As a result, Reindex (IndexAndReindex), Smart (SmartOnFhir) and Search/SearchParameterStatus (Search) Cosmos variants ran in BOTH their dedicated shard and CosmosSpecific - executing twice and inflating CosmosSpecific. Add the same category exclusions the SQL SqlSpecific shard already uses so each categorized test runs only in its dedicated shard. CosmosSpecific remains the catch-all for the remaining Cosmos variants (Export, DataSourceValidation, uncategorized persistence tests, Cosmos-only classes). The Search and Reindex shards are never gated, so those categories stay covered for every version; SmartOnFhir has no integration tests for Stu3/R5 (only the R4/R4B-gated SmartSearch shard), so no tests are dropped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Security analysis only consumes the deploy and nuget artifacts produced by BuildArtifacts and runs its own RoslynAnalyzers build; it never uses unit-test output. Removing the BuildUnitTests dependency lets the stage start as soon as BuildArtifacts completes (about 14 minutes earlier in build 49614) and prevents it from becoming the pipeline tail as the test stages get faster. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
runShardedE2eJobsandrunShardedIntegrationTestsmodes to the SQL test job template.RequiresIsolationlane for async/global-state-sensitive tests.[SmartSearchTiming]diagnostics to the SMART integration tests to capture setup, upsert, and per-search timings.ImportLongRunningso PR SQL E2E skips it while CI SQL E2E still runs it via explicit CI filters.Experiment notes
This PR measures whether PR SQL validation wall-clock time improves when broad SQL test buckets are split by category/class filters, while keeping sensitive async-operation tests isolated rather than increasing in-process test parallelization.
E2E iteration findings
Search+IndexAndReindex, notCustomSearch; later iterations route those tests with an explicit SQL class filter and excludes them from Reindex.409 Conflictresponses.409 Conflictfailures for Stu3 and R4. R5 BulkDelete was blocked by an unrelated-looking R5 General401 Unauthorizedfailure.GivenIncrementalLoad_80KSurrogateIds_BadRequestIsReturnedactive for roughly 23m, so that test is now taggedImportLongRunningand excluded from the default PR SQL E2E filter.Unable to sync search parameter cache, and cleanup logged a 409 conflict deleting a SearchParameter while a reindex job was ongoing.RequiresIsolation E2E experiment
Categories.RequiresIsolationas an orthogonal scheduling trait for SQL E2E tests that mutate shared async/global state.BulkDeleteTests,BulkUpdateTests, andReindexTestswithRequiresIsolation.Category=RequiresIsolationthroughmainCategoryFilter.sqlE2eTests_RequiresIsolationlane after all parallel E2E shards.BulkUpdateTests=33,BulkDeleteTests=16,ReindexTests=22); all standard shards reportisolationLeakCount=0.SQL integration iteration
4 + 63 + 55 + 41 + 197 = 360SQL-discoverable integration tests per version.SMART timing diagnostics and fixture reuse
InitializeAsyncran for every SMART test and repeatedly recreated the SQL fixture plus reseeded all SMART bundles/resources.Validation
git diff --checkdotnet build src\Microsoft.Health.Fhir.Tests.Common\Microsoft.Health.Fhir.Tests.Common.csproj -f net9.0 --configuration Release --no-restoredotnet build test\Microsoft.Health.Fhir.R4.Tests.E2E\Microsoft.Health.Fhir.R4.Tests.E2E.csproj -f net9.0 --configuration Release --no-restore /p:NETCoreSdkVersion=9.0.313dotnet build test\Microsoft.Health.Fhir.R4.Tests.Integration\Microsoft.Health.Fhir.R4.Tests.Integration.csproj -f net9.0dotnet build test\Microsoft.Health.Fhir.Stu3.Tests.Integration\Microsoft.Health.Fhir.Stu3.Tests.Integration.csproj -f net9.0dotnet build test\Microsoft.Health.Fhir.R5.Tests.Integration\Microsoft.Health.Fhir.R5.Tests.Integration.csproj -f net9.0dotnet <integration dll> --list-tests --filter ...for all SQL integration shard filters across Stu3/R4/R5.dotnet <R4 integration dll> --list-tests --filter "FullyQualifiedName!~CosmosDb&Category=SmartOnFhir"found 63 SQL SmartSearch tests.dotnet test test\Microsoft.Health.Fhir.R4.Tests.Integration\Microsoft.Health.Fhir.R4.Tests.Integration.csproj --configuration Release -f net9.0 -- --filter "FullyQualifiedName!~CosmosDb&Category=SmartOnFhir" --retry-failed-tests 3 --coverage --coverage-output-format cobertura --coverage-settings "<repo>\CodeCoverage.Mtp.settings.xml" --report-trx --output Detailedpassed 63/63 in 60.1s test duration, 83.5s including build.FullyQualifiedName~SqlServer&Category=Import&Category!=ImportLongRunningdoes not listGivenIncrementalLoad_80KSurrogateIds_BadRequestIsReturned;FullyQualifiedName~SqlServer&Category=Import&Category=ImportLongRunningselects it.