Increase timeout and add debug logging in lifecycle E2E test#4331
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##             main    #4331      +/-   ##
==========================================
- Coverage   69.08%   68.91%   -0.17%
==========================================
  Files         478      478
  Lines       48432    48484      +52
==========================================
- Hits        33457    33413      -44
- Misses      12314    12339      +25
- Partials     2661     2732      +71
The workload lifecycle E2E test ("create-list-delete") was flaking in CI
because the 60s timeout was insufficient for async container removal
under load. Doubled the timeout to 120s and added GinkgoWriter debug
output showing the workload status on each poll iteration, making future
failures much easier to diagnose.
Relates to #4077
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 3528a22 to 2b9b4bc
yrobla approved these changes on Mar 24, 2026
Summary
The workload lifecycle E2E test (should track workload through create-list-delete lifecycle) has been flaking in CI because the async container removal via Docker can take longer than 60s under load. The test polls listWorkloads(all=true) waiting for the deleted workload to disappear, but the merge logic in ListWorkloads includes containers still present in the Docker runtime even after the status file is deleted.
Increased the timeout from 60s to 120s and added GinkgoWriter debug output on each iteration showing the workload's current status, so future failures will clearly indicate whether the workload is stuck in "removing" or some other unexpected state.
Relates to #4077
Type of change
Test plan
task lint-fix
Verified the change compiles by running go vet ./test/e2e/... against the E2E packages. The fix targets a CI-only flake in the api-workloads E2E test shard; the test logic is unchanged, only the timeout and observability are improved.
Does this introduce a user-facing change?
No
Special notes for reviewers
The root cause is a race between async container removal and the merge logic in ListWorkloads, which includes runtime containers even after status files are deleted. A deeper fix would involve either increasing the 3s polling timeout in removeContainer() or filtering "removing" workloads from the runtime merge. This PR is the quick fix to stop the flake; #4077 tracks broader E2E debuggability improvements.
Generated with Claude Code
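To make the second "deeper fix" option concrete, here is a minimal sketch of a merge that can skip runtime-only containers in the "removing" state. It is an assumption-laden illustration and not the project's ListWorkloads implementation: the `entry` type, the `mergeWorkloads` function, and the `skipRemoving` flag are hypothetical, and the real merge logic may differ in how it keys and prioritizes entries.

```go
package main

import "sort"

// entry is a hypothetical, simplified workload record used for this sketch.
type entry struct {
	Name   string
	Status string
}

// mergeWorkloads combines workloads known from status files with
// containers reported by the runtime. Status-file entries take
// precedence. When skipRemoving is true, runtime-only containers whose
// status is "removing" are left out, so a workload whose status file is
// already deleted does not reappear while its container is torn down.
func mergeWorkloads(statusFiles, runtime []entry, skipRemoving bool) []entry {
	known := make(map[string]bool, len(statusFiles))
	merged := make([]entry, 0, len(statusFiles)+len(runtime))
	for _, e := range statusFiles {
		known[e.Name] = true
		merged = append(merged, e)
	}
	for _, e := range runtime {
		if known[e.Name] {
			continue // already covered by a status file
		}
		if skipRemoving && e.Status == "removing" {
			continue // container is on its way out; hide it from the list
		}
		merged = append(merged, e)
	}
	// Sort by name so the result is deterministic.
	sort.Slice(merged, func(i, j int) bool { return merged[i].Name < merged[j].Name })
	return merged
}
```

With `skipRemoving` set to false the sketch reproduces the behavior described above, where a deleted workload stays visible until the runtime finishes removing its container; setting it to true models the proposed filter.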