
Increase timeout and add debug logging in lifecycle E2E test #4331

Merged
JAORMX merged 1 commit into main from fix-flaky-lifecycle-e2e-test
Mar 24, 2026

Collaborator

@JAORMX JAORMX commented Mar 24, 2026

Summary

  • The workload lifecycle E2E test (should track workload through create-list-delete lifecycle) has been flaking in CI because the async container removal via Docker can take longer than 60s under load. The test polls listWorkloads(all=true) waiting for the deleted workload to disappear, but the merge logic in ListWorkloads includes containers still present in the Docker runtime even after the status file is deleted.
  • Doubled the polling timeout from 60s to 120s and added GinkgoWriter debug output on each iteration showing the workload's current status, so future failures will clearly indicate whether the workload is stuck in "removing" or in some other unexpected state.

Relates to #4077

Type of change

  • Bug fix

Test plan

  • Linting (task lint-fix)
  • Manual testing (describe below)

Verified the change compiles with go vet ./test/e2e/.... The fix targets a CI-only flake in the api-workloads E2E test shard — the test logic is unchanged, only the timeout and observability are improved.

Does this introduce a user-facing change?

No

Special notes for reviewers

The root cause is a race between async container removal and ListWorkloads's merge logic, which includes runtime containers even after status files are deleted. A deeper fix would involve either increasing removeContainer()'s 3s polling timeout or filtering "removing" workloads from the runtime merge. This PR is the quick fix to stop the flake; #4077 tracks broader E2E debuggability improvements.
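
To make the race concrete, the following is a simplified sketch of a union-style merge. It is an assumption-laden illustration, not the project's ListWorkloads implementation: the Container type, the mergeWorkloads function, the skipRemoving flag, and the idea that the runtime exposes a "removing" state per container are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Container is a hypothetical view of a container known to the runtime.
type Container struct {
	Name  string
	State string // assumed values such as "running" or "removing"
}

// mergeWorkloads returns the sorted union of workloads that have a status
// file and workloads that still have a runtime container. When skipRemoving
// is true, runtime containers in the "removing" state are left out of the
// merge, which is one of the two deeper fixes mentioned above.
func mergeWorkloads(statusFiles map[string]string, runtime []Container, skipRemoving bool) []string {
	seen := make(map[string]bool)
	for name := range statusFiles {
		seen[name] = true
	}
	for _, c := range runtime {
		if skipRemoving && c.State == "removing" {
			continue
		}
		seen[c.Name] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// The status file is already deleted, but the container is still being removed.
	statusFiles := map[string]string{}
	runtime := []Container{{Name: "demo", State: "removing"}}

	fmt.Println(mergeWorkloads(statusFiles, runtime, false)) // prints [demo]
	fmt.Println(mergeWorkloads(statusFiles, runtime, true))  // prints []
}
```

In the unfiltered case the deleted workload keeps appearing in the listing until the runtime finishes removing the container, which is why the test's wait depends on Docker's removal latency.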

Generated with Claude Code

@github-actions github-actions bot added the size/XS Extra small PR: < 100 lines changed label Mar 24, 2026

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.91%. Comparing base (df1a5cd) to head (2b9b4bc).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4331      +/-   ##
==========================================
- Coverage   69.08%   68.91%   -0.17%     
==========================================
  Files         478      478              
  Lines       48432    48484      +52     
==========================================
- Hits        33457    33413      -44     
- Misses      12314    12339      +25     
- Partials     2661     2732      +71     


The workload lifecycle E2E test ("create-list-delete") was flaking in CI
because the 60s timeout was insufficient for async container removal
under load. Doubled the timeout to 120s and added GinkgoWriter debug
output showing the workload status on each poll iteration, making future
failures much easier to diagnose.

Relates to #4077

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JAORMX JAORMX force-pushed the fix-flaky-lifecycle-e2e-test branch from 3528a22 to 2b9b4bc Compare March 24, 2026 12:38
@github-actions github-actions bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Mar 24, 2026
@JAORMX JAORMX merged commit d288822 into main Mar 24, 2026
37 checks passed
@JAORMX JAORMX deleted the fix-flaky-lifecycle-e2e-test branch March 24, 2026 15:57

Labels

size/XS Extra small PR: < 100 lines changed


2 participants