Skip to content

WIP merge wait methods in#5391

Draft
denik wants to merge 13 commits into
mainfrom
denik/wait-method-removal
Draft

WIP merge wait methods in#5391
denik wants to merge 13 commits into
mainfrom
denik/wait-method-removal

Conversation

@denik
Copy link
Copy Markdown
Contributor

@denik denik commented Jun 1, 2026

Changes

Why

Tests

@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:13 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 08:13 — with GitHub Actions Inactive
@eng-dev-ecosystem-bot
Copy link
Copy Markdown
Collaborator

eng-dev-ecosystem-bot commented Jun 1, 2026

Commit: bf602f1

Run: 26943963669

@denik denik force-pushed the denik/wait-method-removal branch from b00a6f3 to 786f578 Compare June 1, 2026 13:05
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:06 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 13:06 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 1, 2026 15:14 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 13:32 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 13:32 — with GitHub Actions Inactive
@denik denik force-pushed the denik/wait-method-removal branch from f0e65f7 to 9f7fc23 Compare June 2, 2026 19:10
@denik denik temporarily deployed to test-trigger-is June 2, 2026 19:11 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 19:11 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:05 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:05 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:21 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:21 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:41 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 2, 2026 20:41 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 08:52 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 08:52 — with GitHub Actions Inactive
@denik denik force-pushed the denik/wait-method-removal branch from bb3c2e3 to cdd95c8 Compare June 3, 2026 15:15
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:16 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:16 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:35 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 3, 2026 15:35 — with GitHub Actions Inactive
denik added 2 commits June 4, 2026 11:41
…for intermediate state saves

Resources with multi-step deployments (apps, clusters, model serving endpoints, etc.)
previously split their logic between DoCreate/DoUpdate and WaitAfterCreate/WaitAfterUpdate
at an arbitrary point, making it hard to persist state incrementally.

Replace the WaitAfterXxx methods with an *Engine parameter on DoCreate and DoUpdate.
Engine.SetID + Engine.SaveState can be called immediately after the initial API call
succeeds, before any long-running wait, so the resource is tracked in state even if
deployment is interrupted mid-wait (preventing orphaned resources).

Simple resources pass _ *Engine and are unaffected. Complex resources (apps, clusters,
database instances, model serving endpoints, vector search endpoints and indexes) now
inline their wait logic and call engine.SetID/SaveState at the appropriate point.

Co-authored-by: Denis Bilenko
Make testserver app DELETE asynchronous (sets DELETING state) to match
real API behaviour. AppsGet auto-removes the app after returning it in
DELETING state, so the next request sees it as gone.

app_test.go: use real SDK calls (Create + DeleteByName) to put the app
in DELETING state before testing DoCreate retry logic, instead of
injecting state directly. RetriesWhenGetReturnsNotFound uses a one-shot
POST override so GET returns 404 naturally without a custom GET handler.

all_test.go: retry DoRead once after DoDelete to let async deletions
(DELETING → gone) clear before asserting the resource is absent.

Add Server.GetWorkspace to testserver for pre-seeding state in tests.

Co-authored-by: Denis Bilenko
denik added 11 commits June 4, 2026 11:41
Co-authored-by: Denis Bilenko
…State

Collapse the two-step engine.SetID(id) / engine.SaveState(state) pattern into a
single engine.SaveState(id, state) call. The Engine still tracks the id internally
and panics if a subsequent call passes a different id (guards against bugs).

Also extract the polling loop from vector_search_index.DoCreate into a private
waitForIndexReady helper so that the unchanged DoDelete stays between the two
functions and does not appear in the diff.

Co-authored-by: Denis Bilenko
…nternally

Resource implementations cannot recover from a state-persistence failure —
the resource already exists on the server and aborting would not undo its
creation. Log the error via logdiag instead of propagating it, and drop the
error return so call sites are a single statement.

Co-authored-by: Denis Bilenko
All six resources with a gap between resource creation and a subsequent
long-running wait now call engine.SaveState immediately after the create
API returns, using waiter.Name() (available before Wait()) for the postgres
resources and createResp.DashboardId for the dashboard.

This ensures an interrupted deployment leaves a tracked resource rather
than an orphan the next plan must rediscover from remote state.

Co-authored-by: Denis Bilenko
dashboard.DoCreate now calls engine.SaveState immediately after the
dashboard is created (with etag persisted), before publishDashboard.
A failed publish leaves the draft tracked in state rather than
orphaned; the next deploy finds it via DoRead and re-publishes via
DoUpdate without recreating the dashboard.

The old trash-on-publish-failure cleanup is removed — it was a
fragile workaround for the lack of state persistence and is now
unnecessary.

Acceptance tests:
- publish-failure-cleans-up-dashboard: updated to reflect the new
  behavior per engine (direct: draft persists with URL; terraform:
  existing behavior, cleaned up). Output files split to per-engine
  variants (out.summary.*.txt, out.dashboardrequests.*.txt).
- publish-failure-retry (new, direct only): verifies end-to-end that
  a transient publish failure leaves the draft in state (summary shows
  URL, plan detects diff), and the subsequent deploy re-publishes
  without issuing a CREATE call.

Co-authored-by: Denis Bilenko
…cements

The local testserver uses ?o= and cloud uses ?w= for the workspace/org ID
in dashboard published URLs. Add a parent-level [[Repls]] rule that maps
both to ?[WSPARAM]= so the output files are environment-independent.

Co-authored-by: Denis Bilenko
Use per-test [[Repls]] rules (matching raw digits) in the two
publish-failure tests so the URL parameter is normalized to
?[WSPARAM]=[NUMID] regardless of whether the testserver or cloud
environment is used. Revert the over-broad parent-level rule that
broke detect-change's existing ?[ow]=... cleanup.

Co-authored-by: Denis Bilenko
macOS ships bash 3.2 which does not support &>> (bash 4+).
Replace with >> file 2>&1 which works on all bash versions.

Co-authored-by: Denis Bilenko
MSYS_NO_PATHCONV=1 (set in the parent test.toml) prevents MSYS2 from
converting POSIX paths to Windows paths, causing Python to receive a
broken path (/c/a/... instead of C:\a\...) when invoking fault.py.
Unset it before the fault.py call, matching the pattern used by other
dashboard scripts before their bin helper invocations.

Co-authored-by: Denis Bilenko
…d=false

Engine.SaveState:
- Accepts resourceKey for logging: "SaveState: resources.X id=Y N bytes: {...}"
- Skips WAL write if state is unchanged (structdiff.IsEqual), logging a skip message.
- Records the last saved value in e.lastSaved for subsequent comparisons.

Dashboard DoCreate:
- Saves intermediate state with Published=false (the actual draft state) instead of
  the user's Published=true. This ensures the planner sees a real diff (false→true)
  on the next deploy if publish is interrupted, rather than treating the resource as
  up-to-date and silently skipping the publish.

publish-failure-retry acceptance test:
- Adds `bundle plan -o json` and extracts the published change to verify
  old=false, new=true in the plan after a failed publish.

README.md: update stale SetID+SaveState reference to current SaveState(ctx, id, state) API.

Co-authored-by: Denis Bilenko
…reate/DoUpdate

Mergiraf reintroduced the standalone WaitAfterCreate and WaitAfterUpdate
methods during the rebase against main (sql_warehouse.go had a merge conflict).
Inline their wait logic directly into DoCreate and DoUpdate and remove
the standalone methods, consistent with the Engine callback pattern used
by other resources.

Co-authored-by: Denis Bilenko
@denik denik force-pushed the denik/wait-method-removal branch from df19a64 to bf602f1 Compare June 4, 2026 09:41
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:42 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:42 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants