Integrate deployment metadata service for locking and state#4856
Open
shreyas-goenka wants to merge 28 commits into
Collaborator
Commit: 06061ae
shreyas-goenka
commented
Apr 8, 2026
// Report skip actions to the metadata service. On initial registration,
// these are recorded as INITIAL_REGISTER operations.
if action == deployplan.Skip && b.OperationReporter != nil {
move the initial registration up
shreyas-goenka
commented
Apr 10, 2026
@@ -0,0 +1,6 @@
Local = true
Cloud = false
The service needs to roll out to prod before we enable this on cloud.
Force-pushed from 4bbbe9c to 7b26260.
shreyas-goenka
commented
Apr 14, 2026
assert.True(t, ok)
assert.Equal(t, tmpdms.VersionTypeDestroy, vt)

_, ok = goalToVersionType(GoalBind)
Support for `GoalBind` can be added as a follow-up.
Force-pushed from 79b930a to 62719b0.
Keep resources.json maintained alongside the DMS deployment so users have a backward path if they hit issues with the DMS-backed flow. Move DMS-specific bookkeeping (the deployment_id that ties the bundle to a server-side deployment record) into a sibling managed_service.json so the two concerns stay cleanly separated.
Force-pushed from cbdb0f0 to b25b325.
A single async sender goroutine drains a buffered channel of operation events; CRUD workers push onto the channel and continue. When the buffer fills (capacity matches the worker pool), workers block on the send and naturally back off — this is the only intended source of backpressure on the worker pool. Reporting is best-effort: a DMS API failure is logged and the sender keeps draining. The deploy is no longer aborted when the audit-log write fails. On a hard process crash, at most ~10 buffered events can be lost (channel capacity). Release() drains the reporter before completing the version so the audit trail is as complete as possible on a clean shutdown.
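The reporting pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `OperationEvent`, `Reporter`, `NewReporter`, `Report`, `Drain`, and the `send` callback (standing in for the DMS API call) are all hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// OperationEvent is a hypothetical stand-in for one resource operation record.
type OperationEvent struct {
	ResourceKey string
	Action      string
}

// errUnavailable simulates a failing service call in the demo below.
var errUnavailable = errors.New("service unavailable")

// Reporter forwards events to the service from a single sender goroutine.
type Reporter struct {
	events chan OperationEvent
	done   chan struct{}
	once   sync.Once
}

// NewReporter starts the sender. capacity is assumed to match the worker pool
// size, so a full buffer blocks workers and is the only source of backpressure.
func NewReporter(capacity int, send func(OperationEvent) error) *Reporter {
	r := &Reporter{
		events: make(chan OperationEvent, capacity),
		done:   make(chan struct{}),
	}
	go func() {
		defer close(r.done)
		for ev := range r.events {
			// Best-effort: log the failure and keep draining.
			if err := send(ev); err != nil {
				fmt.Printf("warn: reporting %s failed: %v\n", ev.ResourceKey, err)
			}
		}
	}()
	return r
}

// Report enqueues an event; it blocks only while the buffer is full.
func (r *Reporter) Report(ev OperationEvent) { r.events <- ev }

// Drain closes the channel and waits until every buffered event has been
// handed to send. Call it only after all workers have stopped reporting.
func (r *Reporter) Drain() {
	r.once.Do(func() { close(r.events) })
	<-r.done
}

func main() {
	var delivered []string
	r := NewReporter(10, func(ev OperationEvent) error {
		if ev.Action == "fail" {
			return errUnavailable
		}
		delivered = append(delivered, ev.ResourceKey)
		return nil
	})

	var workers sync.WaitGroup
	for i := 0; i < 3; i++ {
		workers.Add(1)
		go func(i int) {
			defer workers.Done()
			r.Report(OperationEvent{ResourceKey: fmt.Sprintf("jobs.job_%d", i), Action: "create"})
		}(i)
	}
	workers.Wait()
	r.Drain() // analogous to draining before the version is completed
	fmt.Println("delivered:", len(delivered))
}
```

Because only the sender goroutine touches the service, a slow or failing call never aborts a worker; the cost is that events still in the buffer are lost if the process dies before `Drain` runs.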
Force-pushed from b25b325 to a0b74f7.
This was referenced May 22, 2026
Closed
…5406)

## Why
DMS-backed bundle deployments (run with `DATABRICKS_BUNDLE_MANAGED_STATE=true DATABRICKS_BUNDLE_ENGINE=direct`) never set `display_name` when creating the deployment record, so the field is stored as `null`.

## What
Populate `DisplayName` from `bundle.Config.Bundle.Name` (i.e. the `bundle.name` from `databricks.yml`) when issuing `CreateDeployment`. This matches the human-readable label users already see in `databricks bundle validate`.

## Tests
Existing `acceptance/bundle/dms/*` tests record the `CreateDeployment` request body via `print_requests.py`; their `output.txt` files regenerate to assert the new `display_name` field.

This pull request and its description were written by Isaac.
The deployment metadata service now accepts git provenance on a version (origin_url, branch, commit) per databricks-eng/universe#2009991. Record it on CreateVersion using the same values the CLI writes to metadata.json.
# Conflicts:
#   bundle/deploy/lock/acquire.go
#   bundle/statemgmt/state_push.go
#   cmd/bundle/utils/process.go
#   libs/testserver/fake_workspace.go
#   libs/testserver/server.go
… for determinism

Main's direct engine applies resources concurrently, so the order of recorded CreateOperation requests varied between runs. Add --sort to print_requests.py in the multi-resource DMS tests to make the recorded output deterministic.
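The idea behind sorting recorded requests can be shown with a small Go analogue. This is not `print_requests.py` itself; `RecordedRequest`, `sortRequests`, and the request paths below are hypothetical and only illustrate why a stable ordering removes the run-to-run variation.

```go
package main

import (
	"fmt"
	"sort"
)

// RecordedRequest is a hypothetical record of one API call captured by a test server.
type RecordedRequest struct {
	Method string
	Path   string
	Body   string
}

// sortRequests orders requests by method, path, then body, so the printed log
// does not depend on the order in which concurrent workers issued them.
func sortRequests(reqs []RecordedRequest) {
	sort.SliceStable(reqs, func(i, j int) bool {
		a, b := reqs[i], reqs[j]
		if a.Method != b.Method {
			return a.Method < b.Method
		}
		if a.Path != b.Path {
			return a.Path < b.Path
		}
		return a.Body < b.Body
	})
}

func main() {
	// Two runs that recorded the same requests in a different arrival order.
	run1 := []RecordedRequest{
		{"POST", "/operations", `{"resource_key":"jobs.b"}`},
		{"POST", "/operations", `{"resource_key":"jobs.a"}`},
	}
	run2 := []RecordedRequest{run1[1], run1[0]}

	sortRequests(run1)
	sortRequests(run2)
	for i := range run1 {
		fmt.Println(run1[i] == run2[i], run1[i].Body)
	}
}
```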
Merging main changed several APIs the DMS code predates:

- WorkspaceClient now takes a ctx (workspace_filesystem.go).
- StateDB keeps a separate resource-key->ID index (stateIDs) that is authoritative during writes; Data.State is only reconstructed when the WAL is merged. LoadStateFromDMS wrote Data.State directly, leaving the index empty, so deletes failed with "missing in state". It now builds the database and calls OpenWithData, which populates the index.
- The inline operation reporter read the freshly-created resource ID and state from Data.State (stale during a deploy). It now reads the ID from GetResourceID and the state from the value just applied, so operations carry the real resource_id and state and the server round-trips them.
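The index bug described in this commit can be reproduced with a toy model. The sketch below is not the CLI's `StateDB`: the types are hypothetical and heavily simplified, and only the names `stateIDs`, `Data.State`, and `OpenWithData` are borrowed from the commit message to show why writing the snapshot directly leaves deletes failing.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceEntry and Database are simplified, hypothetical stand-ins.
type ResourceEntry struct{ ID string }

type Database struct {
	State map[string]ResourceEntry
}

// StateDB is a toy model: stateIDs is the index consulted by writes, while
// Data.State is only a snapshot.
type StateDB struct {
	Data     Database
	stateIDs map[string]string // resource key -> ID
}

var errMissing = errors.New("missing in state")

// OpenWithData installs the snapshot and rebuilds the index from it.
func (db *StateDB) OpenWithData(data Database) {
	db.Data = data
	db.stateIDs = make(map[string]string, len(data.State))
	for key, entry := range data.State {
		db.stateIDs[key] = entry.ID
	}
}

// Delete consults the index, not the snapshot.
func (db *StateDB) Delete(key string) error {
	if _, ok := db.stateIDs[key]; !ok {
		return fmt.Errorf("%s: %w", key, errMissing)
	}
	delete(db.stateIDs, key)
	delete(db.Data.State, key)
	return nil
}

func main() {
	const key = "resources.jobs.test_job"

	// Writing the snapshot directly leaves the index empty, so Delete fails.
	var broken StateDB
	broken.Data = Database{State: map[string]ResourceEntry{key: {ID: "123"}}}
	fmt.Println("direct write:", broken.Delete(key))

	// Going through OpenWithData populates the index, so Delete succeeds.
	var fixed StateDB
	fixed.OpenWithData(Database{State: map[string]ResourceEntry{key: {ID: "123"}}})
	fmt.Println("OpenWithData:", fixed.Delete(key))
}
```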
The SDK's JobDeployment/PipelineDeployment now carry deployment_id and version_id (used to look up deployment metadata in the DMS). Stamp them onto each job and pipeline so every resource records the deployment and the version that produced it.

The IDs are only known after the deployment lock is acquired, so a new deploy-phase mutator (AnnotateDeploymentVersion) sets them, running after the lock and before the plan. The version is plumbed onto the bundle alongside the deployment ID.

version_id changes on every deploy, so an ignore_local_changes rule keeps it from triggering an update on its own; a real update still sends the current version_id via the full-config Reset/EditPipeline.

(Also adjusts isAborted to errors.AsType for the Go 1.26 linter pulled in by the merge.)
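The effect of ignoring `version_id` when deciding whether a resource needs an update can be illustrated with a small field-diff sketch. This is an assumption-laden toy, not the planner's implementation: `changedFields` and the flat `deployment.version_id` key are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// changedFields returns the sorted keys whose values differ between the local
// config and the remote state, skipping any key listed in ignore.
func changedFields(local, remote map[string]string, ignore map[string]bool) []string {
	seen := map[string]bool{}
	var changed []string
	check := func(key string) {
		if seen[key] || ignore[key] {
			return
		}
		seen[key] = true
		l, lok := local[key]
		r, rok := remote[key]
		if lok != rok || l != r {
			changed = append(changed, key)
		}
	}
	for key := range local {
		check(key)
	}
	for key := range remote {
		check(key)
	}
	sort.Strings(changed)
	return changed
}

func main() {
	ignore := map[string]bool{"deployment.version_id": true}
	remote := map[string]string{"name": "test_job", "deployment.version_id": "1"}

	// Only the version bumped: nothing to update.
	onlyVersion := map[string]string{"name": "test_job", "deployment.version_id": "2"}
	fmt.Println(len(changedFields(onlyVersion, remote, ignore)))

	// A real change: the update is triggered by "name"; the full config,
	// including the current version_id, would then be sent.
	renamed := map[string]string{"name": "renamed_job", "deployment.version_id": "2"}
	fmt.Println(changedFields(renamed, remote, ignore))
}
```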
…ion_id

Operations now carry the resource_id and full state (including the deployment block with deployment_id/version_id), and the out.test.toml dump format changed on main. sequential-deploys now shows the version_id rule working: deploy 2 bumps the version but the unchanged test_job records no operation.
## Changes
Set `display_name` on the DMS deployment version, using the bundle name, the same value already recorded on the deployment. The `Version` proto has a `display_name` field, but the `CreateVersion` request never populated it, so every version came back with a null `display_name` even though the deployment had one. This stamps it for parity.

## Why
`display_name` is set on the deployment (from the bundle name) but was missing on each version, leaving version records without a human-readable label. Filling it in keeps deployment and version metadata consistent.

## Tests
Updated the `bundle/dms` acceptance outputs and confirmed they pass.

This pull request and its description were written by Isaac, an AI coding agent.
## Changes
Record the bundle target deployment mode on each DMS version. Adds a
`deployment_mode` field (and the `DEPLOYMENT_MODE_DEVELOPMENT` /
`DEPLOYMENT_MODE_PRODUCTION` enum) to `tmpdms.Version`, and sets it in
the `CreateVersion` request from `bundle.mode`.
Not set on the deployment: `Deployment.deployment_mode` is derived
server-side from the most recent version's mode (output-only), so the
CLI only sets it on the version. A target with no `mode` maps to an
empty value, which is omitted (the server treats it as unspecified) — we
don't fabricate a default.
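A mapping of this shape could look like the sketch below. It is illustrative only: the enum string values come from the description above, but the Go identifiers (`DeploymentMode`, `toDeploymentMode`, `Version`, `versionJSON`) and the JSON layout are assumptions, and treating any unrecognized mode as unset is a choice made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DeploymentMode is a hypothetical Go type for the enum named in the description.
type DeploymentMode string

const (
	DeploymentModeDevelopment DeploymentMode = "DEPLOYMENT_MODE_DEVELOPMENT"
	DeploymentModeProduction  DeploymentMode = "DEPLOYMENT_MODE_PRODUCTION"
)

// Version is a hypothetical request payload; omitempty drops an unset mode
// from the JSON so the server can treat it as unspecified.
type Version struct {
	DeploymentMode DeploymentMode `json:"deployment_mode,omitempty"`
}

// toDeploymentMode maps the bundle target mode to the enum. An unset (or, in
// this sketch, unrecognized) mode maps to the empty value: no default is invented.
func toDeploymentMode(bundleMode string) DeploymentMode {
	switch bundleMode {
	case "development":
		return DeploymentModeDevelopment
	case "production":
		return DeploymentModeProduction
	default:
		return ""
	}
}

// versionJSON encodes the payload that would carry the mode.
func versionJSON(bundleMode string) (string, error) {
	out, err := json.Marshal(Version{DeploymentMode: toDeploymentMode(bundleMode)})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	for _, mode := range []string{"development", "production", ""} {
		body, err := versionJSON(mode)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Printf("mode=%q body=%s\n", mode, body)
	}
}
```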
## Why
The SDK's `bundle.Version` already carries `deployment_mode` ("captured
at the time of this version"), but the CLI never populated it, so every
version recorded a null mode. This stamps it so each version records
whether it was a development or production deployment.
## Tests
Added a unit test for the mode mapping (development / production /
unset). The `bundle/dms` acceptance outputs are unchanged because those
targets don't set a mode. Verified live against a workspace: a `mode:
development` target now records `deployment_mode:
DEPLOYMENT_MODE_DEVELOPMENT` on the created version.
This pull request and its description were written by Isaac, an AI
coding agent.
## Summary

Integrates the Deployment Metadata Service (DMS) as an alternative backend for deployment locking and resource state management. Gated behind `DATABRICKS_BUNDLE_MANAGED_STATE=true`.

When enabled:

- … `ListResources`/`CreateOperation`) instead of local state files
- … `git_info` (`origin_url`, `branch`, `commit`) on the deployment version, the same values the CLI writes to `metadata.json`. Server support added in databricks-eng/universe#2009991.

## Key implementation details

- `DeploymentLock` interface (`lock.go`) with two implementations: `workspaceFilesystemLock` (existing behavior) and `metadataServiceLock` (DMS)
- `resolveDeploymentID` reads deployment ID from workspace `resources.json`, or generates a new UUID for fresh deployments (written only after `CreateDeployment` succeeds)
- `LoadStateFromDMS` populates the state DB from `ListResources` instead of reading local files
- `PushResourcesState` is a no-op with DMS (state is persisted per-operation to the server)
- `--plan` flag and bind/unbind are not supported with DMS

## Test plan

- … `acceptance/bundle/dms/` covering: deploy with resource creation, sequential deploys with create/delete, plan + summary, deploy errors, and lock release errors
- … `planActionToOperationAction` mapping
- … `git_info` round-trips through the DMS service:

## Update: provenance + main merge

- … `main` (SDK v0.141.0, Go 1.26 toolchain). Reconciled the lock-package refactor and fixed the DMS state round-trip against main's WAL-based `StateDB`: `LoadStateFromDMS` now uses `OpenWithData` (populates the resource-key→ID index), and the inline operation reporter reads the resource ID via `GetResourceID` and the state from the just-applied value, so operations carry the real `resource_id` and state.
- … `git_info` (`origin_url`, `branch`, `commit`) on the deployment version, same provenance as `metadata.json` (server support: databricks-eng/universe#2009991).
- … `deployment_id`/`version_id` on each job and pipeline's `deployment` block. A deploy-phase mutator (`AnnotateDeploymentVersion`) stamps them after the lock is acquired. `version_id` changes every deploy, so an `ignore_local_changes` rule keeps it from triggering an update on its own; real updates still send the current value via the full-config Reset/Edit.

Verified end-to-end on e2-dogfood (git_info + deployment_id/version_id round-trip) and in `acceptance/bundle/dms` (`sequential-deploys` shows an unchanged job is skipped when only its `version_id` bumps).
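The `DeploymentLock` split listed under the key implementation details can be sketched as below. The type names come from the description, but the method set (`Acquire`/`Release`), the stub bodies, and the `newDeploymentLock` selector are assumptions made for illustration; the real interface in `lock.go` may differ.

```go
package main

import (
	"context"
	"fmt"
	"os"
)

// DeploymentLock is sketched with an assumed two-method shape.
type DeploymentLock interface {
	Acquire(ctx context.Context) error
	Release(ctx context.Context) error
}

// workspaceFilesystemLock stands for the existing file-based behavior.
type workspaceFilesystemLock struct{}

func (l *workspaceFilesystemLock) Acquire(ctx context.Context) error {
	fmt.Println("acquire: workspace filesystem lock (stub)")
	return nil
}

func (l *workspaceFilesystemLock) Release(ctx context.Context) error {
	fmt.Println("release: workspace filesystem lock (stub)")
	return nil
}

// metadataServiceLock stands for the DMS-backed implementation.
type metadataServiceLock struct{}

func (l *metadataServiceLock) Acquire(ctx context.Context) error {
	fmt.Println("acquire: metadata service lock (stub)")
	return nil
}

func (l *metadataServiceLock) Release(ctx context.Context) error {
	fmt.Println("release: metadata service lock (stub)")
	return nil
}

// newDeploymentLock is a hypothetical selector keyed on the value of
// DATABRICKS_BUNDLE_MANAGED_STATE; anything other than "true" keeps the
// existing behavior.
func newDeploymentLock(managedState string) DeploymentLock {
	if managedState == "true" {
		return &metadataServiceLock{}
	}
	return &workspaceFilesystemLock{}
}

func main() {
	ctx := context.Background()
	lock := newDeploymentLock(os.Getenv("DATABRICKS_BUNDLE_MANAGED_STATE"))
	if err := lock.Acquire(ctx); err != nil {
		fmt.Println("acquire failed:", err)
		return
	}
	defer func() {
		if err := lock.Release(ctx); err != nil {
			fmt.Println("release failed:", err)
		}
	}()
	fmt.Println("deploying while the lock is held")
}
```

Keeping both backends behind one interface means the deploy path does not need to know which one is active, which is what lets the filesystem lock remain the default while the service rolls out.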