
[Backport 3.6.x] fix: prevent silent volume migration in ControllerPublishVolume#144

Merged
mweibel merged 1 commit into release/3.6 from fix/multiattach-3.6 on Mar 27, 2026


@mweibel commented on Mar 27, 2026

ControllerPublishVolume previously overwrote ServerUUIDs unconditionally, silently moving an attached volume to a new node. When the subsequent ControllerUnpublishVolume for the old node found the volume no longer there, it returned success without detaching, leaving a stale VolumeAttachment that caused hours-long Multi-Attach deadlocks in production.

Fix: fetch the volume before attaching. If it is attached to a different node, return FailedPrecondition so that the external-attacher waits for the old detach to complete first. If it is already attached to the requested node, return idempotent success.

Also add volume locks (TryAcquire/Release, already used on the node side) to DeleteVolume, ControllerPublishVolume, ControllerUnpublishVolume, and ControllerExpandVolume to prevent TOCTOU races between concurrent mutating operations on the same volume.
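The locking pattern mentioned here can be illustrated with a small per-volume try-lock. Only the method names `TryAcquire` and `Release` come from the description; the `VolumeLocks` type, its constructor, and the `withVolumeLock` helper are assumptions for illustration and may differ from the driver's actual implementation.

```go
package main

import "sync"

// VolumeLocks is a hypothetical set of volume IDs that currently have a
// mutating operation in flight.
type VolumeLocks struct {
	mu    sync.Mutex
	locks map[string]struct{}
}

// NewVolumeLocks returns an empty lock set.
func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{locks: make(map[string]struct{})}
}

// TryAcquire marks volumeID as busy and reports true, or reports false
// without blocking if another operation already holds it.
func (vl *VolumeLocks) TryAcquire(volumeID string) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if _, held := vl.locks[volumeID]; held {
		return false
	}
	vl.locks[volumeID] = struct{}{}
	return true
}

// Release marks volumeID as free again. Releasing an unheld ID is a no-op.
func (vl *VolumeLocks) Release(volumeID string) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	delete(vl.locks, volumeID)
}

// withVolumeLock is a hypothetical helper: it runs op only if the lock for
// volumeID could be taken, and reports whether op ran.
func withVolumeLock(vl *VolumeLocks, volumeID string, op func() error) (bool, error) {
	if !vl.TryAcquire(volumeID) {
		return false, nil // caller would typically reject the request here
	}
	defer vl.Release(volumeID)
	return true, op()
}
```

Because the fetch-then-attach check and the attach itself run while the lock is held, a concurrent DeleteVolume or ControllerUnpublishVolume on the same volume cannot change its state between the check and the write. When the lock is unavailable, a handler would presumably reject the call (the CSI specification suggests an ABORTED status for an operation already pending on a volume) and rely on the caller to retry.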

Backport of f7ffaa4 to release/3.6.x.
See #143

@mweibel force-pushed the fix/multiattach-3.6 branch from 2c000be to 300ee25 on March 27, 2026 10:25
@mweibel force-pushed the fix/multiattach-3.6 branch from 300ee25 to 415e013 on March 27, 2026 11:04
@mweibel merged commit 9e90f14 into release/3.6 on Mar 27, 2026
1 check passed
@mweibel deleted the fix/multiattach-3.6 branch on March 27, 2026 14:53