
GCP: Route GCS batch deletes per credential prefix#16499

Open
wombatu-kun wants to merge 1 commit into apache:main from wombatu-kun:issue/16480-gcs-mixed-prefix-batch-deletes


@wombatu-kun
Contributor

Summary

Closes #16480

GCSFileIO.internalDeleteFiles partitioned BlobIds into fixed-size batches and then selected the GCS Storage client for each batch from the path of its first object only. When a single GCSFileIO is configured with multiple per-prefix StorageCredentials (the vended-credentials flow), a batch that crossed a prefix boundary was issued in full through the client matching its first object, so the objects under any other prefix in that batch were sent through the wrong credentials.

The fix groups BlobIds by their PrefixedStorage client (via the existing longest-prefix-match clientForStoragePath helper) before partitioning them into batches, and sizes each client's batches by that client's own deleteBatchSize. There are no public API or exception contract changes.
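To make the before/after routing concrete, here is a minimal, self-contained sketch of both behaviors. It uses plain path strings and a hypothetical Client record in place of BlobId and PrefixedStorage; clientFor, firstObjectRouting and batchesPerClient are illustrative names, not the actual GCSFileIO code, and batch sizes are assumed to be positive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrefixBatchRouting {

  // Hypothetical stand-in for a PrefixedStorage client: just its prefix and batch size.
  record Client(String prefix, int deleteBatchSize) {}

  // Longest-prefix match over the configured clients (same idea as clientForStoragePath).
  static Client clientFor(String path, List<Client> clients) {
    Client best = null;
    for (Client c : clients) {
      if (path.startsWith(c.prefix())
          && (best == null || c.prefix().length() > best.prefix().length())) {
        best = c;
      }
    }
    if (best == null) {
      throw new IllegalArgumentException("No client configured for " + path);
    }
    return best;
  }

  // Pre-fix behavior: fixed-size batches, each routed by its first path only.
  static List<Map.Entry<Client, List<String>>> firstObjectRouting(
      List<String> paths, List<Client> clients, int batchSize) {
    List<Map.Entry<Client, List<String>>> routed = new ArrayList<>();
    for (int i = 0; i < paths.size(); i += batchSize) {
      List<String> batch = new ArrayList<>(paths.subList(i, Math.min(i + batchSize, paths.size())));
      routed.add(Map.entry(clientFor(batch.get(0), clients), batch));
    }
    return routed;
  }

  // Fixed behavior: group by client first, then split each group by that
  // client's own deleteBatchSize (assumed > 0).
  static Map<Client, List<List<String>>> batchesPerClient(List<String> paths, List<Client> clients) {
    Map<Client, List<String>> grouped = new LinkedHashMap<>();
    for (String path : paths) {
      grouped.computeIfAbsent(clientFor(path, clients), c -> new ArrayList<>()).add(path);
    }
    Map<Client, List<List<String>>> batches = new LinkedHashMap<>();
    grouped.forEach((client, group) -> {
      int size = client.deleteBatchSize();
      List<List<String>> parts = new ArrayList<>();
      for (int i = 0; i < group.size(); i += size) {
        parts.add(new ArrayList<>(group.subList(i, Math.min(i + size, group.size()))));
      }
      batches.put(client, parts);
    });
    return batches;
  }

  public static void main(String[] args) {
    List<Client> clients = List.of(new Client("gs://bucket/a/", 2), new Client("gs://bucket/b/", 3));
    List<String> paths = List.of(
        "gs://bucket/a/1", "gs://bucket/b/1", "gs://bucket/a/2", "gs://bucket/b/2", "gs://bucket/a/3");
    // Old: the first batch [a/1, b/1] is sent entirely through the "a/" client.
    firstObjectRouting(paths, clients, 2)
        .forEach(e -> System.out.println("old " + e.getKey().prefix() + " -> " + e.getValue()));
    // New: each client only ever sees its own paths, in batches of its own size.
    batchesPerClient(paths, clients)
        .forEach((client, parts) -> System.out.println("new " + client.prefix() + " -> " + parts));
  }
}
```

With interleaved paths, the old routing hands b/1 to the a/ client, while the grouped version yields batches [a/1, a/2] and [a/3] for the a/ client and a single batch [b/1, b/2] for the b/ client.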

This mirrors how S3FileIO.deleteFiles already groups by bucket before batching.

Tests

Two new unit tests in TestGCSFileIO:

  • deleteFilesRoutesToCorrectClientPerPrefix — interleaves objects across two credential-prefixed buckets and asserts that each per-prefix Storage client receives only its own BlobIds.
  • deleteFilesBatchesPerClient — sets a small gcs.delete.batch-size and asserts that batches stay per-client and never mix BlobIds from two prefixes.
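The invariant these tests check can be stated as a small predicate: no batch is empty or larger than the configured batch size, and no batch contains paths that resolve to different prefixes. The sketch below is a hypothetical helper over plain path strings, not code from TestGCSFileIO; an unmatched path is assumed to resolve to the empty prefix.

```java
import java.util.List;

public class BatchInvariant {

  // Longest configured prefix that matches the path; "" if none matches.
  static String longestPrefix(String path, List<String> prefixes) {
    String best = "";
    for (String prefix : prefixes) {
      if (path.startsWith(prefix) && prefix.length() > best.length()) {
        best = prefix;
      }
    }
    return best;
  }

  // True when every batch is non-empty, within maxBatchSize, and all of its
  // paths resolve to the same prefix as its first path.
  static boolean batchesArePerPrefix(
      List<List<String>> batches, List<String> prefixes, int maxBatchSize) {
    for (List<String> batch : batches) {
      if (batch.isEmpty() || batch.size() > maxBatchSize) {
        return false;
      }
      String expected = longestPrefix(batch.get(0), prefixes);
      for (String path : batch) {
        if (!expected.equals(longestPrefix(path, prefixes))) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> prefixes = List.of("gs://bucket/a/", "gs://bucket/b/");
    List<List<String>> perClient =
        List.of(List.of("gs://bucket/a/1", "gs://bucket/a/2"), List.of("gs://bucket/b/1"));
    List<List<String>> mixed = List.of(List.of("gs://bucket/a/1", "gs://bucket/b/1"));
    System.out.println(batchesArePerPrefix(perClient, prefixes, 2)); // true
    System.out.println(batchesArePerPrefix(mixed, prefixes, 2)); // false
  }
}
```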

Confirmed locally with ./gradlew :iceberg-gcp:test --tests "org.apache.iceberg.gcp.gcs.TestGCSFileIO" (26 tests, 0 failures). The new tests also fail when run against the previous code, confirming they catch the bug.

github-actions bot added the GCP label on May 21, 2026
wombatu-kun force-pushed the issue/16480-gcs-mixed-prefix-batch-deletes branch from fec4f8b to 93daf9f on May 21, 2026 03:58
@wombatu-kun
Contributor Author

This PR isn't the cause of the Kafka Connect CI failure. I've fixed the Kafka Connect integration test flakiness in a separate PR, #16438.

Closes apache#16480

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
wombatu-kun force-pushed the issue/16480-gcs-mixed-prefix-batch-deletes branch from 93daf9f to 99bf949 on May 22, 2026 09:58


Successfully merging this pull request may close these issues.

Mixed-prefix GCS batch deletes reuse the first object's client for the whole batch
