Skip to content

fix: handle nil bucket_id in storage get/update/delete operations#495

Closed
p0rtale wants to merge 1 commit into
masterfrom
fix/storage-bucket-id-nil-check
Closed

fix: handle nil bucket_id in storage get/update/delete operations#495
p0rtale wants to merge 1 commit into
masterfrom
fix/storage-bucket-id-nil-check

Conversation

@p0rtale
Copy link
Copy Markdown

@p0rtale p0rtale commented May 5, 2026

Starting from crud 1.7.0, storage asserts if opts.bucket_id is nil during get, update, or delete operations. Since older routers (< 1.7.0) do not pass bucket_id, this causes protocol incompatibility.

This PR adds a nil check to skip bucket ref when bucket_id is missing, restoring backward compatibility with old routers.

It also introduces a rate-limited warning and the tnt_crud_storage_nil_bucket_id_compat_total metric to track
this state.

  • Tests
  • Changelog
  • Documentation

Closes TNTP-7622

@p0rtale p0rtale self-assigned this May 5, 2026
@p0rtale p0rtale requested review from Satbek, ita-sammann and vakhov May 6, 2026 08:14
Copy link
Copy Markdown
Contributor

@vakhov vakhov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I have suggestions here:

  • the workaround itself looks reasonable as a temporary solution for rolling cluster upgrades, but it should be made observable. Right now, when bucket_id == nil, storage silently skips bucket_ref, so during a partial upgrade some traffic goes through a path with weaker guarantees, and operations won’t see it.
  • maybe it’s worth adding a rate-limited warning through a shared helper, for example warn_nil_bucket_id_compat_once(operation, space_name, space.engine), to avoid duplicating the same logic across get/update/delete. The message could be something like: crud.%s_on_storage called without bucket_id; old router compatibility mode is used, bucket_ref is skipped, rebalance safety is reduced until routers are upgraded; space=%s engine=%s.
  • it may also be worth adding a storage-side counter: tnt_crud_storage_nil_bucket_id_compat_total with labels like operation, engine, and possibly safe_mode.
  • also, it may be worth documenting the upgrade workaround explicitly for clusters upgrading from < 1.7.0 to >= 1.7.0. The docs should say that during a rolling upgrade storages should be upgraded first, and routers afterwards. While old routers are still running, storages may enter the compatibility path where bucket_id is missing and bucket_ref is skipped, so should expect the warning/counter above until all routers are upgraded.

@p0rtale p0rtale force-pushed the fix/storage-bucket-id-nil-check branch from bbea306 to 2d0663d Compare May 6, 2026 21:19
@p0rtale p0rtale requested a review from vakhov May 6, 2026 21:23
@p0rtale p0rtale force-pushed the fix/storage-bucket-id-nil-check branch from 2d0663d to 368a39a Compare May 6, 2026 21:31
Starting from crud 1.7.0, storage asserts if `opts.bucket_id`
is nil during `get`, `update`, or `delete` operations.
Since older routers (< 1.7.0) do not pass `bucket_id`, this
causes protocol incompatibility.

This commit adds a nil check to skip bucket ref when `bucket_id`
is missing, restoring backward compatibility with old routers.
It also introduces a rate-limited warning and the
`tnt_crud_storage_nil_bucket_id_compat_total` metric to track
this state.
@p0rtale p0rtale force-pushed the fix/storage-bucket-id-nil-check branch from 368a39a to ef64480 Compare May 6, 2026 21:33
Copy link
Copy Markdown
Contributor

@ita-sammann ita-sammann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bothers me that where we have just added safe mode in 1.7.0 and now we are already punching holes in it. AFAIK 1.7.x router can work with 1.6.x storages. So user can just update routers first and update storages after that. And we don't need to add this check for nil bucket_id.

I suggest that we add instruction how to update cluster to the newer crud versions to README.md:

  • Update storages from pre-1.6.0 (role-based model) to 1.6.1
  • Update routers from pre-1.6.0 to 1.7.4 (safe mode)
  • Update storages from 1.6.1 to 1.7.4

@p0rtale p0rtale closed this May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants