Skip to content

[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675

Open
s0nskar wants to merge 1 commit intoapache:mainfrom
s0nskar:fix_interrupt
Open

[CELEBORN-1577][FOLLOWUP] Fix backward compatiblity issue with interrupt shuffle#3675
s0nskar wants to merge 1 commit intoapache:mainfrom
s0nskar:fix_interrupt

Conversation

@s0nskar
Copy link
Copy Markdown
Contributor

@s0nskar s0nskar commented May 6, 2026

What changes were proposed in this pull request?

Fix backward compatibility issue with interrupt shuffle by checking if the reason is nonEmpty.

Why are the changes needed?

If someone uses a new client with old server which is not sending CheckQuotaResponse in HeartbeatFromApplicationResponse then proto uses default value to build CheckQuotaResponse with isAvailable=false and reason="". In this case the job will always fails without breaching the quota, we should not fail the job if the reason is empty to make it backward compatible.

Does this PR resolve a correctness bug?

No

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested in local setup.

@RexXiong
Copy link
Copy Markdown
Contributor

RexXiong commented May 7, 2026

Using reason.nonEmpty as a proxy for "server supports quota check" is fragile — reason is a business description field, not a compatibility signal. If a legitimate quota-exceeded response happens to have an empty reason, it would be silently ignored; conversely, any future change to the reason format could break this logic.

Since quotaInterruptShuffleEnabled defaults to false, users must explicitly opt in. In a new-client/old-server scenario, users should simply not enable this config. This is an explicit and semantically clear approach, rather than relying on an implicit convention around the reason field.

If we do need robust backward compatibility, a better approach would be to add an explicit capability field (e.g., quotaCheckSupported) to the heartbeat response. But that's a larger change and can be addressed separately.

Reviewed by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants