Skip to content

[CELEBORN-2321] Avoid locking disk writers during memory split checks#3680

Closed
sunchao wants to merge 1 commit intoapache:mainfrom
sunchao:dev/chao/codex/celeborn-fast-memory-split-check-oss-main
Closed

[CELEBORN-2321] Avoid locking disk writers during memory split checks#3680
sunchao wants to merge 1 commit intoapache:mainfrom
sunchao:dev/chao/codex/celeborn-fast-memory-split-check-oss-main

Conversation

@sunchao
Copy link
Copy Markdown
Member

@sunchao sunchao commented May 8, 2026

Why are the changes needed?

needHardSplitForMemoryShuffleStorage() runs on the push path. Disk-backed writers can never require this memory-only split check, but the method currently acquires the writer lock before returning false. For the common disk-backed case, that adds avoidable contention with writes and evictions on a hot path.

What changes were proposed in this PR?

This PR adds an unlocked fast path for non-memory writers so they return immediately without taking the PartitionDataWriter monitor. For memory-backed writers, it rechecks currentTierWriter after entering the synchronized block before evaluating the existing hard-split conditions, which preserves the original behavior if the writer tier changes concurrently.

Does this PR resolve a correctness bug?

No.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Attempted build/mvn -pl worker -am -DskipTests compile on current main.
  • The Maven reactor fails before reaching worker because celeborn-master_2.12 cannot resolve snapshot test-jar artifacts for celeborn-common_2.12 and celeborn-service_2.12; that failure is unrelated to this change.

(cherry picked from commit 1630aab0b59f3b5c7adf47e85062e4d28a5f1662)
@sunchao sunchao changed the title Avoid locking disk writers for memory split checks Avoid locking disk writers during memory split checks May 8, 2026
@sunchao sunchao changed the title Avoid locking disk writers during memory split checks [CELEBORN-2321] Avoid locking disk writers during memory split checks May 8, 2026
@SteNicholas SteNicholas requested a review from Copilot May 9, 2026 02:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces lock contention on the worker push-data hot path by adding an unlocked fast-path in PartitionDataWriter.needHardSplitForMemoryShuffleStorage() for non-memory (disk/DFS) writers, while preserving the original semantics for memory writers via a synchronized re-check.

Changes:

  • Remove method-level synchronization from needHardSplitForMemoryShuffleStorage() and add a lock-free early return for non-MemoryTierWriter cases.
  • For MemoryTierWriter, re-read currentTierWriter after entering synchronized (this) to ensure correctness if the tier changes concurrently.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@afterincomparableyum afterincomparableyum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SteNicholas
Copy link
Copy Markdown
Member

Thanks for fix. Merged to main(v0.7.0).

@sunchao
Copy link
Copy Markdown
Member Author

sunchao commented May 10, 2026

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants