
fix(cache): release RBCache mutex lock before sending notifications #828

Open
matthyx wants to merge 1 commit into main from fix/cache-deadlock-oom


matthyx commented May 29, 2026

Releases the RBCache mutex lock before sending rule-binding/pod notifications to channels in AddHandler, ModifyHandler, and DeleteHandler. This resolves the RBCache deadlock that blocks all event-processing workers and leads to unbounded memory growth (OOM).
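A minimal sketch of the pattern the description names, assuming a heavily simplified cache. The type names (Cache, Notify), fields and the AddNotifier/HasPod helpers are hypothetical and not taken from pkg/rulebindingmanager/cache. The handler updates state and copies the notifier list while holding the lock, then unlocks before any channel send, so a subscriber that calls back into the cache while handling an event does not wait on a lock the sender still holds.

```go
package main

import (
	"fmt"
	"sync"
)

// Notify is a hypothetical stand-in for a rule-binding/pod notification.
type Notify struct {
	Action string
	Pod    string
}

// Cache is a hypothetical, heavily simplified model of RBCache: a mutex
// guarding some state plus a list of subscriber channels.
type Cache struct {
	mu        sync.Mutex
	pods      map[string]bool
	notifiers []*chan Notify
}

func NewCache() *Cache {
	return &Cache{pods: make(map[string]bool)}
}

// AddNotifier registers a subscriber channel.
func (c *Cache) AddNotifier(n *chan Notify) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.notifiers = append(c.notifiers, n)
}

// HasPod is a read path a subscriber might call while handling an event.
func (c *Cache) HasPod(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.pods[name]
}

// AddHandler updates state and snapshots the notifiers under the lock,
// then releases the lock before sending anything.
func (c *Cache) AddHandler(pods ...string) {
	c.mu.Lock()
	for _, p := range pods {
		c.pods[p] = true
	}
	notifiers := make([]*chan Notify, len(c.notifiers))
	copy(notifiers, c.notifiers)
	c.mu.Unlock()

	// Sends may block on a slow subscriber, but no lock is held here.
	for _, n := range notifiers {
		for _, p := range pods {
			*n <- Notify{Action: "added", Pod: p}
		}
	}
}

func main() {
	c := NewCache()
	ch := make(chan Notify) // unbuffered: each send waits for the consumer
	c.AddNotifier(&ch)

	results := make(chan bool, 2)
	go func() {
		for ev := range ch {
			// The consumer re-enters the cache for every event it receives.
			results <- c.HasPod(ev.Pod)
		}
	}()

	c.AddHandler("nginx", "redis")
	fmt.Println(<-results, <-results) // true true
}
```

If c.mu.Unlock() is moved below the send loop, the consumer blocks in HasPod after the first event while AddHandler blocks sending the second one; with no other goroutine able to make progress, the Go runtime reports that all goroutines are asleep.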

Summary by CodeRabbit

  • Refactor
    • Notification delivery in the rule-binding manager cache now goes through an asynchronous queue, so handlers no longer send on notifier channels while holding the cache lock.


coderabbitai Bot commented May 29, 2026

Warning

Review limit reached

@matthyx, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 43 minutes and 25 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily delay when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6c12ee93-786f-4ccc-9158-2039219dc610

📥 Commits

Reviewing files that changed from the base of the PR and between 7ab1afb and 308553b.

📒 Files selected for processing (2)
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/mock.go
📝 Walkthrough


This PR moves RBCache notification delivery from synchronous sends, which block handler execution, to asynchronous queuing. Handlers now snapshot the notifiers and enqueue notifications without blocking; a background goroutine drains the queue and dispatches the events. NewCache and NewCacheMock both initialize the queue and start the processor.
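The queue-based design described above can be sketched as follows. This is a hypothetical reconstruction, not the PR's code: pendingNotification, notificationQueue and processNotifications are names taken from the walkthrough and the review comment, while Notify, enqueue, the queue size of 100 and the exact field layout are assumptions. Enqueuing uses a select with a default case, so a handler never waits on a full queue; the batch is dropped and logged instead.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// Notify is a hypothetical stand-in for the real notification type.
type Notify struct{ Pod string }

// pendingNotification pairs a snapshot of the notifier channels with the
// events to deliver to each of them.
type pendingNotification struct {
	notifiers []*chan Notify
	events    []Notify
}

// queueSize is an assumed buffer size for the notification queue.
const queueSize = 100

type Cache struct {
	mu                sync.Mutex
	notifiers         []*chan Notify
	notificationQueue chan pendingNotification
}

// NewCache creates the buffered queue and starts the single consumer goroutine.
func NewCache() *Cache {
	c := &Cache{notificationQueue: make(chan pendingNotification, queueSize)}
	go c.processNotifications()
	return c
}

func (c *Cache) AddNotifier(n *chan Notify) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.notifiers = append(c.notifiers, n)
}

// processNotifications drains the queue and performs the (possibly blocking)
// channel sends outside any handler and without holding c.mu.
func (c *Cache) processNotifications() {
	for pn := range c.notificationQueue {
		for _, n := range pn.notifiers {
			for _, ev := range pn.events {
				*n <- ev
			}
		}
	}
}

// enqueue never blocks: if the queue is full, the batch is dropped and logged.
func (c *Cache) enqueue(pn pendingNotification) bool {
	select {
	case c.notificationQueue <- pn:
		return true
	default:
		log.Printf("notification queue full, dropping %d event(s)", len(pn.events))
		return false
	}
}

// AddHandler snapshots the notifiers under the lock and enqueues after unlocking.
func (c *Cache) AddHandler(pod string) {
	c.mu.Lock()
	// ... cache state would be updated here ...
	notifiers := append([]*chan Notify(nil), c.notifiers...)
	c.mu.Unlock()
	c.enqueue(pendingNotification{notifiers: notifiers, events: []Notify{{Pod: pod}}})
}

func main() {
	c := NewCache()
	ch := make(chan Notify, 1)
	c.AddNotifier(&ch)
	c.AddHandler("nginx")
	fmt.Println((<-ch).Pod) // nginx
}
```

The trade-off is that a full queue now loses notifications instead of stalling handlers, and the single processing goroutine can still be held up by one slow subscriber, which is the concern raised in CodeRabbit's inline comment on processNotifications.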

Changes

Asynchronous notification queue architecture

| Layer / File(s) | Summary |
| --- | --- |
| Notification queue infrastructure and startup (pkg/rulebindingmanager/cache/cache.go) | The pendingNotification type captures notifier channel snapshots and events. RBCache gains a buffered notificationQueue channel. The processNotifications goroutine drains the queue and sends events to notifiers. NewCache starts the goroutine. |
| Handler notification dispatch refactoring (pkg/rulebindingmanager/cache/cache.go) | AddHandler, ModifyHandler, DeleteHandler, and RefreshRuleBindingsRules snapshot notifiers under lock and enqueue notifications via a non-blocking select that drops and logs when the queue is full, replacing direct channel sends. |
| Mock cache queue initialization and goroutine startup (pkg/rulebindingmanager/cache/mock.go) | NewCacheMock initializes the notificationQueue field and starts the background processNotifications goroutine before returning the cache. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A queue for every notified soul,
While handlers slip beyond the lock,
A goroutine keeps watch and control,
Dispatching events 'round the clock,
No more delays—just smooth daylight ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: moving notification sends outside the RBCache mutex to prevent deadlock. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

matthyx force-pushed the fix/cache-deadlock-oom branch from db32b73 to b2d10c7 on May 29, 2026 17:04
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage
| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.225 | 0.222 | -1.5% |
| Peak CPU (cores) | 0.231 | 0.229 | -1.0% |
| Avg Memory (MiB) | 314.719 | 274.001 | -12.9% |
| Peak Memory (MiB) | 316.672 | 279.449 | -11.8% |
Dedup Effectiveness

No data available.

matthyx force-pushed the fix/cache-deadlock-oom branch from b2d10c7 to 7ab1afb on May 30, 2026 11:13

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 74-82: The processNotifications loop in RBCache (function
processNotifications reading from notificationQueue and iterating
pn.notifiers/pn.events) uses a blocking send (*n <- event) which can stall the
whole goroutine if any notifier channel is full or slow; change the send to a
non-blocking pattern (use a select with the channel send case and a default case
to drop-and-log, or use a select with a time.After timeout to drop+log after a
short wait) so a single bad notifier cannot block processing of
notificationQueue and other notifiers; ensure you log dropped events with
context (which notifier and event) for observability.
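One way to apply this suggestion, sketched under assumptions: deliver, the 50 ms sendTimeout and the log format are hypothetical, and the types are the simplified illustrative ones rather than the real RBCache types. Each send waits at most sendTimeout (the timeout variant the comment mentions, written with time.NewTimer so the timer is stopped as soon as the send succeeds); an event a notifier does not accept in time is dropped and logged with the notifier index and pod, and the loop moves on to the next notifier.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// Notify is a hypothetical stand-in for the real notification type.
type Notify struct{ Pod string }

type pendingNotification struct {
	notifiers []*chan Notify
	events    []Notify
}

// sendTimeout is an assumed value; the review only asks for "a short wait".
const sendTimeout = 50 * time.Millisecond

// deliver sends ev to one notifier, giving up after sendTimeout so a slow
// or abandoned subscriber cannot stall the whole processing loop.
// It returns false, and logs, when the event is dropped.
func deliver(idx int, n *chan Notify, ev Notify) bool {
	timer := time.NewTimer(sendTimeout)
	defer timer.Stop()
	select {
	case *n <- ev:
		return true
	case <-timer.C:
		log.Printf("dropping event for pod %q: notifier %d did not accept it within %v", ev.Pod, idx, sendTimeout)
		return false
	}
}

// processNotifications drains the queue; one stuck notifier now costs at
// most sendTimeout per event instead of blocking forever.
func processNotifications(queue <-chan pendingNotification) {
	for pn := range queue {
		for i, n := range pn.notifiers {
			for _, ev := range pn.events {
				deliver(i, n, ev)
			}
		}
	}
}

func main() {
	healthy := make(chan Notify, 1)
	stuck := make(chan Notify) // unbuffered and never read
	queue := make(chan pendingNotification, 1)
	done := make(chan struct{})
	go func() { processNotifications(queue); close(done) }()

	queue <- pendingNotification{
		notifiers: []*chan Notify{&stuck, &healthy},
		events:    []Notify{{Pod: "nginx"}},
	}
	close(queue)
	<-done
	fmt.Println((<-healthy).Pod) // nginx: the stuck notifier did not block delivery
}
```

A select with a default case instead of the timer drops immediately, which keeps the loop fast but loses events whenever a subscriber is momentarily busy; the short timeout tolerates brief hiccups at the cost of up to sendTimeout of delay per stuck notifier.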

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dacc15b2-3805-4898-a6e8-3219af2c0903

📥 Commits

Reviewing files that changed from the base of the PR and between eb5d48e and 7ab1afb.

📒 Files selected for processing (2)
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/mock.go

Comment thread pkg/rulebindingmanager/cache/cache.go
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
matthyx force-pushed the fix/cache-deadlock-oom branch from 7ab1afb to 308553b on May 30, 2026 11:30
@github-actions

Performance Benchmark Results

Node-Agent Resource Usage
| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.000 | 0.000 | N/A |
| Peak CPU (cores) | 0.000 | 0.000 | N/A |
| Avg Memory (MiB) | 0.000 | 0.000 | N/A |
| Peak Memory (MiB) | 0.000 | 0.000 | N/A |
Dedup Effectiveness

No data available.

@github-actions

Performance Benchmark Results

Node-Agent Resource Usage
| Metric | BEFORE | AFTER | Delta |
| --- | --- | --- | --- |
| Avg CPU (cores) | 0.195 | 0.196 | +0.2% |
| Peak CPU (cores) | 0.205 | 0.206 | +0.3% |
| Avg Memory (MiB) | 347.599 | 265.818 | -23.5% |
| Peak Memory (MiB) | 349.766 | 271.496 | -22.4% |
Dedup Effectiveness

No data available.


Labels

None yet

Projects

Status: No status


1 participant