Skip to content

profile-compaction: CollapseConfig CRD + projection overlay + user-ma…#808

Open
entlein wants to merge 5 commits into
kubescape:mainfrom
k8sstormcenter:upstream-pr/sbob-crd-compaction
Open

profile-compaction: CollapseConfig CRD + projection overlay + user-ma…#808
entlein wants to merge 5 commits into
kubescape:mainfrom
k8sstormcenter:upstream-pr/sbob-crd-compaction

Conversation

@entlein
Copy link
Copy Markdown

@entlein entlein commented May 15, 2026

here things got confusing during the rebase, this is the sister PR to storage and need the signature PR (from matthyx)

Summary by CodeRabbit

  • New Features

    • Tamper detection for signed user overlays with R1016 alerts and deduplication
    • Overlay identity stamping for smarter cache invalidation
    • Support for user-defined NetworkNeighborhood via pod label
    • Exec-args tracking and improved wildcard/pattern matching for execs and network addresses
  • Bug Fixes

    • Non-blocking cache notification fan-out to avoid blocking slow subscribers
  • Tests

    • Added tamper, overlay projection, and non-blocking fan-out tests

Review Change Stack

…naged lifecycle

Signed-off-by: entlein <einentlein@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Warning

Review limit reached

@entlein, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 59 minutes and 55 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7524cd2e-a5ba-42e4-b0b9-6d299c4e394a

📥 Commits

Reviewing files that changed from the base of the PR and between 8aaf6bf and 649d13d.

📒 Files selected for processing (2)
  • tests/component_test.go
  • tests/resources/curl-signed-deployment.yaml
📝 Walkthrough

Walkthrough

This PR adds tamper-detection and optional exporter wiring for user overlays, stamps overlay identity into container profile SyncChecksum, refactors projection to classify dynamic/wildcard entries and preserve exec-args, extends related data shapes and mocks, and makes rule-binding notifier dispatch non-blocking with tests.

Changes

Container Profile Cache: Tamper Detection, Overlay Identity, and Projection

Layer / File(s) Summary
Overlay identity stamping into SyncChecksum
pkg/objectcache/containerprofilecache/projection.go
Adds imports and stampOverlayIdentity; stamps ap=/nn= suffixes into kubescape.io/sync-checksum idempotently when user overlays are present.
Projection refactoring for dynamic/wildcard classification
pkg/objectcache/containerprofilecache/projection_apply.go
Refactors projectField to accept an isDynamic callback; adds isNetworkIPWildcard and isNetworkDNSWildcard; adds extractExecsByPath; updates egress/ingress extraction to use IPAddresses plural.
Tamper detection state and verification entry points
pkg/objectcache/containerprofilecache/containerprofilecache.go
Adds exporter import and tamperAlertExporter/tamperEmitted fields to ContainerProfileCacheImpl; calls verifyUserApplicationProfile and verifyUserNetworkNeighborhood after loading label-referenced overlays and drops overlays on failed verification.
Verification methods and tamper-alert logic
pkg/objectcache/containerprofilecache/tamper_alert.go
Implements verification for ApplicationProfile and NetworkNeighborhood with tamper-key deduplication; emits R1016 only for ErrSignatureMismatch; clears dedup on successful verification; wires SetTamperAlertExporter and emitTamperAlert.
Data model extensions for ExecsByPath and UserDefinedNetwork
pkg/objectcache/projection_types.go, pkg/objectcache/shared_container_data.go
Adds ExecsByPath map[string][]string to ProjectedContainerProfile; introduces UserDefinedNetworkMetadataKey and WatchedContainerData.UserDefinedNetwork; updates SetContainerInfo to read pod label.
Mock implementation updates for new projection logic
pkg/objectcache/v1/mock.go
Updates mock projection to classify exec paths (dynamic vs literal), populate ExecsByPath, split network egress/ingress into Patterns vs Values, and adds helpers for lazy Values init and wildcard detection.
Tamper verification and alert dedup tests
pkg/objectcache/containerprofilecache/tamper_alert_test.go
Adds tests for tamper vs operational error classification, dedup map behavior, ErrSignatureMismatch sentinel, and exporter-based R1016 emission and dedup semantics.
End-to-end projection and SyncChecksum contract tests
pkg/objectcache/containerprofilecache/test32_projection_test.go
Adds tests ensuring user overlay execs reach projected values, stamping idempotency and baseline preservation, and SyncChecksum reflects add/update/remove overlay transitions.

Rule Binding Cache: Non-Blocking Notification Dispatch

Layer / File(s) Summary
Non-blocking notifier dispatch and context propagation
pkg/rulebindingmanager/cache/cache.go
Snapshots notifier channels under lock, unlocks, then dispatches notifications non-blockingly; updates RefreshRuleBindingsRules to dispatch a coalesced refresh via dispatchNonBlocking.
Non-blocking dispatch and context contract tests
pkg/rulebindingmanager/cache/cache_test.go
Adds tests asserting dispatchNonBlocking drops on full channels without blocking and RefreshRuleBindingsRules completes promptly while notifying non-saturated subscribers.

Miscellaneous

Layer / File(s) Summary
Signature/profile stubs
pkg/signature/stub.go, pkg/signature/profiles/stub.go
Adds build-time stubs: ErrSignatureMismatch sentinel, Signable marker, no-op verify/sign functions, and profile adapter stubs returning nil.
Function cache nil-guard and go.mod replace
pkg/rulemanager/cel/libraries/cache/function_cache.go, go.mod
Adds oc == nil guard in HashForContainerProfile; adds a replace directive in go.mod.

Sequence Diagram(s)

(omitted — changes are multiple focused features; sequence diagrams included in hidden artifact where helpful)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

release

Suggested reviewers

  • matthyx

Poem

🐰 A rabbit hops through overlay code,
Stamping checksums down the road,
Tampering checked, projections neat,
Patterns matched and execs complete—
Alerts now whisper, safe and bold!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 72.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is incomplete (truncated with '…') and vague about the primary change, using generic phrasing like 'CollapseConfig CRD + projection overlay + user-ma…' without clarity on the main objective or sufficient context. Complete the truncated title and clarify the primary change (e.g., 'Add tamper detection for user-managed profile overlays' or similar) to make the main objective clear to reviewers scanning history.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/objectcache/containerprofilecache/containerprofilecache.go`:
- Around line 412-414: The verification calls to
c.verifyUserApplicationProfile(userAP, sharedData.Wlid) currently ignore its
boolean result; update both call sites (the one using userAP at lines ~412 and
the similar one around ~428-430) to check the returned bool and, when it is
false and the cache is running in strict verification mode (e.g., the instance
flag controlling strict verification such as c.strictVerify or equivalent),
avoid projecting/merging the overlay (skip the merge/projectOverlay path), log
the verification failure with context (Wlid and which overlay), and return or
surface an error instead of continuing; if not in strict mode, continue but log
a warning. Ensure you reference and use c.verifyUserApplicationProfile(...) and
the strict-mode flag when implementing the conditional.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: acff329b-9e89-4538-a272-7877e0ad1e70

📥 Commits

Reviewing files that changed from the base of the PR and between cc59fa0 and 16a52f4.

📒 Files selected for processing (12)
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/objectcache/containerprofilecache/projection.go
  • pkg/objectcache/containerprofilecache/projection_apply.go
  • pkg/objectcache/containerprofilecache/tamper_alert.go
  • pkg/objectcache/containerprofilecache/tamper_alert_test.go
  • pkg/objectcache/containerprofilecache/test32_projection_test.go
  • pkg/objectcache/projection_types.go
  • pkg/objectcache/shared_container_data.go
  • pkg/objectcache/v1/mock.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/rulemanager/cel/libraries/cache/function_cache.go

Comment thread pkg/objectcache/containerprofilecache/containerprofilecache.go
entlein pushed a commit to k8sstormcenter/node-agent that referenced this pull request May 16, 2026
…verlay

CodeRabbit upstream PR kubescape#808 / containerprofilecache.go:414 (Major).
The verifyUserApplicationProfile and verifyUserNetworkNeighborhood
methods already return a boolean reflecting verification outcome —
true when the overlay is unsigned OR when verification succeeded OR
in permissive mode (EnableSignatureVerification=false); false only
in strict mode on actual tamper.

The two call sites in projection-load were discarding that return,
so tampered overlays in strict mode silently merged anyway. The
R1016 alert was emitted but the protection was advisory only.

Now: when verify returns false (strict mode + tamper detected) the
overlay is nilled out before the merge step so the cache never
projects a known-tampered profile. Permissive mode is unchanged —
verify always returns true, the overlay still merges, R1016 still
fires.

New tests:

  - TestVerifyAP_StrictMode_ReturnsFalseOnTamper — sign + tamper an
    ApplicationProfile, construct a cache with
    EnableSignatureVerification=true, and assert
    verifyUserApplicationProfile returns false (caller drops overlay).
  - TestVerifyNN_StrictMode_ReturnsFalseOnTamper — symmetric pin for
    the NetworkNeighborhood path.

The existing legacy-permissive tamper test
(TestVerifyAP_TamperedProfile_PopulatesDedupMap) continues to pass
unchanged — that path still returns true with the R1016 emitted.
…c1 final state

Signed-off-by: entlein <einentlein@gmail.com>
@entlein
Copy link
Copy Markdown
Author

entlein commented May 16, 2026

@coderabbitai review

Collective re-review request after applying your atomic-split advisory + rabbit-feedback fixes across the full SBOB upstream PR set. Each PR is now scoped to a single tier per your dependency map:

Tier 0 (opens):

Tier 2 (execs):

Tier 3 (network):

Tier 4 (tamper/sign):

Umbrellas:

Please confirm atomicity per your earlier advisory and flag any remaining cross-PR coupling we missed.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

@entlein — triggering the review now. In parallel, here's my atomicity assessment for PR #808 specifically, based on what's in the AI summary and the tier map you've provided:


PR #808 Atomicity Assessment

Appropriate Tier 4 content ✅

  • tamper_alert.go / tamper_alert_test.go — core tamper detection, SetTamperAlertExporter, R1016 dedup logic
  • containerprofilecache.go — wiring tamperAlertExporter + tamperEmitted
  • projection.gostampOverlayIdentity / projectUserProfiles overlay identity → SyncChecksum
  • shared_container_data.goUserDefinedNetworkMetadataKey / UserDefinedNetwork for NN resource

Cross-tier coupling flags ⚠️

File Issue Belongs to
projection_types.go Defines ExecsByPath map[string][]string on ProjectedContainerProfile Tier 2 · #812 ("ExecsByPath composite-key surface")
projection_apply.go Consumes ExecsByPath, containsDynamicSegment, isNetworkIPWildcard, isNetworkDNSWildcard Tier 2 (#812) for exec surface; Tier 3 (#806/#324) for network wildcards
v1/mock.go Updated to populate ExecsByPath + wildcard network patterns Tier 2 + Tier 3
rulebindingmanager/cache/cache.go + test Non-blocking fanout refactor — purely infrastructure, no tamper/sign semantics Pre-Tier 4 (Tier 0 or standalone)

Specific recommendations

  1. ExecsByPath type definition should either live in #812 (Tier 2) with #808 taking a compile dependency on it, or you should explicitly note that #808 is the defining PR and #812 merely consumes it. Right now the tier map implies the reverse, which is confusing.

  2. isNetworkIPWildcard / isNetworkDNSWildcard in projection_apply.go duplicate classifier logic that Tier 3 (#806, #324) is supposed to own. Either import from the Tier 3 package or note these are intentionally local shims with a // TODO: consolidate with pkg/objectcache/networkmatch comment.

  3. Non-blocking fanout (rulebindingmanager/cache) is fully orthogonal to tamper/sign. If this was included because it was required to prevent deadlocks when emitting tamper alerts under load, that's a valid reason — but it should be called out explicitly in the PR description as a required infrastructure prerequisite, not just bundled silently.


Bottom line: The tamper/sign core is clean. The main atomicity leaks are the ExecsByPath surface definition crossing from Tier 4 into what the map says is Tier 2, and the fanout refactor having no declared relationship to the Tier 4 scope. If you can clarify the ExecsByPath ownership direction and add a sentence to the PR description explaining why the fanout refactor is here, the remaining coupling is defensible.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

entlein added a commit to k8sstormcenter/node-agent that referenced this pull request May 16, 2026
…e#807, kubescape#808) (#44)

* fix(containerprofilecache): honor verify result before merging user overlay

CodeRabbit upstream PR kubescape#808 / containerprofilecache.go:414 (Major).
The verifyUserApplicationProfile and verifyUserNetworkNeighborhood
methods already return a boolean reflecting verification outcome —
true when the overlay is unsigned OR when verification succeeded OR
in permissive mode (EnableSignatureVerification=false); false only
in strict mode on actual tamper.

The two call sites in projection-load were discarding that return,
so tampered overlays in strict mode silently merged anyway. The
R1016 alert was emitted but the protection was advisory only.

Now: when verify returns false (strict mode + tamper detected) the
overlay is nilled out before the merge step so the cache never
projects a known-tampered profile. Permissive mode is unchanged —
verify always returns true, the overlay still merges, R1016 still
fires.

New tests:

  - TestVerifyAP_StrictMode_ReturnsFalseOnTamper — sign + tamper an
    ApplicationProfile, construct a cache with
    EnableSignatureVerification=true, and assert
    verifyUserApplicationProfile returns false (caller drops overlay).
  - TestVerifyNN_StrictMode_ReturnsFalseOnTamper — symmetric pin for
    the NetworkNeighborhood path.

The existing legacy-permissive tamper test
(TestVerifyAP_TamperedProfile_PopulatesDedupMap) continues to pass
unchanged — that path still returns true with the R1016 emitted.

* docs(applicationprofile): rule-author contracts on opens-with-suffix/prefix + exec-args

Two doc-only fixes for CodeRabbit cross-PR advisory:

#7 (NA kubescape#807): ap.was_path_opened_with_suffix / _prefix — explicitly
document the false-negative gap when the projection is in pass-through
mode (cp.Opens.All=true). Wildcard Patterns are skipped from the
suffix/prefix scan because their token-bearing text doesn't safely
answer suffix questions. Rule authors who need wildcard-aware coverage
should either declare an Opens-projection slice (so SuffixHits/PrefixHits
become authoritative for the literals they care about) or use
ap.was_path_opened (which runs CompareDynamic over Patterns).

#8 (NA kubescape#807): wasExecutedWithArgs — document the three states of
ExecsByPath:
  1. Path absent from Execs.Values → exec not allowed, fall through.
  2. Path in Values, ABSENT from ExecsByPath → legacy back-compat
     "no argv constraint", match.
  3. Path in Values, PRESENT with empty arg list [] → explicit
     "ran with no args" constraint, NOT a wildcard.
The distinction is load-bearing for profile authors: an entry of
{Path: ..., Args: []} is a constraint, not a free pass.

No behavioural change. Tests pass unchanged.

* deps: pin stereoscope v0.1.9 + runtime-spec v1.2.1 (compat with kubescape/syft fork)

Storage rc1 bumped to syft v1.42.4 (CVE-2026-33481), which transitively
requires stereoscope v0.1.22 + runtime-spec v1.3.0. Those versions use
the new moby/moby/client submodule API, which is incompatible with
inspektor-gadget's moby/moby umbrella requirement on the node-agent
side (ambiguous-import wall — see issue #45).

Node-agent stays on kubescape/syft v1.32.0-ks.2 via the existing
replace, but transitive resolution from storage's go.mod pulls the
newer stereoscope into the build, breaking the build with:

  undefined: client.New
  undefined: client.PingOptions

Adds two replace directives to force the older transitive chain that
matches kubescape/syft v1.32.0-ks.2's expectations:

  github.com/anchore/stereoscope     => v0.1.9-0.20250826202322-...
  github.com/opencontainers/runtime-spec => v1.2.1

This is the minimum set needed for node-agent to build cleanly against
storage rc1 (with syft v1.42.4) while still using kubescape/syft on
its own side. Storage's CVE fix remains in effect at the storage
binary; node-agent's syft surface is unchanged.

Verified locally:
  go build ./...          ok
  go test ./pkg/objectcache/... ./pkg/rulemanager/... -count=1
                          → 30+ packages ok

---------

Co-authored-by: Entlein <eineintlein@gmail.com>
@matthyx matthyx moved this to WIP in KS PRs tracking May 26, 2026
Copy link
Copy Markdown
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found three merge-blocking build issues and left inline notes on each failing spot.

"github.com/kubescape/node-agent/pkg/exporters"
"github.com/kubescape/node-agent/pkg/rulemanager/types"
"github.com/kubescape/node-agent/pkg/signature"
"github.com/kubescape/node-agent/pkg/signature/profiles"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go test ./... fails on this new file because github.com/kubescape/node-agent/pkg/signature and .../pkg/signature/profiles do not exist in the checked-in module (pkg/**/signature/** is empty here). As written, this breaks package setup for every importer of containerprofilecache, so the PR will not build until these imports point at a real package or the code is wired through an existing dependency.

}
// v0.0.2 IPAddresses[] — list form supporting CIDRs and '*' sentinel.
// Same semantics as the deprecated singular IPAddress, just plural.
addrs = append(addrs, n.IPAddresses...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current github.com/kubescape/storage/pkg/apis/softwarecomposition/v1beta1.NetworkNeighbor type only exposes IPAddress (singular) — go doc ...NetworkNeighbor has no IPAddresses field. That makes this append, and the matching ingress append below, a compile error against the current dependency, so this needs to stay on the singular field unless the storage API is updated first.

pcp.EgressAddresses.All = true
pcp.EgressAddresses.Values = make(map[string]struct{})
if !specInstalled || spec.EgressAddresses.InUse {
addrs := make([]string, 0, len(n.IPAddresses)+1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same blocker in the mock path: NetworkNeighbor does not have an IPAddresses field, so len(n.IPAddresses) and the append here fail to compile (same issue repeats in the analogous ingress block below). go test ./pkg/objectcache/v1 is currently red because of this.

Three matthyx blockers (2026-05-27):

(1) tamper_alert.go:28 — imports pkg/signature and pkg/signature/profiles
which ship in node-agent#809 (not yet merged). Adds minimum-surface
stubs (pkg/signature/stub.go, pkg/signature/profiles/stub.go) so the
PR compiles standalone. With IsSigned returning false, the tamper path
short-circuits and never invokes Verify — the no-op stub is behavior-
safe: signed-profile detection is dormant until kubescape#809 lands and replaces
these files with the real implementation.

(2) projection_apply.go:265 — NetworkNeighbor.IPAddresses missing.
(3) mock.go:202 — same IPAddresses issue.

Both (2) and (3) resolved by replacing kubescape/storage with
k8sstormcenter/storage's upstream-pr/sbob-network tip 46f37d32 (sibling
of kubescape/storage#324). Removed when kubescape#324 merges and a release ships
IPAddresses.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/rulebindingmanager/cache/cache.go (2)

188-189: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't multiplex refresh pulses onto the pod-delta notifier.

rulebindingmanager.RuleBindingNotify{} is a real zero-value event here (Action == Added with an empty Pod). pkg/containerwatcher/v2/container_watcher_collection.go:144-173 treats every notification as a concrete pod update, so this pulse will mark rule bindings initialized and add "/" to ruleManagedPods. If pkg/rulemanager/rule_manager.go:112-133 needs a coalesced refresh signal, give it a separate notifier path or extend the payload with an explicit refresh action.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulebindingmanager/cache/cache.go` around lines 188 - 189, The refresh
pulse is being sent as an empty rulebindingmanager.RuleBindingNotify via
dispatchNonBlocking(notifiers, ...), which the container watcher treats as a
real Pod Added event and corrupts ruleManagedPods; stop multiplexing refreshes
onto the pod-delta notifier by adding a dedicated refresh path or explicit
action field: either add a new notifier channel specifically for coalesced
refreshes and call dispatchNonBlocking(refreshNotifiers, nil, "refresh pulse"),
or extend rulebindingmanager.RuleBindingNotify with an explicit Action
enum/value (e.g., Refresh) and emit RuleBindingNotify{Action: Refresh} from the
caller and update the consumers in container_watcher_collection and
rule_manager.go to ignore Refresh as a pod update and handle it as a coalesced
refresh signal.

201-219: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Drop-on-full is unsafe for the multi-pod add/modify/delete paths.

This helper is fine for idempotent pulses, but the handlers pass slices of per-pod notifications. Once a subscriber channel fills, later pod events are silently lost, and pkg/containerwatcher/v2/container_watcher_collection.go:144-173 does not resync from cache — it updates ruleManagedPods from each message. A single large rule-binding change can therefore leave a slow subscriber permanently missing pods. Consider coalescing those handlers to one resync event or moving delivery behind a per-subscriber queue/worker that preserves at-least-once delivery.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulebindingmanager/cache/cache.go` around lines 201 - 219, The current
dispatchNonBlocking drops per-pod notifications which breaks callers that expect
at-least-once delivery (the add/modify/delete handlers that send slices of
per-pod RuleBindingNotify and consumers that update ruleManagedPods). Fix by
changing delivery semantics: either (A) coalesce the handlers so they emit one
consolidated resync-style RuleBindingNotify instead of many per-pod messages
(update the add/modify/delete handlers to produce a single resync message), or
(B) implement a per-subscriber worker/queue: create a goroutine per notifier
that owns a buffered channel and serializes delivery (modify dispatchNonBlocking
to enqueue into each notifier's dedicated queue/channel and have the worker
perform the actual send with retries/blocking to guarantee at-least-once), and
update indexOfNotifier usage accordingly; pick one approach and adjust all
callers of dispatchNonBlocking and the add/modify/delete handlers to match.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/rulebindingmanager/cache/cache.go`:
- Line 140: The AddHandler and ModifyHandler implementations hold c.mutex and
call addRuleBinding/modifiedRuleBinding while those helpers use
context.Background() for Namespaces().List and Pods(...).List which prevents
cancellation; change the signatures of addRuleBinding(ctx context.Context, ...)
and modifiedRuleBinding(ctx context.Context, ...) and pass the incoming watch
context from AddHandler/ModifyHandler into those calls, then replace
context.Background() uses in both helpers with the passed ctx for both
Namespaces().List and Pods(...).List; finally update cache_test.go to call the
new function signatures accordingly.

---

Outside diff comments:
In `@pkg/rulebindingmanager/cache/cache.go`:
- Around line 188-189: The refresh pulse is being sent as an empty
rulebindingmanager.RuleBindingNotify via dispatchNonBlocking(notifiers, ...),
which the container watcher treats as a real Pod Added event and corrupts
ruleManagedPods; stop multiplexing refreshes onto the pod-delta notifier by
adding a dedicated refresh path or explicit action field: either add a new
notifier channel specifically for coalesced refreshes and call
dispatchNonBlocking(refreshNotifiers, nil, "refresh pulse"), or extend
rulebindingmanager.RuleBindingNotify with an explicit Action enum/value (e.g.,
Refresh) and emit RuleBindingNotify{Action: Refresh} from the caller and update
the consumers in container_watcher_collection and rule_manager.go to ignore
Refresh as a pod update and handle it as a coalesced refresh signal.
- Around line 201-219: The current dispatchNonBlocking drops per-pod
notifications which breaks callers that expect at-least-once delivery (the
add/modify/delete handlers that send slices of per-pod RuleBindingNotify and
consumers that update ruleManagedPods). Fix by changing delivery semantics:
either (A) coalesce the handlers so they emit one consolidated resync-style
RuleBindingNotify instead of many per-pod messages (update the add/modify/delete
handlers to produce a single resync message), or (B) implement a per-subscriber
worker/queue: create a goroutine per notifier that owns a buffered channel and
serializes delivery (modify dispatchNonBlocking to enqueue into each notifier's
dedicated queue/channel and have the worker perform the actual send with
retries/blocking to guarantee at-least-once), and update indexOfNotifier usage
accordingly; pick one approach and adjust all callers of dispatchNonBlocking and
the add/modify/delete handlers to match.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b640f7e-c5dd-4b9f-a2b1-436aa9cc7564

📥 Commits

Reviewing files that changed from the base of the PR and between 16a52f4 and 8aaf6bf.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod
  • pkg/objectcache/containerprofilecache/containerprofilecache.go
  • pkg/rulebindingmanager/cache/cache.go
  • pkg/rulebindingmanager/cache/cache_test.go
  • pkg/signature/profiles/stub.go
  • pkg/signature/stub.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/objectcache/containerprofilecache/containerprofilecache.go

c.mutex.Unlock()
return
}
rbs = c.addRuleBinding(ruleBinding)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show repo file
ls -la

# Print the relevant portions of the file around the referenced lines
FILE="pkg/rulebindingmanager/cache/cache.go"

echo "=== File stats ==="
wc -l "$FILE"
echo

echo "=== Snippet around line 120-180 ==="
nl -ba "$FILE" | sed -n '110,190p'
echo

echo "=== Snippet around line 220-310 ==="
nl -ba "$FILE" | sed -n '220,310p'
echo

echo "=== Snippet around line 320-380 ==="
nl -ba "$FILE" | sed -n '320,380p'
echo

echo "=== Find all context.Background() uses in this file ==="
rg -n "context\.Background\(\)" "$FILE" || true
echo

echo "=== Find definitions/usages of addRuleBinding / modifiedRuleBinding ==="
rg -n "func \(c \*RBCache\) (addRuleBinding|modifiedRuleBinding)\b" "$FILE" || true
rg -n "\.addRuleBinding\(" "$FILE" || true
rg -n "\.modifiedRuleBinding\(" "$FILE" || true
echo

echo "=== Show where List() calls happen in addRuleBinding/modifiedRuleBinding ==="
# Print whole functions if not too large
perl -ne 'print if $.>=0' "$FILE" >/dev/null
# Use ast-grep? Fallback to sed by locating function start lines
ADD_START=$(nl -ba "$FILE" | rg -n "func \(c \*RBCache\) addRuleBinding" | head -n1 | cut -d: -f1 || true)
MOD_START=$(nl -ba "$FILE" | rg -n "func \(c \*RBCache\) modifiedRuleBinding" | head -n1 | cut -d: -f1 || true)

if [ -n "${ADD_START:-}" ]; then
  echo "=== addRuleBinding from $ADD_START (print until next func) ==="
  nl -ba "$FILE" | sed -n "${ADD_START},$((ADD_START+160))p"
fi
if [ -n "${MOD_START:-}" ]; then
  echo "=== modifiedRuleBinding from $MOD_START (print until next func) ==="
  nl -ba "$FILE" | sed -n "${MOD_START},$((MOD_START+160))p"
fi

Repository: kubescape/node-agent

Length of output: 2017


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="pkg/rulebindingmanager/cache/cache.go"

echo "=== File stats ==="
wc -l "$FILE"
echo

echo "=== Snippet around line 110-190 ==="
awk 'NR>=110 && NR<=190 {printf "%d\t%s\n", NR, $0}' "$FILE"
echo

echo "=== Snippet around line 220-310 ==="
awk 'NR>=220 && NR<=310 {printf "%d\t%s\n", NR, $0}' "$FILE"
echo

echo "=== Snippet around line 320-380 ==="
awk 'NR>=320 && NR<=380 {printf "%d\t%s\n", NR, $0}' "$FILE"
echo

echo "=== Find all context.Background() uses in this file ==="
rg -n "context\.Background\(\)" "$FILE" || true
echo

echo "=== Find definitions/usages of addRuleBinding / modifiedRuleBinding ==="
rg -n "func \(c \*RBCache\) (addRuleBinding|modifiedRuleBinding)\b" "$FILE" || true
rg -n "\.addRuleBinding\(" "$FILE" || true
rg -n "\.modifiedRuleBinding\(" "$FILE" || true
echo

echo "=== Function bodies (trimmed) ==="
ADD_START=$(rg -n "func \(c \*RBCache\) addRuleBinding\b" "$FILE" | head -n1 | cut -d: -f1 || true)
MOD_START=$(rg -n "func \(c \*RBCache\) modifiedRuleBinding\b" "$FILE" | head -n1 | cut -d: -f1 || true)

if [ -n "${ADD_START:-}" ]; then
  echo "--- addRuleBinding starting at $ADD_START ---"
  awk -v s="$ADD_START" 'NR>=s && NR<=s+220 {printf "%d\t%s\n", NR, $0}' "$FILE"
fi

if [ -n "${MOD_START:-}" ]; then
  echo "--- modifiedRuleBinding starting at $MOD_START ---"
  awk -v s="$MOD_START" 'NR>=s && NR<=s+220 {printf "%d\t%s\n", NR, $0}' "$FILE"
fi

Repository: kubescape/node-agent

Length of output: 24559


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "\.addRuleBinding\(" -S . || true
rg -n "\.modifiedRuleBinding\(" -S . || true
rg -n "func \(c \*RBCache\) addRuleBinding\(" pkg/rulebindingmanager/cache/cache.go
rg -n "func \(c \*RBCache\) modifiedRuleBinding\(" pkg/rulebindingmanager/cache/cache.go

Repository: kubescape/node-agent

Length of output: 739


Restore watch-context cancellation for RB list calls.
pkg/rulebindingmanager/cache/cache.go keeps c.mutex held in AddHandler/ModifyHandler while it calls addRuleBinding/modifiedRuleBinding, and addRuleBinding currently uses context.Background() for Namespaces().List and Pods(...).List (lines 140, 159, 237-281, 343-345). This prevents list requests from being canceled when the watch context is canceled and can stall cache operations during/after shutdown or restart. Thread ctx through addRuleBinding(ctx, ...) / modifiedRuleBinding(ctx, ...), pass it from AddHandler/ModifyHandler, and use it for both List calls; update cache_test.go call sites for the signature change.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/rulebindingmanager/cache/cache.go` at line 140, The AddHandler and
ModifyHandler implementations hold c.mutex and call
addRuleBinding/modifiedRuleBinding while those helpers use context.Background()
for Namespaces().List and Pods(...).List which prevents cancellation; change the
signatures of addRuleBinding(ctx context.Context, ...) and
modifiedRuleBinding(ctx context.Context, ...) and pass the incoming watch
context from AddHandler/ModifyHandler into those calls, then replace
context.Background() uses in both helpers with the passed ctx for both
Namespaces().List and Pods(...).List; finally update cache_test.go to call the
new function signatures accordingly.

helpers.Error(err))
// Honour strict-mode: refuse to load on any verification failure,
// but do NOT touch the dedup map or emit R1016.
return !c.cfg.EnableSignatureVerification
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c.cfg.EnableSignatureVerification still does not exist on config.Config, so this file has four hard compiler errors at the strict-mode return sites (for example here, and again at lines 108/136/146). Please either add/plumb that config field in this PR or remove the strict-mode branches so the tamper path matches currently available config.

Comment thread pkg/signature/stub.go
// signature-payload accessors.
type Signable interface{}

func IsSigned(_ Signable) bool { return false }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These stubs still leave the new tamper-detection path effectively disabled: IsSigned always returns false, VerifyObjectAllowUntrusted always returns nil, and the adapters in pkg/signature/profiles/stub.go return nil. That means the runtime path never verifies anything and the new tamper tests are not exercising the intended behavior. This needs the real signature implementation (or the feature/tests need to stay out of this PR).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those are just to make the PR "atomic" they need to be deleted after merge. I havent yet found a way to completely seperate the PRs 808 and 809

Copy link
Copy Markdown
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-checked this PR and I’m still seeing three merge blockers:

  • pkg/objectcache/containerprofilecache/tamper_alert.go references c.cfg.EnableSignatureVerification, but pkg/config/config.go still has no such field. LSP flags four compiler errors at the strict-mode returns.
  • cmd/main.go creates both the runtime alert exporter and the container profile cache, but never calls cpc.SetTamperAlertExporter(exporter). As written, emitTamperAlert() is a no-op in production, so the new R1016 path never actually emits alerts end-to-end.
  • pkg/signature/stub.go / pkg/signature/profiles/stub.go still short-circuit the feature (IsSigned == false, VerifyObjectAllowUntrusted == nil, adapters return nil), so the new tamper-detection behavior is effectively dormant and the added tamper tests are not validating real verification behavior.

@entlein
Copy link
Copy Markdown
Author

entlein commented May 28, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@matthyx matthyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked again.

The earlier import/IPAddresses blockers are addressed now: the PR has local pkg/signature stubs, and the storage replace brings NetworkNeighbor.IPAddresses.

I’m still seeing three blockers before merge:

  • pkg/objectcache/containerprofilecache/tamper_alert.go still reads c.cfg.EnableSignatureVerification at the strict-mode return sites, but pkg/config/config.go still has no such field on config.Config.
  • cmd/main.go creates both the runtime alert exporter and the container profile cache, but never calls cpc.SetTamperAlertExporter(exporter), so the new R1016 tamper path still never emits alerts end-to-end.
  • pkg/signature/stub.go and pkg/signature/profiles/stub.go still short-circuit the feature (IsSigned == false, VerifyObjectAllowUntrusted == nil, adapters return nil), so the tamper-detection path remains dormant and the added tamper tests are not validating a real verification flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: WIP

Development

Successfully merging this pull request may close these issues.

2 participants