📝 Walkthrough

This PR implements comprehensive logging metrics instrumentation, refactors logging configuration to support customizable levels, updates dependencies (OpenTelemetry, PostgreSQL, Helm charts), and improves scheduler robustness by adjusting error handling in filter plugins. Additionally, several Helm resources are updated with configuration changes and alert rules.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Test Coverage Report 📊: 68.8%
```diff
 - alert: CortexNovaSyncObjectsDroppedToZero
-  expr: cortex_sync_objects{service="cortex-nova-metrics", datasource!="openstack_migrations"} == 0
+  expr: cortex_sync_objects{service="cortex-nova-metrics", datasource!~"openstack_migrations|prometheus_kvm_libvirt_domain_steal_pct"} == 0
```
Silence the alert instead?
Maybe also an option. The general issue is that some datasources (e.g. prometheus_kvm_libvirt_domain_steal_pct) do not always have FEATURES > 0.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
helm/library/cortex-postgres/Chart.yaml (1)
1-9: ⚠️ Potential issue | 🟡 Minor — Add the required local-source comment in this Chart.yaml.
This file is missing the mandated `# from: file://../../library/cortex-postgres` comment.

📝 Suggested fix
```diff
 # Copyright SAP SE
 # SPDX-License-Identifier: Apache-2.0
+# from: file://../../library/cortex-postgres
 apiVersion: v2
 name: cortex-postgres
```

As per coding guidelines, Helm Chart.yaml files must include the `# from: file://../../library/cortex-postgres` comment pointing to the local chart path.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@helm/library/cortex-postgres/Chart.yaml` around lines 1 - 9, Add the mandated local-source comment to the Chart.yaml for the cortex-postgres chart: open Chart.yaml (name: cortex-postgres) and insert the exact comment line "# from: file://../../library/cortex-postgres" near the top (above or immediately below the license/header comments) so the file includes the required local chart path reference.

internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go (1)
265-296: ⚠️ Potential issue | 🟡 Minor — Inconsistent log level between CPU and memory capacity checks.
The CPU capacity check at line 266 was downgraded to `Warn`, but the analogous memory capacity check at line 292 still uses `Error`. Both conditions have identical control flow (skip the host via `continue`), so they should likely use the same log level for consistency.

🔧 Proposed fix for consistency

```diff
 freeMemory, ok := free["memory"]
 if !ok || freeMemory.Value() < 0 {
-	traceLog.Error(
+	traceLog.Warn(
 		"host with invalid memory capacity",
 		"host", host,
 		"freeMemory", freeMemory.String(),
 	)
 	continue
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go` around lines 265 - 296, The memory-capacity check uses traceLog.Error while the analogous CPU check uses traceLog.Warn; change the traceLog.Error call in the memory validation block (the branch that checks freeMemory, referencing freeMemory and free["memory"]) to traceLog.Warn so both invalid-capacity branches use the same log level (traceLog.Warn) and remain consistent with the CPU check that uses traceLog.Warn for host skipping via continue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@helm/library/cortex/templates/manager/manager.yaml`:
- Around line 38-43: The template currently unconditionally emits
--zap-log-level/--zap-devel and LOG_LEVEL which can duplicate user-provided
entries in .Values.controllerManager.container.args and
.Values.controllerManager.container.env; add guards so we only render the
generated flags when the user has not already provided them. Implement small
helper(s) (e.g. a template function containsArg that iterates
.Values.controllerManager.container.args to check for a matching prefix like
"--zap-log-level" or "--zap-devel", and a helper hasEnvVar that scans
.Values.controllerManager.container.env for name == "LOG_LEVEL"), then wrap the
blocks that emit the flags/ENV (the places using
.Values.controllerManager.container.logLevel,
.Values.controllerManager.container.zapDevel and the LOG_LEVEL emission) with
conditions that only render when the corresponding containsArg/hasEnvVar returns
false; apply the same guard to both the blocks at the shown location and the
similar block at the other referenced section (lines 66-75).
In `@helm/library/cortex/values.yaml`:
- Around line 19-22: Update the zapDevel comment to avoid implying it toggles
log level on its own: clarify that zapDevel only changes format/stacktrace
behavior (human-readable console logs and development stack traces) but the
effective log level is controlled by controllerManager.container.logLevel and
the injected flag --zap-log-level in manager.yaml; mention the default logLevel:
"info" so users know enabling zapDevel does not automatically enable debug
logging.
In `@pkg/monitoring/log_metrics_test.go`:
- Around line 215-218: The test currently checks for single-letter bytes in
output which can match timestamps or level names; update the assertion in
pkg/monitoring/log_metrics_test.go to assert on a stable field instead: either
parse the handler output into individual log records and assert each record's
msg field equals "d"/"i"/"w"/"e", or change the substring checks to look for
"msg=d", "msg=i", "msg=w", "msg=e" in the output string (referencing the
existing output variable and the inner handler used in the test) so the loop
verifies delegation using a stable field rather than raw single characters.
- Around line 337-339: The test creates a zapcore.BufferedWriteSyncer (sink)
which spawns a background goroutine that is never stopped; replace the
BufferedWriteSyncer usage in the subtests that call WrapCoreWithLogMetrics with
a plain synchronous writer (e.g., use zapcore.AddSync(&bytes.Buffer{}) for the
WS) or, if you must keep BufferedWriteSyncer, ensure you call sink.Stop() at the
end of each subtest to clean up; locate the test block creating sink, inner
(zapcore.NewCore), and wrapped (WrapCoreWithLogMetrics) and change the sink
construction or add explicit Stop() to avoid leaking goroutines.
In `@postgres/Dockerfile`:
- Line 2: The image currently runs as root (FROM debian:trixie-slim...), so add
a non-root user and switch to it at the end of the Dockerfile: create a
user/group (e.g., appuser), set ownership (chown) of any runtime directories the
container needs, drop privileges with USER appuser, and ensure any commands that
require root (apt installs, etc.) run earlier as root; reference the
Dockerfile's FROM line to locate where to append the user creation, chown, and
USER directives.
---
Outside diff comments:
In `@helm/library/cortex-postgres/Chart.yaml`:
- Around line 1-9: Add the mandated local-source comment to the Chart.yaml for
the cortex-postgres chart: open Chart.yaml (name: cortex-postgres) and insert
the exact comment line "# from: file://../../library/cortex-postgres" near the
top (above or immediately below the license/header comments) so the file
includes the required local chart path reference.
In `@internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go`:
- Around line 265-296: The memory-capacity check uses traceLog.Error while the
analogous CPU check uses traceLog.Warn; change the traceLog.Error call in the
memory validation block (the branch that checks freeMemory, referencing
freeMemory and free["memory"]) to traceLog.Warn so both invalid-capacity
branches use the same log level (traceLog.Warn) and remain consistent with the
CPU check that uses traceLog.Warn for host skipping via continue.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f9653713-4bbb-423c-a641-d17fbfe96b23
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (22)
- .github/renovate.json
- cmd/main.go
- cortex.secrets.example.yaml
- go.mod
- helm/bundles/cortex-cinder/Chart.yaml
- helm/bundles/cortex-manila/Chart.yaml
- helm/bundles/cortex-nova/Chart.yaml
- helm/bundles/cortex-nova/alerts/nova.alerts.yaml
- helm/bundles/cortex-nova/templates/knowledges_kvm.yaml
- helm/library/cortex-postgres/Chart.yaml
- helm/library/cortex/templates/manager/manager.yaml
- helm/library/cortex/values.yaml
- internal/knowledge/extractor/controller.go
- internal/scheduling/lib/filter_weigher_pipeline.go
- internal/scheduling/nova/external_scheduler_api.go
- internal/scheduling/nova/hypervisor_overcommit_controller.go
- internal/scheduling/nova/plugins/filters/filter_capabilities.go
- internal/scheduling/nova/plugins/filters/filter_external_customer.go
- internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
- pkg/monitoring/log_metrics.go
- pkg/monitoring/log_metrics_test.go
- postgres/Dockerfile
```yaml
{{- if .Values.controllerManager.container.logLevel }}
- "--zap-log-level={{ .Values.controllerManager.container.logLevel }}"
{{- end }}
{{- if .Values.controllerManager.container.zapDevel }}
- "--zap-devel"
{{- end }}
```
Avoid rendering duplicate log config through both the typed knobs and the free-form overrides.
`controllerManager.container.args` and `controllerManager.container.env` are already user-extensible. If an existing deployment still sets `--zap-log-level`, `--zap-devel`, or `LOG_LEVEL` there, these blocks will render duplicates and make precedence depend on ordering. That is an upgrade footgun. Guard the generated entries when the manual form is already present, or require a migration away from the generic overrides first.
Also applies to: 66-75
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@helm/library/cortex/templates/manager/manager.yaml` around lines 38 - 43, The
template currently unconditionally emits --zap-log-level/--zap-devel and
LOG_LEVEL which can duplicate user-provided entries in
.Values.controllerManager.container.args and
.Values.controllerManager.container.env; add guards so we only render the
generated flags when the user has not already provided them. Implement small
helper(s) (e.g. a template function containsArg that iterates
.Values.controllerManager.container.args to check for a matching prefix like
"--zap-log-level" or "--zap-devel", and a helper hasEnvVar that scans
.Values.controllerManager.container.env for name == "LOG_LEVEL"), then wrap the
blocks that emit the flags/ENV (the places using
.Values.controllerManager.container.logLevel,
.Values.controllerManager.container.zapDevel and the LOG_LEVEL emission) with
conditions that only render when the corresponding containsArg/hasEnvVar returns
false; apply the same guard to both the blocks at the shown location and the
similar block at the other referenced section (lines 66-75).
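One way to sketch the guard described in the prompt above. The flag name and values paths come from the chart; the local variable and the loop structure are illustrative, not existing chart helpers:

```yaml
{{- /* Hypothetical guard: only emit the generated flag when the user has
       not already supplied it via the free-form args list. */}}
{{- $hasZapLevel := false }}
{{- range .Values.controllerManager.container.args }}
{{- if hasPrefix "--zap-log-level" . }}{{- $hasZapLevel = true }}{{- end }}
{{- end }}
{{- if and .Values.controllerManager.container.logLevel (not $hasZapLevel) }}
- "--zap-log-level={{ .Values.controllerManager.container.logLevel }}"
{{- end }}
```

The same pattern would apply to `--zap-devel` and, with a scan over `.Values.controllerManager.container.env` for `name == "LOG_LEVEL"`, to the environment variable block.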
```yaml
# Enable zap development mode (human-readable console logs, debug level).
# Set to true for local development (e.g. Tilt), keep false for production
# (JSON logs, info level, no stack traces on warnings).
zapDevel: false
```
zapDevel is documented more strongly than it behaves now.
manager.yaml also injects --zap-log-level={{ .Values.controllerManager.container.logLevel }} by default, so enabling zapDevel no longer implies debug logging on its own. With the default logLevel: "info", users will still get info-level output. Please update this comment to avoid a misleading chart contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@helm/library/cortex/values.yaml` around lines 19 - 22, Update the zapDevel
comment to avoid implying it toggles log level on its own: clarify that zapDevel
only changes format/stacktrace behavior (human-readable console logs and
development stack traces) but the effective log level is controlled by
controllerManager.container.logLevel and the injected flag --zap-log-level in
manager.yaml; mention the default logLevel: "info" so users know enabling
zapDevel does not automatically enable debug logging.
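A clarified comment might read as follows (a suggested wording, not the current chart text):

```yaml
# Enable zap development mode. This only changes output format and stacktrace
# behavior (human-readable console logs, development stack traces). The
# effective log level is still controlled by controllerManager.container.logLevel
# (default: "info"), which manager.yaml injects as --zap-log-level, so enabling
# zapDevel alone does not enable debug logging.
zapDevel: false
```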
```go
for _, msg := range []string{"d", "i", "w", "e"} {
	if !bytes.Contains([]byte(output), []byte(msg)) {
		t.Errorf("expected inner handler to receive message %q", msg)
	}
}
```
Tighten the delegation assertion.
Line 216 only checks whether "d", "i", "w", or "e" appears anywhere in the full buffer, so this test can pass on unrelated bytes in timestamps or level names. Assert on a stable field like msg=d/msg=i/… or parse the handler output per record instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/monitoring/log_metrics_test.go` around lines 215 - 218, The test
currently checks for single-letter bytes in output which can match timestamps or
level names; update the assertion in pkg/monitoring/log_metrics_test.go to
assert on a stable field instead: either parse the handler output into
individual log records and assert each record's msg field equals
"d"/"i"/"w"/"e", or change the substring checks to look for "msg=d", "msg=i",
"msg=w", "msg=e" in the output string (referencing the existing output variable
and the inner handler used in the test) so the loop verifies delegation using a
stable field rather than raw single characters.
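The stable-field assertion can be sketched with only the standard library's `log/slog` text handler (the helper name and buffer are illustrative; the real test wraps its own inner handler):

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// checkDelegation logs four single-letter messages through a text handler and
// verifies each one via the stable "msg=" field, rather than a bare byte that
// could also match timestamps or level names.
func checkDelegation() string {
	var buf bytes.Buffer
	log := slog.New(slog.NewTextHandler(&buf, &slog.HandlerOptions{Level: slog.LevelDebug}))
	for _, msg := range []string{"d", "i", "w", "e"} {
		log.Info(msg)
	}
	out := buf.String()
	for _, msg := range []string{"d", "i", "w", "e"} {
		// "msg=d" cannot be produced by a timestamp or a level name,
		// so this check only passes if the record was actually emitted.
		if !strings.Contains(out, "msg="+msg) {
			return "missing msg=" + msg
		}
	}
	return "ok"
}

func main() {
	fmt.Println(checkDelegation())
}
```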
```go
sink := &zapcore.BufferedWriteSyncer{WS: zapcore.AddSync(&bytes.Buffer{})}
inner := zapcore.NewCore(enc, sink, zapcore.DebugLevel)
wrapped := WrapCoreWithLogMetrics(inner)
```
🧩 Analysis chain
🌐 Web query:
In go.uber.org/zap v1.27.1, does zapcore.BufferedWriteSyncer start a background goroutine or require Stop() to avoid leaks in tests?
💡 Result:
Yes—zapcore.BufferedWriteSyncer can start a background goroutine, and you should call Stop() (typically defer bws.Stop()) to avoid goroutine leaks in tests once it has been initialized.
- The goroutine is started in `initialize()` via `go s.flushLoop()`, and `initialize()` is triggered on first use (e.g., the first `Write`). [1]
- `Stop()` stops the ticker, closes the stop channel, waits for the flush loop to exit, and then flushes remaining data. [1]
- The official docs explicitly say to "defer a Stop() call" and that `Stop` "cleans up background goroutines." [2]
If you never write to it (so it never initializes), it won’t start the goroutine—but in typical logger usage it will, so in tests you should Stop().
Sources: [1] [2]
🏁 Script executed:
cd pkg/monitoring && wc -l log_metrics_test.goRepository: cobaltcore-dev/cortex
Length of output: 88
🏁 Script executed:
cd pkg/monitoring && sed -n '320,360p' log_metrics_test.goRepository: cobaltcore-dev/cortex
Length of output: 1256
🏁 Script executed:
cd pkg/monitoring && sed -n '337,373p' log_metrics_test.goRepository: cobaltcore-dev/cortex
Length of output: 1278
Drop BufferedWriteSyncer here or call Stop() explicitly.
Each subtest allocates a zapcore.BufferedWriteSyncer and writes to it via ce.Write(), but nothing calls Stop(). This starts a background goroutine that is never cleaned up, leaking resources across the test suite. Use plain zapcore.AddSync(&bytes.Buffer{}) instead, which avoids the background goroutine entirely.
♻️ Proposed fix

```diff
-	sink := &zapcore.BufferedWriteSyncer{WS: zapcore.AddSync(&bytes.Buffer{})}
+	sink := zapcore.AddSync(&bytes.Buffer{})
 	inner := zapcore.NewCore(enc, sink, zapcore.DebugLevel)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
sink := zapcore.AddSync(&bytes.Buffer{})
inner := zapcore.NewCore(enc, sink, zapcore.DebugLevel)
wrapped := WrapCoreWithLogMetrics(inner)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/monitoring/log_metrics_test.go` around lines 337 - 339, The test creates
a zapcore.BufferedWriteSyncer (sink) which spawns a background goroutine that is
never stopped; replace the BufferedWriteSyncer usage in the subtests that call
WrapCoreWithLogMetrics with a plain synchronous writer (e.g., use
zapcore.AddSync(&bytes.Buffer{}) for the WS) or, if you must keep
BufferedWriteSyncer, ensure you call sink.Stop() at the end of each subtest to
clean up; locate the test block creating sink, inner (zapcore.NewCore), and
wrapped (WrapCoreWithLogMetrics) and change the sink construction or add
explicit Stop() to avoid leaking goroutines.
```diff
-# Last updated: 17 Mar 2026
-FROM debian:trixie-slim@sha256:26f98ccd92fd0a44d6928ce8ff8f4921b4d2f535bfa07555ee5d18f61429cf0c
+# Last updated: 9 Apr 2026
+FROM debian:trixie-slim@sha256:4ffb3a1511099754cddc70eb1b12e50ffdb67619aa0ab6c13fcd800a78ef7c7a
```
Run the final image as a non-root user.
No USER is set after Line 2, so the image defaults to root at runtime. This weakens container isolation and matches the DS-0002 finding.
🔧 Proposed hardening change
```diff
 ENTRYPOINT ["docker-entrypoint.sh"]
+USER postgres
 @@
 CMD ["postgres"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@postgres/Dockerfile` at line 2, The image currently runs as root (FROM
debian:trixie-slim...), so add a non-root user and switch to it at the end of
the Dockerfile: create a user/group (e.g., appuser), set ownership (chown) of
any runtime directories the container needs, drop privileges with USER appuser,
and ensure any commands that require root (apt installs, etc.) run earlier as
root; reference the Dockerfile's FROM line to locate where to append the user
creation, chown, and USER directives.
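The steps in the prompt above can be sketched as a Dockerfile fragment. The user/group names and the `/var/lib/postgresql` data path are assumptions for illustration; adjust them to the directories this image actually uses, and keep any root-only steps (apt installs, COPY) before the privilege drop:

```dockerfile
# ... root-level setup (apt-get install, COPY entrypoint) happens above ...

# Create a dedicated non-root user and hand over the runtime directories.
RUN groupadd --system appuser \
    && useradd --system --gid appuser --home-dir /var/lib/postgresql appuser \
    && chown -R appuser:appuser /var/lib/postgresql

# Drop privileges for everything that runs at container start.
USER appuser
CMD ["postgres"]
```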