
OCPBUGS-58102: Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar #371

Open

Thealisyed wants to merge 5 commits into openshift:main from Thealisyed:fix/operand-metrics-exposure

Conversation

@Thealisyed
Contributor

@Thealisyed Thealisyed commented Mar 24, 2026

The ExternalDNS operand binds its Prometheus metrics to localhost (127.0.0.1:7979), making them invisible to the cluster monitoring stack. This adds a kube-rbac-proxy sidecar to proxy metrics over HTTPS, along with a Service and ServiceMonitor for Prometheus discovery.

  • Add --kube-rbac-proxy-image CLI flag and operator config plumbing
  • Inject one kube-rbac-proxy sidecar per zone container (ports 8443+) so multi-zone deployments expose all metrics
  • Create metrics Service with serving-cert annotation for auto TLS
  • Create ServiceMonitor with scheme: https, bearerTokenFile, and tlsConfig for authenticated Prometheus scraping
  • Add tokenreviews/subjectaccessreviews RBAC for kube-rbac-proxy auth
  • Add Service watch to controller for reconciliation on drift
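
For illustration, a minimal sketch (in Go) of one injected sidecar; the container name, port numbers, flag values, and mount paths here are assumptions and may not match the actual implementation:

package sketch

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// kubeRBACProxySidecar sketches one kube-rbac-proxy sidecar per zone container.
// It terminates TLS with the serving cert and proxies requests to the operand's
// localhost-only metrics endpoint after authenticating and authorizing the caller.
func kubeRBACProxySidecar(image string, zoneIndex int) corev1.Container {
	securePort := int32(8443 + zoneIndex) // HTTPS port exposed to Prometheus (8443+ per zone)
	return corev1.Container{
		Name:  fmt.Sprintf("kube-rbac-proxy-%d", zoneIndex), // illustrative name
		Image: image,
		Args: []string{
			fmt.Sprintf("--secure-listen-address=:%d", securePort),
			"--upstream=http://127.0.0.1:7979/", // operand's localhost-bound metrics
			"--tls-cert-file=/etc/tls/private/tls.crt",
			"--tls-private-key-file=/etc/tls/private/tls.key",
		},
		Ports: []corev1.ContainerPort{{
			Name:          fmt.Sprintf("metrics-%d", zoneIndex),
			ContainerPort: securePort,
			Protocol:      corev1.ProtocolTCP,
		}},
		VolumeMounts: []corev1.VolumeMount{{
			Name:      "metrics-cert", // serving-cert secret created for the metrics Service
			MountPath: "/etc/tls/private",
			ReadOnly:  true,
		}},
	}
}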

Assisted-by: Claude

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 24, 2026
@Thealisyed Thealisyed force-pushed the fix/operand-metrics-exposure branch 3 times, most recently from 0297cde to 3f8cd03 on March 25, 2026 10:54
@Thealisyed Thealisyed changed the title from "[WIP] Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar" to "Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar" on Mar 26, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 26, 2026
@Thealisyed Thealisyed changed the title from "Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar" to "OCPBUGS-58102: Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar" on Mar 26, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Mar 26, 2026
@openshift-ci-robot

@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

The ExternalDNS operand binds its Prometheus metrics to localhost (127.0.0.1:7979), making them invisible to the cluster monitoring stack. This adds a kube-rbac-proxy sidecar to proxy metrics over HTTPS, along with a Service and ServiceMonitor for Prometheus discovery.

  • Add --kube-rbac-proxy-image CLI flag and operator config plumbing
  • Inject one kube-rbac-proxy sidecar per zone container (ports 8443+) so multi-zone deployments expose all metrics
  • Create metrics Service with serving-cert annotation for auto TLS
  • Create ServiceMonitor with scheme: https, bearerTokenFile, and tlsConfig for authenticated Prometheus scraping
  • Add tokenreviews/subjectaccessreviews RBAC for kube-rbac-proxy auth
  • Add Service watch to controller for reconciliation on drift

Assisted-by: Claude

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Mar 26, 2026
@alebedev87
Contributor

/assign

@Thealisyed
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 1, 2026
@openshift-ci-robot

@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from melvinjoseph86 April 1, 2026 15:56
Contributor

@alebedev87 alebedev87 left a comment


First look. I need to have another one with a focus on the service/servicemonitor comparison logic.

Comment thread bundle/manifests/external-dns-operator.clusterserviceversion.yaml
Comment thread config/rbac/operand_role.yaml
Comment thread main.go Outdated
Comment thread pkg/operator/controller/externaldns/controller.go
Comment thread pkg/operator/controller/externaldns/controller.go
Comment thread pkg/operator/controller/externaldns/controller.go Outdated
Comment thread pkg/operator/controller/externaldns/pod.go Outdated
Comment thread pkg/operator/controller/externaldns/service.go Outdated
Comment thread pkg/operator/controller/externaldns/servicemonitor.go
ownerRef := metav1.NewControllerRef(externalDNS, operatorv1beta1.GroupVersion.WithKind("ExternalDNS"))
desired.SetOwnerReferences([]metav1.OwnerReference{*ownerRef})

current := &unstructured.Unstructured{}
Contributor


Why unstructured object? Did you try to add ServiceMonitor scheme to avoid this?

Contributor Author

@Thealisyed Thealisyed May 6, 2026


We went with unstructured objects to avoid adding github.com/prometheus-operator/prometheus-operator as a direct dependency in go.mod. Pulling in that module would increase binary size and maintenance burden, which seemed unnecessary.

The inspiration was drawn from CIO, which uses unstructured objects for its Canary resources. However, if you feel the type safety and readability of a typed ServiceMonitor outweigh the dependency cost, I can change it.
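
For reference, a minimal sketch of the unstructured approach; the field values used here (port name, selector label, token path, serverName) are assumptions for illustration rather than the exact values from this PR:

package sketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// desiredServiceMonitorSketch builds a ServiceMonitor without importing the
// prometheus-operator module by setting the GVK on an unstructured object.
func desiredServiceMonitorSketch(namespace, name, svcName string) *unstructured.Unstructured {
	sm := &unstructured.Unstructured{}
	sm.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "monitoring.coreos.com",
		Version: "v1",
		Kind:    "ServiceMonitor",
	})
	sm.SetNamespace(namespace)
	sm.SetName(name)
	sm.Object["spec"] = map[string]interface{}{
		"selector": map[string]interface{}{
			"matchLabels": map[string]interface{}{"app": svcName}, // illustrative selector
		},
		"endpoints": []interface{}{
			map[string]interface{}{
				"port":            "metrics",
				"scheme":          "https",
				"bearerTokenFile": "/var/run/secrets/kubernetes.io/serviceaccount/token",
				"tlsConfig": map[string]interface{}{
					"serverName": svcName + "." + namespace + ".svc",
				},
			},
		},
	}
	return sm
}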

@coderabbitai

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a6abeee6-f6b1-418d-9482-b4ddb54fbffc

📥 Commits

Reviewing files that changed from the base of the PR and between 10a004a and 8848be6.

📒 Files selected for processing (19)
  • bundle/manifests/external-dns-operator.clusterserviceversion.yaml
  • bundle/manifests/external-dns_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • config/manager/manager.yaml
  • config/rbac/operand_role.yaml
  • config/rbac/role.yaml
  • docs/openshift.md
  • main.go
  • pkg/operator/config/config.go
  • pkg/operator/controller/externaldns/controller.go
  • pkg/operator/controller/externaldns/deployment.go
  • pkg/operator/controller/externaldns/deployment_test.go
  • pkg/operator/controller/externaldns/pod.go
  • pkg/operator/controller/externaldns/pod_test.go
  • pkg/operator/controller/externaldns/service.go
  • pkg/operator/controller/externaldns/service_test.go
  • pkg/operator/controller/externaldns/servicemonitor.go
  • pkg/operator/controller/externaldns/servicemonitor_test.go
  • pkg/operator/controller/names.go
  • pkg/operator/operator.go
✅ Files skipped from review due to trivial changes (1)
  • config/manager/manager.yaml
🚧 Files skipped from review as they are similar to previous changes (18)
  • pkg/operator/controller/names.go
  • pkg/operator/controller/externaldns/service.go
  • config/rbac/operand_role.yaml
  • pkg/operator/config/config.go
  • pkg/operator/controller/externaldns/servicemonitor.go
  • bundle/manifests/external-dns_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • main.go
  • pkg/operator/controller/externaldns/service_test.go
  • pkg/operator/controller/externaldns/controller.go
  • pkg/operator/controller/externaldns/servicemonitor_test.go
  • config/rbac/role.yaml
  • pkg/operator/operator.go
  • pkg/operator/controller/externaldns/pod_test.go
  • docs/openshift.md
  • pkg/operator/controller/externaldns/deployment_test.go
  • pkg/operator/controller/externaldns/pod.go
  • bundle/manifests/external-dns-operator.clusterserviceversion.yaml
  • pkg/operator/controller/externaldns/deployment.go

📝 Walkthrough


Adds kube-rbac-proxy-based HTTPS metrics support: introduces a CLI/config field and default image for the kube-rbac-proxy, RBAC entries for services/ServiceMonitors/tokenreviews/subjectaccessreviews, controller watches for Service and ServiceMonitor, reconciliation to ensure per-zone metrics Service and ServiceMonitor, deployment sidecars with metrics TLS cert volume, naming helpers for metrics artifacts, and unit tests covering the metrics plumbing.

Changes

kube-rbac-proxy Metrics Support

Layer / File(s) Summary
Configuration & Constants
main.go, pkg/operator/config/config.go
Add KubeRBACProxyImage config field and DefaultKubeRBACProxyImage constant; bind CLI flag for the proxy image.
Operator Wiring
pkg/operator/operator.go, pkg/operator/controller/externaldns/controller.go
Pass KubeRBACProxyImage into externaldnsctrl.Config; add watches for corev1.Service and unstructured ServiceMonitor; reconcile now ensures metrics Service and ServiceMonitor after deployment.
RBAC & Manager Manifest
bundle/.../external-dns-operator.clusterserviceversion.yaml, bundle/.../external-dns_rbac.authorization.k8s.io_v1_clusterrole.yaml, config/manager/manager.yaml, config/rbac/operand_role.yaml, config/rbac/role.yaml
Add RBAC rules for services, servicemonitors, tokenreviews, subjectaccessreviews; add RELATED_IMAGE_KUBE_RBAC_PROXY env var and --kube-rbac-proxy-image arg in manager manifest.
Deployment Integration
pkg/operator/controller/externaldns/deployment.go, pkg/operator/controller/externaldns/deployment_test.go
Extend ensureExternalDNSDeployment to accept kubeRBACProxyImage; include per-zone kube-rbac-proxy sidecars and metrics cert volume when set; add equalContainerPorts to compare ports; update tests and call sites.
Pod / Container Builders
pkg/operator/controller/externaldns/pod.go, pkg/operator/controller/externaldns/pod_test.go
Add helpers: numMetricsPorts, sequence-based names, kubeRBACProxyContainer, metricsCertVolume; tests for port counting, container construction, cert volume, and equalContainerPorts.
Metrics Service Management
pkg/operator/controller/externaldns/service.go, pkg/operator/controller/externaldns/service_test.go
New desiredMetricsService and ensureExternalDNSMetricsService to create/update per-zone metrics Services; change-detection and update logic; tests for creation and drift detection.
ServiceMonitor Management
pkg/operator/controller/externaldns/servicemonitor.go, pkg/operator/controller/externaldns/servicemonitor_test.go
New unstructured ServiceMonitor builder and reconciler ensureExternalDNSServiceMonitor; spec/endpoint construction, drift detection, and tests.
Naming Utilities
pkg/operator/controller/names.go
Add ExternalDNSMetricsServiceName, ExternalDNSServiceMonitorName, ExternalDNSMetricsSecretName and truncatedName helper with maxDNSLabelLength and hashing (a sketch of the truncation approach follows this table).
Documentation
docs/openshift.md
Document operand metrics exposure via kube-rbac-proxy sidecars, per-zone ports, and Service/ServiceMonitor behavior.
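
A minimal sketch of the truncation idea behind the truncatedName helper listed under Naming Utilities above; the hash length and exact scheme here are assumptions:

package sketch

import (
	"crypto/sha256"
	"encoding/hex"
)

const maxDNSLabelLength = 63 // Kubernetes names used as DNS labels are capped at 63 characters

// truncatedNameSketch shortens an over-long generated name while keeping it
// distinguishable by appending a short hash of the full original name.
func truncatedNameSketch(name string) string {
	if len(name) <= maxDNSLabelLength {
		return name
	}
	sum := sha256.Sum256([]byte(name))
	suffix := "-" + hex.EncodeToString(sum[:])[:8]
	return name[:maxDNSLabelLength-len(suffix)] + suffix
}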

Sequence Diagram

sequenceDiagram
    participant Ctrl as Controller
    participant Depl as Deployment
    participant Svc as Metrics Service
    participant SM as ServiceMonitor
    participant K8s as Kubernetes API

    Ctrl->>Ctrl: Reconcile ExternalDNS
    alt KubeRBACProxyImage configured
        Ctrl->>Depl: Ensure deployment with kube-rbac-proxy sidecars & cert volume
        Depl->>K8s: Create/Update Deployment
        K8s-->>Depl: Deployment acknowledged

        Ctrl->>Svc: Ensure per-zone metrics Service
        Svc->>K8s: Create/Update Service
        K8s-->>Svc: Service acknowledged

        Ctrl->>SM: Ensure ServiceMonitor (unstructured)
        SM->>K8s: Create/Update ServiceMonitor
        K8s-->>SM: ServiceMonitor acknowledged

        Ctrl->>K8s: Watch Service and ServiceMonitor changes
    else No kube-rbac-proxy configured
        Ctrl->>Depl: Ensure deployment without sidecars
        Depl->>K8s: Create/Update Deployment
    end
    K8s-->>Ctrl: Reconcile complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: exposing ExternalDNS operand metrics via kube-rbac-proxy sidecar, which is the primary objective of this changeset.
Description check ✅ Passed The description is directly related to the changeset, detailing the implementation approach including CLI flags, sidecar injection, Service/ServiceMonitor creation, RBAC rules, and watch configuration.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains only standard Go testing package tests (func TestXxx(*testing.T)), not Ginkgo tests. The custom check for Ginkgo test name stability is not applicable.
Test Structure And Quality ✅ Passed Tests follow best practices: single responsibility, descriptive assertions, table-driven pattern, proper isolation, and consistent with codebase conventions.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All new tests are standard Go unit tests using the testing package, not Ginkgo. The MicroShift Test Compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new tests are standard Go unit tests using testing.T package (pod_test.go, service_test.go, servicemonitor_test.go). Custom check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed No scheduling constraints introduced. No affinity, topology spread, or replica modifications. Compatible with SNO, TNF, TNA, HyperShift topologies.
Ote Binary Stdout Contract ✅ Passed No OTE Binary Stdout Contract violations. Logging via zap defaults to stderr. No direct stdout writes in process-level code or init functions.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All new tests use standard Go testing with func Test*, not Ginkgo patterns. The check does not apply.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.1)

level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :\n\tgithub.com/Azure/azure-sdk-for-go@v60.1.0+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Azure/go-autorest/autorest@v0.11.27: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/Azure/go-autorest/autorest/adal@v0.9.20: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/aws/aws-sdk-go@v1.41.6: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/go-logr/logr@v1.4.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/google/go-cmp@v0.7.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tgithub.com/infobloxopen/in

... [truncated 13756 characters] ...

red in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/kustomize/cmd/config@v0.14.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/kustomize/kyaml@v0.17.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/structured-merge-diff/v4@v4.6.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\tsigs.k8s.io/yaml@v1.4.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt\n\n\tTo ignore the vendor directory, use -mod=readonly or -mod=mod.\n\tTo sync the vendor directory, run:\n\t\tgo mod vendor\n"


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci Bot commented May 6, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. and removed jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 6, 2026
@openshift-ci-robot

@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is invalid:

  • expected the bug to target either version "5.0." or "openshift-5.0.", but it targets "4.22" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

The ExternalDNS operand binds its Prometheus metrics to localhost (127.0.0.1:7979), making them invisible to the cluster monitoring stack. This adds a kube-rbac-proxy sidecar to proxy metrics over HTTPS, along with a Service and ServiceMonitor for Prometheus discovery.

  • Add --kube-rbac-proxy-image CLI flag and operator config plumbing
  • Inject one kube-rbac-proxy sidecar per zone container (ports 8443+) so multi-zone deployments expose all metrics
  • Create metrics Service with serving-cert annotation for auto TLS
  • Create ServiceMonitor with scheme: https, bearerTokenFile, and tlsConfig for authenticated Prometheus scraping
  • Add tokenreviews/subjectaccessreviews RBAC for kube-rbac-proxy auth
  • Add Service watch to controller for reconciliation on drift

Assisted-by: Claude

Summary by CodeRabbit

Release Notes

  • New Features

  • Added support for metrics monitoring via ServiceMonitor resources

  • Introduced kube-rbac-proxy sidecar support for enhanced security and TLS handling

  • Enabled cluster-monitoring integration for OpenShift environments

  • Tests

  • Expanded test coverage for metrics service and ServiceMonitor functionality

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/operator/operator.go (1)

45-57: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Keep the kubebuilder RBAC source in sync with the new proxy auth verbs.

The CSV now carries tokenreviews and subjectaccessreviews, but this aggregated annotation block still doesn't. The next manifest regeneration can silently drop those permissions again.

Suggested annotation update
 // +kubebuilder:rbac:groups=route.openshift.io,resources=routes,verbs=get;watch;list
 // +kubebuilder:rbac:groups=config.openshift.io,resources=infrastructures,verbs=get;list;watch
+// +kubebuilder:rbac:groups=authentication.k8s.io,resources=tokenreviews,verbs=create
+// +kubebuilder:rbac:groups=authorization.k8s.io,resources=subjectaccessreviews,verbs=create
 // local role
 // +kubebuilder:rbac:groups="",namespace=external-dns-operator,resources=secrets;serviceaccounts;configmaps,verbs=get;list;watch;create;update;patch;delete
 // +kubebuilder:rbac:groups="",namespace=external-dns-operator,resources=services,verbs=get;list;watch;create;update;patch;delete
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/operator.go` around lines 45 - 57, The aggregated kubebuilder
RBAC annotation block in operator.go is missing the proxy auth permissions
(tokenreviews and subjectaccessreviews) that are present in the CSV; add RBAC
annotations to include those resources so future manifest regenerations don't
drop them. Specifically, add a +kubebuilder:rbac line for
group=authentication.k8s.io,resources=tokenreviews,verbs=create and a
+kubebuilder:rbac line for
group=authorization.k8s.io,resources=subjectaccessreviews,verbs=create within
the existing annotation block (the block containing the existing
+kubebuilder:rbac comments) so the operator has the same proxy auth verbs as the
CSV.
🧹 Nitpick comments (1)
pkg/operator/controller/externaldns/deployment_test.go (1)

6002-6002: ⚡ Quick win

Add one positive test path for non-empty kube-rbac-proxy image.

At Line 6002, all test cases pass "" for the new kube-rbac-proxy image parameter, so deployment reconciliation for the enabled sidecar path is not validated here. Add one case with a non-empty image (and OpenShift-enabled conditions) to assert expected sidecar/ports in the resulting Deployment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/deployment_test.go` at line 6002, Add a
positive test case in deployment_test.go that calls ensureExternalDNSDeployment
with a non-empty kube-rbac-proxy image string (instead of ""), set the test's
extDNS/OpenShift-enabled conditions appropriately, and verify the returned
Deployment (gotDepl) contains the kube-rbac-proxy container and expected
ports/volume mounts; update the table-driven tests to include this scenario and
assert gotExist and no error as part of the new case, referencing
ensureExternalDNSDeployment, test.OperandNamespace, test.OperandImage,
serviceAccount, tc.credSecret, tc.trustCAConfigMap and &tc.extDNS to locate the
callsite for modification.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/controller/externaldns/controller.go`:
- Around line 239-248: When kubeRBACProxyImage becomes empty you must delete any
previously created metrics resources; add an else branch for the
kubeRBACProxyImage check that calls deletion helpers to remove the owned Service
and ServiceMonitor (e.g. implement and call
r.deleteExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS) and
r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS)), have
those helpers delete the resources and treat NotFound as success, and ensure
owner-label/ownerRef checks are used so only the CR's owned metrics resources
are removed; keep the existing ensureExternalDNSMetricsService and
ensureExternalDNSServiceMonitor logic for the creation path.

In `@pkg/operator/controller/externaldns/deployment.go`:
- Around line 301-310: The code adds a managed metrics-cert volume when
cfg.kubeRBACProxyImage is set but does not remove it when the image is later
unset, leaving the pod template referencing a secret that may not exist; update
the reconciliation logic around cfg.kubeRBACProxyImage (the block that calls
kubeRBACProxyContainer and metricsCertVolume and appends to
depl.Spec.Template.Spec.Volumes) to also prune any operator-managed metrics
volume when the proxy is disabled: detect when cfg.kubeRBACProxyImage == "" and
remove from depl.Spec.Template.Spec.Volumes any Volume whose Name or Secret.Name
matches the managed metrics volume (the one returned by metricsCertVolume and/or
controller.ExternalDNSMetricsSecretName(cfg.externalDNS)), and likewise ensure
any proxy sidecar containers (created by kubeRBACProxyContainer) are removed
when disabling the proxy.

In `@pkg/operator/controller/externaldns/service.go`:
- Around line 129-133: metricsServiceChanged currently compares port Name, Port,
and TargetPort but omits ServicePort.Protocol so protocol drift (set in
desiredMetricsService) won't be detected; update the metricsServiceChanged
function to also compare current.Spec.Ports[i].Protocol !=
desired.Spec.Ports[i].Protocol when iterating desired.Spec.Ports (and ensure the
loop bounds use the same index set), returning true if they differ so
reconciliation will update the Protocol field.

---

Outside diff comments:
In `@pkg/operator/operator.go`:
- Around line 45-57: The aggregated kubebuilder RBAC annotation block in
operator.go is missing the proxy auth permissions (tokenreviews and
subjectaccessreviews) that are present in the CSV; add RBAC annotations to
include those resources so future manifest regenerations don't drop them.
Specifically, add a +kubebuilder:rbac line for
group=authentication.k8s.io,resources=tokenreviews,verbs=create and a
+kubebuilder:rbac line for
group=authorization.k8s.io,resources=subjectaccessreviews,verbs=create within
the existing annotation block (the block containing the existing
+kubebuilder:rbac comments) so the operator has the same proxy auth verbs as the
CSV.

---

Nitpick comments:
In `@pkg/operator/controller/externaldns/deployment_test.go`:
- Line 6002: Add a positive test case in deployment_test.go that calls
ensureExternalDNSDeployment with a non-empty kube-rbac-proxy image string
(instead of ""), set the test's extDNS/OpenShift-enabled conditions
appropriately, and verify the returned Deployment (gotDepl) contains the
kube-rbac-proxy container and expected ports/volume mounts; update the
table-driven tests to include this scenario and assert gotExist and no error as
part of the new case, referencing ensureExternalDNSDeployment,
test.OperandNamespace, test.OperandImage, serviceAccount, tc.credSecret,
tc.trustCAConfigMap and &tc.extDNS to locate the callsite for modification.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 78bfc9c0-0a8d-427a-9118-c5a7267ed0a8

📥 Commits

Reviewing files that changed from the base of the PR and between 9a94691 and fd0aae1.

📒 Files selected for processing (18)
  • bundle/manifests/external-dns-operator.clusterserviceversion.yaml
  • bundle/manifests/external-dns_rbac.authorization.k8s.io_v1_clusterrole.yaml
  • config/manager/manager.yaml
  • config/rbac/operand_role.yaml
  • config/rbac/role.yaml
  • main.go
  • pkg/operator/config/config.go
  • pkg/operator/controller/externaldns/controller.go
  • pkg/operator/controller/externaldns/deployment.go
  • pkg/operator/controller/externaldns/deployment_test.go
  • pkg/operator/controller/externaldns/pod.go
  • pkg/operator/controller/externaldns/pod_test.go
  • pkg/operator/controller/externaldns/service.go
  • pkg/operator/controller/externaldns/service_test.go
  • pkg/operator/controller/externaldns/servicemonitor.go
  • pkg/operator/controller/externaldns/servicemonitor_test.go
  • pkg/operator/controller/names.go
  • pkg/operator/operator.go

Comment on lines +239 to +248
// Ensure metrics service and service monitor for Prometheus scraping.
// Owner references on these resources ensure cascade deletion when the ExternalDNS CR is removed.
if kubeRBACProxyImage != "" {
	if err := r.ensureExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS metrics service: %w", err)
	}
	if err := r.ensureExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS service monitor: %w", err)
	}
}

@coderabbitai coderabbitai Bot May 6, 2026


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Delete the owned metrics resources when metrics exposure is turned off.

When kubeRBACProxyImage becomes empty, this branch is skipped entirely, so any previously created metrics Service and ServiceMonitor remain behind. That leaves stale scrape targets pointing at dead ports, and the disabled watches mean nothing reconciles them away.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/controller.go` around lines 239 - 248,
When kubeRBACProxyImage becomes empty you must delete any previously created
metrics resources; add an else branch for the kubeRBACProxyImage check that
calls deletion helpers to remove the owned Service and ServiceMonitor (e.g.
implement and call r.deleteExternalDNSMetricsService(ctx, r.config.Namespace,
externalDNS) and r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace,
externalDNS)), have those helpers delete the resources and treat NotFound as
success, and ensure owner-label/ownerRef checks are used so only the CR's owned
metrics resources are removed; keep the existing ensureExternalDNSMetricsService
and ensureExternalDNSServiceMonitor logic for the creation path.

Contributor


This seems reasonable. kube-rbac-proxy image is configured for the operand in CSV, therefore it's not possible to override it without scaling down OLM operator (unsupported path). I think that we should add the kube-rbac-proxy operand support unconditionally. If it would need to be disabled we need to think about a new API field for this.



@alebedev87, that's a fair point — since RELATED_IMAGE_KUBE_RBAC_PROXY is baked into the CSV and can only change via an OLM operator update (which restarts the operator anyway), the runtime scenario where kubeRBACProxyImage transitions from a non-empty value back to empty is effectively unreachable. The stale-resource concern doesn't apply in practice given that deployment model.

The idea of a dedicated API field for disabling metrics exposure in the future sounds like the right approach if that need arises. Would you like me to open a follow-up GitHub issue to track that?


📝 Learnings were identified and not saved (knowledge base disabled).

Comment on lines +301 to +310
// Add kube-rbac-proxy sidecar(s) and metrics cert volume for secure metrics exposure.
// One sidecar per zone container, each proxying the corresponding metrics port.
if cfg.kubeRBACProxyImage != "" {
	for i := 0; i < cbld.counter; i++ {
		proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
		depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
	}
	certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
	depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prune the managed metrics-cert volume when the proxy is disabled.

This block adds a new operator-managed secret volume, but the update path never removes unexpected volumes. If --kube-rbac-proxy-image is later unset, the sidecars disappear while the pod template can still reference metrics-cert, leaving the rollout dependent on a secret that is no longer created.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/deployment.go` around lines 301 - 310,
The code adds a managed metrics-cert volume when cfg.kubeRBACProxyImage is set
but does not remove it when the image is later unset, leaving the pod template
referencing a secret that may not exist; update the reconciliation logic around
cfg.kubeRBACProxyImage (the block that calls kubeRBACProxyContainer and
metricsCertVolume and appends to depl.Spec.Template.Spec.Volumes) to also prune
any operator-managed metrics volume when the proxy is disabled: detect when
cfg.kubeRBACProxyImage == "" and remove from depl.Spec.Template.Spec.Volumes any
Volume whose Name or Secret.Name matches the managed metrics volume (the one
returned by metricsCertVolume and/or
controller.ExternalDNSMetricsSecretName(cfg.externalDNS)), and likewise ensure
any proxy sidecar containers (created by kubeRBACProxyContainer) are removed
when disabling the proxy.
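
A minimal sketch of the pruning step this comment describes, with hypothetical helper and variable names; the real change would live in the deployment update path:

package sketch

import corev1 "k8s.io/api/core/v1"

// pruneVolumeSketch drops the operator-managed metrics-cert volume by name,
// e.g. when the kube-rbac-proxy sidecars are removed from the pod template.
func pruneVolumeSketch(volumes []corev1.Volume, managedName string) []corev1.Volume {
	kept := make([]corev1.Volume, 0, len(volumes))
	for _, v := range volumes {
		if v.Name == managedName {
			continue
		}
		kept = append(kept, v)
	}
	return kept
}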

Comment on lines +129 to +133
for i := range desired.Spec.Ports {
	if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
		current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
		current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort {
		return true


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Include ServicePort.Protocol in drift detection.

desiredMetricsService explicitly sets Protocol (Line 48), but metricsServiceChanged never compares it. If protocol drifts, reconcile won’t repair it.

Suggested patch
 	for i := range desired.Spec.Ports {
 		if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
 			current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
-			current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort {
+			current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort ||
+			current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol {
 			return true
 		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i := range desired.Spec.Ports {
	if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
		current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
		current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort {
		return true

for i := range desired.Spec.Ports {
	if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
		current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
		current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort ||
		current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol {
		return true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/service.go` around lines 129 - 133,
metricsServiceChanged currently compares port Name, Port, and TargetPort but
omits ServicePort.Protocol so protocol drift (set in desiredMetricsService)
won't be detected; update the metricsServiceChanged function to also compare
current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol when iterating
desired.Spec.Ports (and ensure the loop bounds use the same index set),
returning true if they differ so reconciliation will update the Protocol field.


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/operator/controller/externaldns/controller.go (1)

230-239: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Delete the metrics Service and ServiceMonitor when proxying is disabled.

If KubeRBACProxyImage is cleared, Line 225 stops adding the sidecar to the deployment, but this block also stops reconciling the previously created metrics resources. That leaves stale scrape targets behind until the ExternalDNS CR itself is deleted.

Suggested fix
  if r.config.KubeRBACProxyImage != "" {
  	if err := r.ensureExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
  		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS metrics service: %w", err)
  	}
  	if err := r.ensureExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
  		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS service monitor: %w", err)
  	}
+	} else {
+		if err := r.deleteExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
+			return reconcile.Result{}, fmt.Errorf("failed to delete externalDNS metrics service: %w", err)
+		}
+		if err := r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
+			return reconcile.Result{}, fmt.Errorf("failed to delete externalDNS service monitor: %w", err)
+		}
  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/controller.go` around lines 230 - 239,
When r.config.KubeRBACProxyImage is empty we must remove any previously-created
metrics resources instead of skipping reconciliation; update the controller
logic around ensureExternalDNSMetricsService and ensureExternalDNSServiceMonitor
to call deletion paths when r.config.KubeRBACProxyImage == "" (e.g., invoke
functions to delete the Service and ServiceMonitor corresponding to externalDNS)
so that stale scrape targets are removed; locate the block using
r.config.KubeRBACProxyImage, externalDNS, ensureExternalDNSMetricsService and
ensureExternalDNSServiceMonitor and implement/ call corresponding
deleteExternalDNSMetricsService and deleteExternalDNSServiceMonitor (or add an
ensureDeleted flag to those methods) and handle errors consistently (return
wrapped errors if deletion fails).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@pkg/operator/controller/externaldns/controller.go`:
- Around line 230-239: When r.config.KubeRBACProxyImage is empty we must remove
any previously-created metrics resources instead of skipping reconciliation;
update the controller logic around ensureExternalDNSMetricsService and
ensureExternalDNSServiceMonitor to call deletion paths when
r.config.KubeRBACProxyImage == "" (e.g., invoke functions to delete the Service
and ServiceMonitor corresponding to externalDNS) so that stale scrape targets
are removed; locate the block using r.config.KubeRBACProxyImage, externalDNS,
ensureExternalDNSMetricsService and ensureExternalDNSServiceMonitor and
implement/ call corresponding deleteExternalDNSMetricsService and
deleteExternalDNSServiceMonitor (or add an ensureDeleted flag to those methods)
and handle errors consistently (return wrapped errors if deletion fails).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 860161f6-d6cd-45d3-999c-a823ea5e85b2

📥 Commits

Reviewing files that changed from the base of the PR and between fd0aae1 and 04caf39.

📒 Files selected for processing (1)
  • pkg/operator/controller/externaldns/controller.go

Contributor

@alebedev87 alebedev87 left a comment


Another quick look.

Comment thread config/manager/manager.yaml Outdated
metadata:
  labels:
    name: external-dns-operator
    openshift.io/cluster-monitoring: "true"
Contributor


Note that this manifest is not added to the bundle. The creation of the namespace is taken care of either by the OLM operator (web console) or by the user.

Did you add it to make sure metrics are scraped by the built-in Prometheus stack? The operator's metrics worked without this label. Can you double check that everything is scraped correctly if the namespace is created via the web console or manually?

Comment on lines +239 to +248
// Ensure metrics service and service monitor for Prometheus scraping.
// Owner references on these resources ensure cascade deletion when the ExternalDNS CR is removed.
if kubeRBACProxyImage != "" {
	if err := r.ensureExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS metrics service: %w", err)
	}
	if err := r.ensureExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS service monitor: %w", err)
	}
}
Contributor


This seems reasonable. kube-rbac-proxy image is configured for the operand in CSV, therefore it's not possible to override it without scaling down OLM operator (unsupported path). I think that we should add the kube-rbac-proxy operand support unconditionally. If it would need to be disabled we need to think about a new API field for this.

Comment thread pkg/operator/controller/externaldns/servicemonitor.go
@Thealisyed
Contributor Author

Did we resolve and answer all of Andrey's comments, and are you happy now? @coderabbitai

@coderabbitai

coderabbitai Bot commented May 7, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me fetch the review comments from the PR to check the status of Andrey's feedback.


🏁 Script executed:

#!/bin/bash
# Fetch all review comments on PR `#371`
gh pr view 371 --repo openshift/external-dns-operator --json reviews,reviewRequests

echo "---REVIEW COMMENTS---"
gh api repos/openshift/external-dns-operator/pulls/371/comments --paginate \
  --jq '.[] | {id: .id, user: .user.login, body: .body[:300], resolved: .resolved, path: .path, line: .line}'

echo "---PR REVIEWS---"
gh api repos/openshift/external-dns-operator/pulls/371/reviews --paginate \
  --jq '.[] | {id: .id, user: .user.login, state: .state, body: .body[:300]}'

echo "---ISSUE COMMENTS---"
gh api repos/openshift/external-dns-operator/issues/371/comments --paginate \
  --jq '.[] | {id: .id, user: .user.login, body: .body[:500]}'

Length of output: 34815


Let me dig deeper into the specific inline comments from Andrey's latest review round to get the full picture.


🏁 Script executed:

#!/bin/bash
# Get all inline review comments from alebedev87 with full body text
echo "=== ANDREY'S INLINE COMMENTS (FULL) ==="
gh api repos/openshift/external-dns-operator/pulls/371/comments --paginate \
  --jq '.[] | select(.user.login == "alebedev87") | {id: .id, path: .path, line: .line, original_line: .original_line, body: .body, created_at: .created_at}'

echo ""
echo "=== THEALISYED RESPONSES (FULL) ==="
gh api repos/openshift/external-dns-operator/pulls/371/comments --paginate \
  --jq '.[] | select(.user.login == "Thealisyed") | {id: .id, path: .path, line: .line, in_reply_to_id: .in_reply_to_id, body: .body, created_at: .created_at}'

echo ""
echo "=== COMMENT THREADS (resolved status via GraphQL) ==="
gh api graphql -f query='
{
  repository(owner: "openshift", name: "external-dns-operator") {
    pullRequest(number: 371) {
      reviewThreads(first: 50) {
        nodes {
          isResolved
          isOutdated
          comments(first: 5) {
            nodes {
              author { login }
              body
              path
              line
              createdAt
            }
          }
        }
      }
    }
  }
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | {isResolved: .isResolved, isOutdated: .isOutdated, path: .comments.nodes[0].path, line: .comments.nodes[0].line, firstAuthor: .comments.nodes[0].author.login, firstBody: .comments.nodes[0].body[:200], commentCount: (.comments.nodes | length)}'

Length of output: 13931


@Thealisyed, great progress! Almost all of Andrey's comments have been resolved. Here's a full status breakdown:


✅ Andrey's Resolved Threads (10/12)

File Comment Status
bundle/manifests/...clusterserviceversion.yaml Konflux bundle mapping for new image ✅ Resolved
config/rbac/operand_role.yaml Run make bundle to sync manifests ✅ Resolved
main.go Precise that flag is for ExternalDNS ✅ Resolved (outdated)
controller.go Gate service watch on proxy image ✅ Resolved
controller.go Add a watch for ServiceMonitor ✅ Resolved
controller.go Remove IsOpenShift guard on ServiceMonitor ✅ Resolved
controller.go Use owner reference for cascade deletion ✅ Resolved
pod.go Align naming functions (port name vs container name) ✅ Resolved
service.go Function signature suggestion for desiredMetricsService ✅ Resolved
servicemonitor.go (line 103) Metric de-duplication / document in developer docs ✅ Resolved

❌ Still Open Threads

Two of Andrey's threads remain unresolved:

  1. servicemonitor.go (line 48) — Andrey asked "Why unstructured object? Did you try to add ServiceMonitor scheme to avoid this?" You explained the reasoning (avoiding the prometheus-operator dependency), but Andrey has not confirmed acceptance yet. This thread is still open — you may want to ping him to confirm.

  2. config/manager/manager.yaml (line 6) — Andrey noted that this manifest isn't added to the bundle, questioned whether the openshift.io/cluster-monitoring: "true" label is actually needed, and asked you to verify that metrics are scraped correctly when the namespace is created via web console or manually. No response from you yet on this one.


Three of my (CodeRabbit) threads are also still open:

File Comment Severity
service.go (line 133) metricsServiceChanged doesn't compare ServicePort.Protocol — drift won't be detected 🟡 Minor
controller.go (line 239) Delete metrics resources when proxy disabled 🟠 Major — but Andrey indicated unconditional support is the preferred direction, making this less critical
deployment.go (line 310) Prune metrics-cert volume when proxy disabled 🟠 Major — same design rationale as above

The key items needing your attention before approval are likely Andrey's two unresolved threads, especially the config/manager/manager.yaml question which has no response yet.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/operator/controller/externaldns/deployment.go`:
- Around line 301-308: The code currently skips adding kube-rbac-proxy sidecars
when cfg.kubeRBACProxyImage == "" but still creates metrics
Service/ServiceMonitor; change this to fail fast: check cfg.kubeRBACProxyImage
at the start of the deployment creation/reconcile path and return an error if
it's empty (e.g., fmt.Errorf("kubeRBACProxyImage must be set")), instead of
silently omitting the proxy; update the code surrounding the
kubeRBACProxyContainer(...) and
metricsCertVolume(controller.ExternalDNSMetricsSecretName(...)) calls so they
only run after the non-empty validation and ensure the calling reconcile loop
surfaces/handles the returned error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d998c4dc-57be-4b48-8f4a-b7509fa7e180

📥 Commits

Reviewing files that changed from the base of the PR and between 04caf39 and 10a004a.

📒 Files selected for processing (4)
  • config/manager/manager.yaml
  • docs/openshift.md
  • pkg/operator/controller/externaldns/controller.go
  • pkg/operator/controller/externaldns/deployment.go
✅ Files skipped from review due to trivial changes (1)
  • docs/openshift.md

Comment on lines +301 to +308
if cfg.kubeRBACProxyImage != "" {
	for i := 0; i < cbld.counter; i++ {
		proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
		depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
	}
	certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
	depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when the proxy image is unset.

Lines 301-307 silently skip the kube-rbac-proxy sidecars, but the reconcile path now always creates the metrics Service and ServiceMonitor. That leaves broken scrape targets instead of a clear config error. Either make the proxy unconditional here as well, or return an error when cfg.kubeRBACProxyImage is empty.

Proposed fix
-	if cfg.kubeRBACProxyImage != "" {
-		for i := 0; i < cbld.counter; i++ {
-			proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
-			depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
-		}
-		certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
-		depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
-	}
+	if cfg.kubeRBACProxyImage == "" {
+		return nil, fmt.Errorf("kube-rbac-proxy image must be configured")
+	}
+	for i := 0; i < cbld.counter; i++ {
+		proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
+		depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
+	}
+	certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
+	depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/operator/controller/externaldns/deployment.go` around lines 301 - 308,
The code currently skips adding kube-rbac-proxy sidecars when
cfg.kubeRBACProxyImage == "" but still creates metrics Service/ServiceMonitor;
change this to fail fast: check cfg.kubeRBACProxyImage at the start of the
deployment creation/reconcile path and return an error if it's empty (e.g.,
fmt.Errorf("kubeRBACProxyImage must be set")), instead of silently omitting the
proxy; update the code surrounding the kubeRBACProxyContainer(...) and
metricsCertVolume(controller.ExternalDNSMetricsSecretName(...)) calls so they
only run after the non-empty validation and ensure the calling reconcile loop
surfaces/handles the returned error.

Thealisyed added 5 commits May 8, 2026 10:06
ExternalDNS operand pods bind metrics to 127.0.0.1, making them
inaccessible from outside the pod. Prometheus cannot scrape them.

This adds a kube-rbac-proxy sidecar to each operand deployment that
proxies the localhost metrics over HTTPS, along with a Service
(annotated for OpenShift serving cert) and ServiceMonitor for
automatic Prometheus scraping.

Resolves: OCPBUGS-58102

Assisted with Claude
- Gate metrics sidecar/service/servicemonitor behind IsOpenShift (serving-cert dependency)
- Add Service and ServiceMonitor watches with KubeRBACProxyImage+IsOpenShift guard
- Switch ServiceMonitor ownerRef to controllerutil.SetControllerReference
- Remove manual delete logic; owner references handle cascade deletion
- Add kubeRBACProxyContainerNameForSeq for naming consistency
- Rename metricsService to desiredMetricsService
- Add labels/annotations comparison to metricsServiceChanged and serviceMonitorChanged
- Sync labels in ServiceMonitor update flow
- Preserve immutable Service fields (ClusterIPs, IPFamilies, IPFamilyPolicy) on update
- Clarify kube-rbac-proxy flag description for ExternalDNS operand
- Add manager.yaml env var and namespace label for cluster monitoring
- Sync bundle manifests via make bundle

Assisted-by: Claude <noreply@anthropic.com>
This operator exclusively targets OpenShift, so gating the metrics
sidecar, Service, and ServiceMonitor behind IsOpenShift was redundant.
They are now gated only on KubeRBACProxyImage being set.

Assisted-by: Claude <noreply@anthropic.com>
- Remove KubeRBACProxyImage guards from controller watches and reconcile;
  the sidecar, Service, and ServiceMonitor are now always created
- Remove openshift.io/cluster-monitoring label from manager.yaml since
  this manifest is not in the bundle
- Add operand metrics documentation to docs/openshift.md covering
  multi-zone metric differentiation via ports and instance labels

Assisted-by: Claude <noreply@anthropic.com>
The controller unconditionally creates metrics Service and ServiceMonitor,
so a missing kube-rbac-proxy image would produce broken scrape targets.
Validate at operator startup to catch this misconfiguration before any
reconciliation runs.

Assisted-by: Claude <noreply@anthropic.com>
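
As a rough illustration of the startup validation described in the commit above (names and flag wiring here are hypothetical, not the operator's actual config plumbing):

package main

import (
	"flag"
	"log"
)

func main() {
	// Illustrative subset of the operator's flags.
	kubeRBACProxyImage := flag.String("kube-rbac-proxy-image", "",
		"image for the kube-rbac-proxy sidecar that exposes operand metrics")
	flag.Parse()

	// Fail fast: the controller unconditionally creates the metrics Service and
	// ServiceMonitor, so an empty image would yield scrape targets with no backend.
	if *kubeRBACProxyImage == "" {
		log.Fatal("--kube-rbac-proxy-image must be set")
	}

	// ... construct the operator config and start the manager ...
}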
@Thealisyed Thealisyed force-pushed the fix/operand-metrics-exposure branch from 10a004a to 8848be6 Compare May 8, 2026 09:07
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 8, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

@Thealisyed: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@Thealisyed
Contributor Author

Did we resolve/answer Andrey's comment, and are you happy now? @coderabbitai

@coderabbitai

coderabbitai Bot commented May 8, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== CURRENT THREAD STATUS VIA GRAPHQL ==="
gh api graphql -f query='
{
  repository(owner: "openshift", name: "external-dns-operator") {
    pullRequest(number: 371) {
      reviewThreads(first: 50) {
        nodes {
          isResolved
          isOutdated
          comments(first: 10) {
            nodes {
              author { login }
              body
              path
              line
              createdAt
            }
          }
        }
      }
    }
  }
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | {isResolved: .isResolved, isOutdated: .isOutdated, path: .comments.nodes[0].path, firstAuthor: .comments.nodes[0].author.login, firstBody: .comments.nodes[0].body[:300], lastAuthor: .comments.nodes[-1].author.login, lastBody: .comments.nodes[-1].body[:300], commentCount: (.comments.nodes | length)}'

Length of output: 9540


@Thealisyed, here's the current status — still largely the same picture as before:


Andrey's Threads: 10/12 Resolved ✅

Two threads are still unresolved:

1. servicemonitor.go: "Why unstructured object? Did you try to add ServiceMonitor scheme?" ❌ Open — you replied with the rationale (avoiding a prometheus-operator dependency), but Andrey has not yet acknowledged or accepted your explanation (a sketch of the unstructured approach appears at the end of this comment).
2. config/manager/manager.yaml: "Manifest not in bundle; is openshift.io/cluster-monitoring label needed? Verify scraping works." ❌ Unresolved (now outdated due to code changes) — you haven't responded to his questions yet.

For thread #1, you may want to ping Andrey directly to confirm if he's satisfied with the explanation.

For thread #2, even though the thread became outdated (the underlying line changed), Andrey's questions were substantive — it would be good to leave a reply addressing:

  • Whether the openshift.io/cluster-monitoring: "true" label is required and what was verified
  • How namespace creation (via OLM/console vs. manual) affects metrics scraping

My (CodeRabbit) Open Threads: 4

1. service.go: metricsServiceChanged doesn't compare ServicePort.Protocol, so drift won't be detected. 🟢 Active
2. deployment.go: fail fast when the proxy image is unset (scrape targets created but no sidecars). 🟢 Active
3. controller.go: delete metrics resources when the proxy is disabled. Outdated
4. deployment.go: prune the metrics-cert volume when the proxy is disabled. Outdated

The two outdated CodeRabbit threads (#3/#4) are lower priority given the design direction (proxy image is baked into the CSV). Threads #1 and #2 are still worth a look if you want to clean up before approval.
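
As background for thread #1, a rough sketch of how a ServiceMonitor can be built as an unstructured object so the operator avoids a prometheus-operator dependency; the names, port, and TLS paths below are illustrative, not the PR's actual manifest.

package externaldnscontroller // sketch only

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// desiredServiceMonitor builds a monitoring.coreos.com/v1 ServiceMonitor without
// importing the prometheus-operator API types. Scrape settings are illustrative.
func desiredServiceMonitor(namespace, name, serviceName string) *unstructured.Unstructured {
	sm := &unstructured.Unstructured{
		Object: map[string]interface{}{
			"spec": map[string]interface{}{
				"selector": map[string]interface{}{
					"matchLabels": map[string]interface{}{"app": serviceName},
				},
				"endpoints": []interface{}{
					map[string]interface{}{
						"port":            "metrics-0",
						"scheme":          "https",
						"bearerTokenFile": "/var/run/secrets/kubernetes.io/serviceaccount/token",
						"tlsConfig": map[string]interface{}{
							"caFile":     "/etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt",
							"serverName": serviceName + "." + namespace + ".svc",
						},
					},
				},
			},
		},
	}
	sm.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "monitoring.coreos.com", Version: "v1", Kind: "ServiceMonitor",
	})
	sm.SetNamespace(namespace)
	sm.SetName(name)
	return sm
}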

@Thealisyed
Contributor Author

Thealisyed commented May 8, 2026

Regarding thread #2:
We removed the label from manager.yaml since it's not in the bundle. The openshift.io/cluster-monitoring: "true" label is needed for cluster Prometheus to discover our operand ServiceMonitor.
Operator metrics likely work without it because OLM handles those separately, I'm guessing?
From my understanding, OLM doesn't add this label automatically either, so this would be a gap?
Should we have the operator ensure the label at startup, or handle it as a follow-up?
Am I correct in how I understand this to work, @coderabbitai? Correct me if I'm wrong!

@coderabbitai

coderabbitai Bot commented May 8, 2026

@Thealisyed, your understanding is largely correct! Here's a precise breakdown:


How the openshift.io/cluster-monitoring: "true" label works

The cluster Prometheus (running in openshift-monitoring) uses a serviceMonitorNamespaceSelector that matches namespaces bearing the openshift.io/cluster-monitoring: "true" label — this label is what allows the cluster Prometheus to discover and scrape ServiceMonitors in that namespace. So yes, without this label on the operator's namespace, the ServiceMonitor will not be picked up by the cluster-level Prometheus, and operand metrics won't be scraped.


Does OLM add the label automatically?

There is a CSV annotation operatorframework.io/cluster-monitoring=true — when this annotation is set to true, the OpenShift Console (not OLM itself) will apply the openshift.io/cluster-monitoring=true label to the namespace the operator is being installed into, and will also update the Prometheus Operator ServiceAccount with appropriate RBAC privileges for that namespace.

So:

  • Console-based OLM install with this annotation on the CSV → label is applied automatically ✅
  • CLI/manual install → label is not added, and neither OLM itself nor anything else will add it → this IS a gap ❌

Your hypothesis about operator metrics working differently is also correct — the operator's own controller-runtime metrics typically live in an openshift-* namespace (where the cluster Prometheus already watches unconditionally), which is why those metrics surface without this label.


Recommendation

The cleanest fix is a two-pronged approach:

  1. Add the operatorframework.io/cluster-monitoring: "true" annotation to the CSV (bundle/manifests/external-dns-operator.clusterserviceversion.yaml). This makes console-based OLM installs "just work":

    metadata:
      annotations:
        operatorframework.io/cluster-monitoring: "true"
  2. Document the manual step in docs/openshift.md — for CLI/manual installs, users must label their namespace:

    oc label namespace <operator-namespace> openshift.io/cluster-monitoring=true

Should the operator ensure the label programmatically at startup?

That's a valid option, but it has trade-offs:

  • It requires the operator to have namespaces/patch or namespaces/update RBAC, which is a broader permission than typically desirable.
  • Namespace labels are admin-managed resources; operators mutating them is a bit invasive.
  • The CSV annotation approach achieves the same for the common case (console install) without extra RBAC.

I'd lean towards CSV annotation + documentation for manual installs rather than runtime namespace patching — but this is worth flagging in your response to Andrey as the proposed fix, since his original question was specifically about this label and how scraping is verified.
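
For completeness, a minimal sketch of what the runtime-labeling option could look like with a controller-runtime client; this is a hypothetical helper, assuming the operator knows its own namespace and holds get/patch RBAC on namespaces, which is exactly the trade-off noted above.

package operatorstartup // sketch only

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureClusterMonitoringLabel adds openshift.io/cluster-monitoring=true to the
// operator's own namespace if it is missing.
func ensureClusterMonitoringLabel(ctx context.Context, c client.Client, namespace string) error {
	ns := &corev1.Namespace{}
	if err := c.Get(ctx, client.ObjectKey{Name: namespace}, ns); err != nil {
		return err
	}
	if ns.Labels["openshift.io/cluster-monitoring"] == "true" {
		return nil
	}
	updated := ns.DeepCopy()
	if updated.Labels == nil {
		updated.Labels = map[string]string{}
	}
	updated.Labels["openshift.io/cluster-monitoring"] = "true"
	return c.Patch(ctx, updated, client.MergeFrom(ns))
}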



@coderabbitai

coderabbitai Bot commented May 8, 2026

@Thealisyed, your understanding is largely correct! Here's a precise breakdown:


How openshift.io/cluster-monitoring: "true" Works

The cluster Prometheus (managed by the Cluster Monitoring Operator) uses a serviceMonitorNamespaceSelector to only pick up ServiceMonitor resources from namespaces that carry this label. Without it, the cluster Prometheus simply won't discover your ServiceMonitor, regardless of how well it's configured.


Does OLM Add the Label Automatically?

No — OLM itself does not. However, the OpenShift enhancement defines a mechanism for this: the operatorframework.io/cluster-monitoring=true annotation in the CSV. When this annotation is set to true, the OpenShift Console will update the namespace with the openshift.io/cluster-monitoring=true label during installation, and will also update the Prometheus Operator ServiceAccount with the appropriate RBAC privileges for that namespace.

OLM itself is not responsible for fulfilling these Namespace and RBAC requirements; instead, OLM defines how operator authors can provide the OpenShift Console with the information needed to fulfill them.


Is There a Gap?

Yes — your instinct is right. The Console-driven path only works when the operator is installed via the OLM UI (OperatorHub). For manual installs (oc apply, Helm, GitOps/ArgoCD, etc.), the namespace label won't be set automatically. This is a real gap.


Recommendation

The standard approach for OLM-managed operators is to add the following annotation to the CSV metadata:

metadata:
  annotations:
    operatorframework.io/cluster-monitoring: "true"

This is already the intended mechanism. For the manual install gap, there are two options:

  1. Document it — instruct users to run oc label namespace <ns> openshift.io/cluster-monitoring=true after a manual install.
  2. Operator ensures it at startup — the operator can reconcile the label on its own namespace at startup. This is more robust but requires the operator's ServiceAccount to have get/patch permissions on namespaces, which is a cluster-scoped privilege that may face review scrutiny.

Given the complexity, I'd suggest going with the CSV annotation as the primary path and documenting the manual install requirement as a follow-up, rather than adding cluster-scoped namespace-patching permissions to the operator. Would you like to open a follow-up issue to track this?

@Thealisyed
Contributor Author

Posted this question in our Slack channel; let's see what they say, @coderabbitai:

Working on a PR for EDO (adding a ServiceMonitor for ExternalDNS operand metrics). The namespace needs openshift.io/cluster-monitoring: "true" for cluster Prometheus to discover it.
I looked into how our components handle this: CIO/CDO have the label in their namespace manifests, but they're CVO-managed in openshift-* namespaces.
For our add-on operators like EDO & ALBO, the namespace is user-created, so that path doesn't apply.

Other OLM operators use the operatorframework.io/cluster-monitoring: "true" CSV annotation, whereas our CSV doesn't have it. Is this something to consider for our add-on operators?

@coderabbitai

coderabbitai Bot commented May 8, 2026

🧩 Analysis chain

@Thealisyed, good thinking to loop in the team! Your analysis is spot-on. Here's what the official enhancement docs confirm:


How the Mechanism Works for OLM Add-on Operators

The operatorframework.io/cluster-monitoring=true annotation on the CSV signals the OpenShift Console to update the namespace that the operator is being deployed to with the openshift.io/cluster-monitoring=true label. When this annotation is present, the Console will also update the OpenShift Monitoring Prometheus Operator ServiceAccount with the appropriate RBAC privileges for the given namespace, allowing the operator to be scraped by the OpenShift Monitoring Prometheus Operator.

Crucially, OLM itself will not be responsible for fulfilling the Namespace and RBAC requirements — instead, OLM defines how operator authors can provide the OpenShift Console with the information required to fulfill these requirements.


It's an Opt-In Checkbox in the Console

One important nuance: operator developers can suggest that admins allow the namespace to be monitored, but this option is presented deselected by default so that the admin can deliberately opt in. So even with the CSV annotation, it's not fully automatic during console install — the admin must check the box.

This is consistent with what the OpenShift Logging docs say: the console option sets the openshift.io/cluster-monitoring: "true" label in the Namespace object, and you must select this option to ensure that cluster monitoring scrapes the namespace.


The Regulation Aspect

If OLM allows any operator to report metrics to the OpenShift Monitoring Prometheus instance, there is a chance the instance could be overloaded or the integrity of the data could be jeopardized. To minimize this risk, the number of operators granted permission to include metrics is highly regulated — operators must not be added to officially supported CatalogSources without first being reviewed.

This is worth keeping in mind — adding this annotation to the EDO CSV may need to go through a review/approval process within Red Hat before it can be included in the supported catalog.


Summary / Recommended Path

Your Slack question has it right. The recommended approach for OLM add-on operators (EDO, ALBO, etc.) is:

  1. Add to the CSV:

    metadata:
      annotations:
        operatorframework.io/cluster-monitoring: "true"
        operatorframework.io/suggested-namespace: "external-dns-operator"  # if not already present

    This covers the console-based OLM install path (with the admin opt-in).

  2. Document the manual install gap in docs/openshift.md — for CLI/GitOps installs, users must manually label the namespace:

    oc label namespace <operator-namespace> openshift.io/cluster-monitoring=true
  3. Note the regulation requirement — the annotation may require a review before being added to the supported catalog.

This is all separate from the current PR's scope. Would you like me to open a follow-up GitHub issue to track the operatorframework.io/cluster-monitoring CSV annotation addition and documentation update?


🌐 Web query:

operatorframework.io/cluster-monitoring annotation CSV OLM OpenShift namespace label

💡 Result:

To have an OLM-installed Operator opt into OpenShift cluster monitoring (the openshift-monitoring Prometheus), you need:

  1. CSV annotation (in the bundle’s CSV metadata)
metadata:
  annotations:
    operatorframework.io/cluster-monitoring: "true"

This annotation is consumed by the OpenShift Console during install. It indicates the Operator wants the install namespace to be made “cluster-monitorable.” [1], [2]

  2. Namespace label that actually enables cluster-monitoring scraping
    The Console (when the annotation above is set) applies this label to the Operator’s install namespace:
oc label namespace <ns> openshift.io/cluster-monitoring=true

OpenShift's enhancement design also calls out that the namespace must be an openshift-*-prefixed namespace (and is effectively restricted/whitelisted). [1]

Notes:

  • This is different from user workload monitoring, which uses the openshift.io/user-monitoring label (separate Prometheus stack in openshift-user-workload-monitoring). [1]

Sources:
[1] OpenShift enhancement: OLM-managed operator metrics + required namespace label + CSV annotation behavior https://github.com/openshift/enhancements/blob/master/enhancements/olm/olm-managed-operator-metrics.md
[2] Red Hat Bugzilla (Console support for operatorframework.io/cluster-monitoring=true and labeling the namespace) https://bugzilla.redhat.com/show_bug.cgi?id=1799051
[3] OpenShift design note (Operators can request enabling cluster monitoring on install namespaces) https://openshift.github.io/openshift-origin-design/designs/administrator/olm/install-forcenamespace/


🌐 Web query:

OpenShift OLM addon operator cluster-monitoring namespace label best practice

💡 Result:

Best practice (OpenShift OLM add-on Operators + monitoring namespace labels):

  • Prefer User Workload Monitoring (UWM) for add-ons that expose application metrics. Enable UWM via cluster-monitoring-config (enableUserWorkload: true) and have the operator create ServiceMonitor/PodMonitor in its own namespace; avoid deploying a separate Prometheus via an OLM Prometheus Operator because it’s not supported/compatible with OpenShift “user-defined monitoring.” [1]

  • Only opt a namespace into cluster monitoring when you explicitly want the platform Prometheus (openshift-monitoring/prometheus-k8s) to scrape it. The namespace label is:

    oc label namespace <operator-namespace> openshift.io/cluster-monitoring="true"

    This label “causes the Cluster Monitoring Operator to scrape the namespace for metrics.” [2] (Many operators/docs use this for their metrics to be collected by the platform stack, e.g., NVIDIA GPU Operator.) [3]

  • Treat openshift.io/cluster-monitoring="true" as an explicit, narrow opt-in (not a default). OLM/console UX is designed so operators can recommend enabling cluster monitoring for their install namespace, but it’s deselected by default so admins deliberately opt in (because it has implications). [4]

  • If using UWM, don’t accidentally exclude your namespace from scraping. UWM respects the namespace label openshift.io/user-monitoring=false as an opt-out. [5]

Rule of thumb:

  • Add-on/operator metrics you want platform Prometheus to scrape → label that operator’s namespace with openshift.io/cluster-monitoring="true". [2]
  • Everything else (most app/add-on metrics) → use User Workload Monitoring instead. [1]

Sources:
[1] OpenShift docs: Enabling monitoring for user-defined projects (UWM) + OLM Prometheus Operator incompatibility. (docs.redhat.com)
[2] Red Hat ACM release notes: openshift.io/cluster-monitoring: "true" makes CMO scrape the namespace. (access.redhat.com)
[3] NVIDIA docs: requires openshift.io/cluster-monitoring=true on the operator namespace for OpenShift Prometheus to collect metrics. (docs.nvidia.com)
[4] OpenShift OLM design: operators can request enabling cluster monitoring; option is shown but deselected by default. (openshift.github.io)
[5] Red Hat Monitoring Stack troubleshooting: openshift.io/user-monitoring=false opt-out label. (docs.redhat.com)



@Thealisyed, the web search results add an important nuance that's worth flagging before you get your Slack answer back:


⚠️ For Add-on Operators: Consider User Workload Monitoring (UWM) Instead

The best practice recommendation is to prefer User Workload Monitoring (UWM) for add-on operators that expose application metrics. Enable UWM via cluster-monitoring-config (enableUserWorkload: true) and have the operator create ServiceMonitor/PodMonitor in its own namespace — avoid the cluster monitoring stack for add-on workloads.

You should only opt a namespace into cluster monitoring when you explicitly want the platform Prometheus (openshift-monitoring/prometheus-k8s) to scrape it. The namespace label openshift.io/cluster-monitoring="true" causes the Cluster Monitoring Operator to scrape the namespace for metrics.

This is the core question to validate in your Slack thread: should EDO's operand metrics go through cluster monitoring or UWM?


The openshift-* Namespace Restriction

There's an additional constraint from the enhancement design: the annotation is consumed by the OpenShift Console during install, but the namespace needs to be openshift-*-prefixed (and is effectively restricted/whitelisted). Since EDO/ALBO are installed into user-created namespaces (not openshift-*), this further complicates the cluster monitoring path.

CIO/CDO work because they're CVO-managed in openshift-* namespaces — exactly the case you identified.


The CSV Annotation Is Still the Opt-In Mechanism (If You Go the Cluster Monitoring Route)

If after your Slack discussion the decision is to use cluster monitoring rather than UWM, then yes — for OLM-installed operators opting into OpenShift cluster monitoring, you add the operatorframework.io/cluster-monitoring: "true" annotation to the CSV metadata. The Console then applies the openshift.io/cluster-monitoring=true label to the install namespace.

The OLM/console UX is designed so operators can recommend enabling cluster monitoring for their install namespace, but it's deselected by default so admins must deliberately opt in.


Good Question for Your Slack Thread

The key question to ask is: "For EDO/ALBO metrics (add-on operators in user namespaces), should we be targeting cluster Prometheus or User Workload Monitoring?" The answer from the team will determine whether the CSV annotation approach is appropriate or whether the ServiceMonitor should simply be placed in the operator's namespace with UWM enabled by the admin separately. Either way, this is good to clarify before finalizing the PR approach.
