OCPBUGS-58102: Expose ExternalDNS operand metrics via kube-rbac-proxy sidecar #371
Thealisyed wants to merge 5 commits into
Conversation
Force-pushed 0297cde to 3f8cd03
@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/assign

/jira refresh
@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact.
alebedev87 left a comment:
First look. I need to have another one with a focus on the service/servicemonitor comparison logic.
ownerRef := metav1.NewControllerRef(externalDNS, operatorv1beta1.GroupVersion.WithKind("ExternalDNS"))
desired.SetOwnerReferences([]metav1.OwnerReference{*ownerRef})

current := &unstructured.Unstructured{}
Why unstructured object? Did you try to add ServiceMonitor scheme to avoid this?
We went with unstructured objects to avoid adding github.com/prometheus-operator/prometheus-operator as a direct dependency in go.mod. Pulling in that module would increase binary size and maintenance burden, so it seemed unnecessary.
The inspiration was drawn from CIO using unstructured for Canary resources. However, if you feel the type safety and readability of a typed ServiceMonitor outweigh the dependency cost, I can change it.
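For context, here is a minimal sketch of the unstructured approach under discussion; the builder name and spec fields are illustrative, not the PR's actual code:

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// buildServiceMonitor constructs a monitoring.coreos.com/v1 ServiceMonitor
// without importing prometheus-operator types: the spec is plain nested maps
// and the GVK is set explicitly so the client knows which resource to target.
// (Hypothetical helper, for illustration only.)
func buildServiceMonitor(name, namespace string) *unstructured.Unstructured {
	sm := &unstructured.Unstructured{
		Object: map[string]interface{}{
			"spec": map[string]interface{}{
				"endpoints": []interface{}{
					map[string]interface{}{"port": "metrics", "scheme": "https"},
				},
			},
		},
	}
	sm.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "monitoring.coreos.com",
		Version: "v1",
		Kind:    "ServiceMonitor",
	})
	sm.SetName(name)
	sm.SetNamespace(namespace)
	return sm
}

The trade-off is exactly the one raised above: no compile-time checking of the spec fields, in exchange for keeping prometheus-operator out of go.mod.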
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID:
📒 Files selected for processing (19)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (18)
📝 Walkthrough
Adds kube-rbac-proxy-based HTTPS metrics support: introduces a CLI/config field and default image for the kube-rbac-proxy, RBAC entries for services/ServiceMonitors/tokenreviews/subjectaccessreviews, controller watches for Service and ServiceMonitor, reconciliation to ensure per-zone metrics Service and ServiceMonitor, deployment sidecars with metrics TLS cert volume, naming helpers for metrics artifacts, and unit tests covering the metrics plumbing.
Changes: kube-rbac-proxy Metrics Support
Sequence Diagram
sequenceDiagram
participant Ctrl as Controller
participant Depl as Deployment
participant Svc as Metrics Service
participant SM as ServiceMonitor
participant K8s as Kubernetes API
Ctrl->>Ctrl: Reconcile ExternalDNS
alt KubeRBACProxyImage configured
Ctrl->>Depl: Ensure deployment with kube-rbac-proxy sidecars & cert volume
Depl->>K8s: Create/Update Deployment
K8s-->>Depl: Deployment acknowledged
Ctrl->>Svc: Ensure per-zone metrics Service
Svc->>K8s: Create/Update Service
K8s-->>Svc: Service acknowledged
Ctrl->>SM: Ensure ServiceMonitor (unstructured)
SM->>K8s: Create/Update ServiceMonitor
K8s-->>SM: ServiceMonitor acknowledged
Ctrl->>K8s: Watch Service and ServiceMonitor changes
else No kube-rbac-proxy configured
Ctrl->>Depl: Ensure deployment without sidecars
Depl->>K8s: Create/Update Deployment
end
K8s-->>Ctrl: Reconcile complete
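To make the sidecar step in the diagram concrete, here is a rough sketch of one proxy container per zone. The port scheme, flag values, container name, and volume name are assumptions, not the PR's exact settings:

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// kubeRBACProxySidecar sketches one proxy container per zone: it terminates
// TLS on an externally reachable port and forwards authorized scrapes to the
// operand's localhost-only metrics endpoint.
func kubeRBACProxySidecar(image string, seq int) corev1.Container {
	httpsPort := int32(8443 + seq) // assumed external port scheme
	upstreamPort := 7979 + seq     // operand metrics bound to 127.0.0.1
	return corev1.Container{
		Name:  fmt.Sprintf("kube-rbac-proxy-%d", seq),
		Image: image,
		Args: []string{
			fmt.Sprintf("--secure-listen-address=:%d", httpsPort),
			fmt.Sprintf("--upstream=http://127.0.0.1:%d/", upstreamPort),
			"--tls-cert-file=/etc/tls/private/tls.crt",
			"--tls-private-key-file=/etc/tls/private/tls.key",
			"--logtostderr=true",
		},
		Ports: []corev1.ContainerPort{{
			Name:          fmt.Sprintf("metrics-%d", seq),
			ContainerPort: httpsPort,
			Protocol:      corev1.ProtocolTCP,
		}},
		VolumeMounts: []corev1.VolumeMount{{
			Name:      "metrics-cert",
			MountPath: "/etc/tls/private",
			ReadOnly:  true,
		}},
	}
}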
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 11 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.1)
level=error msg="Running error: context loading failed: failed to load packages: failed to load packages: failed to load with go/packages: err: exit status 1: stderr: go: inconsistent vendoring in :
	github.com/Azure/azure-sdk-for-go@v60.1.0+incompatible: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/Azure/go-autorest/autorest@v0.11.27: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/Azure/go-autorest/autorest/adal@v0.9.20: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/aws/aws-sdk-go@v1.41.6: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/go-logr/logr@v1.4.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/google/go-cmp@v0.7.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	github.com/infobloxopen/in ... [truncated 13756 characters] ... red in go.mod, but not marked as explicit in vendor/modules.txt
	sigs.k8s.io/kustomize/cmd/config@v0.14.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	sigs.k8s.io/kustomize/kyaml@v0.17.2: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	sigs.k8s.io/randfill@v1.0.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	sigs.k8s.io/structured-merge-diff/v4@v4.6.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	sigs.k8s.io/yaml@v1.4.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt

	To ignore the vendor directory, use -mod=readonly or -mod=mod.
	To sync the vendor directory, run:
		go mod vendor"
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
@Thealisyed: This pull request references Jira Issue OCPBUGS-58102, which is invalid:
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/operator/operator.go (1)
45-57: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win: Keep the kubebuilder RBAC source in sync with the new proxy auth verbs.
The CSV now carries `tokenreviews` and `subjectaccessreviews`, but this aggregated annotation block still doesn't. The next manifest regeneration can silently drop those permissions again.
Suggested annotation update:
 // +kubebuilder:rbac:groups=route.openshift.io,resources=routes,verbs=get;watch;list
 // +kubebuilder:rbac:groups=config.openshift.io,resources=infrastructures,verbs=get;list;watch
+// +kubebuilder:rbac:groups=authentication.k8s.io,resources=tokenreviews,verbs=create
+// +kubebuilder:rbac:groups=authorization.k8s.io,resources=subjectaccessreviews,verbs=create
 // local role
 // +kubebuilder:rbac:groups="",namespace=external-dns-operator,resources=secrets;serviceaccounts;configmaps,verbs=get;list;watch;create;update;patch;delete
 // +kubebuilder:rbac:groups="",namespace=external-dns-operator,resources=services,verbs=get;list;watch;create;update;patch;delete
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/operator.go` around lines 45 - 57, The aggregated kubebuilder RBAC annotation block in operator.go is missing the proxy auth permissions (tokenreviews and subjectaccessreviews) that are present in the CSV; add RBAC annotations to include those resources so future manifest regenerations don't drop them. Specifically, add a +kubebuilder:rbac line for group=authentication.k8s.io,resources=tokenreviews,verbs=create and a +kubebuilder:rbac line for group=authorization.k8s.io,resources=subjectaccessreviews,verbs=create within the existing annotation block (the block containing the existing +kubebuilder:rbac comments) so the operator has the same proxy auth verbs as the CSV.
🧹 Nitpick comments (1)
pkg/operator/controller/externaldns/deployment_test.go (1)
6002-6002: ⚡ Quick win: Add one positive test path for a non-empty kube-rbac-proxy image.
At Line 6002, all test cases pass `""` for the new kube-rbac-proxy image parameter, so deployment reconciliation for the enabled sidecar path is not validated here. Add one case with a non-empty image (and OpenShift-enabled conditions) to assert expected sidecar/ports in the resulting Deployment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/controller/externaldns/deployment_test.go` at line 6002, Add a positive test case in deployment_test.go that calls ensureExternalDNSDeployment with a non-empty kube-rbac-proxy image string (instead of ""), set the test's extDNS/OpenShift-enabled conditions appropriately, and verify the returned Deployment (gotDepl) contains the kube-rbac-proxy container and expected ports/volume mounts; update the table-driven tests to include this scenario and assert gotExist and no error as part of the new case, referencing ensureExternalDNSDeployment, test.OperandNamespace, test.OperandImage, serviceAccount, tc.credSecret, tc.trustCAConfigMap and &tc.extDNS to locate the callsite for modification.
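A small sketch of the assertion such a positive case could use; the container-name prefix is an assumption about how the sidecars are named:

import (
	"strings"

	appsv1 "k8s.io/api/apps/v1"
)

// hasKubeRBACProxySidecar reports whether the reconciled Deployment carries at
// least one kube-rbac-proxy container. A positive table entry would call the
// reconcile helper with a non-empty proxy image and assert this returns true.
func hasKubeRBACProxySidecar(depl *appsv1.Deployment) bool {
	for _, c := range depl.Spec.Template.Spec.Containers {
		if strings.HasPrefix(c.Name, "kube-rbac-proxy") {
			return true
		}
	}
	return false
}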
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/controller/externaldns/controller.go`:
- Around line 239-248: When kubeRBACProxyImage becomes empty you must delete any
previously created metrics resources; add an else branch for the
kubeRBACProxyImage check that calls deletion helpers to remove the owned Service
and ServiceMonitor (e.g. implement and call
r.deleteExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS) and
r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS)), have
those helpers delete the resources and treat NotFound as success, and ensure
owner-label/ownerRef checks are used so only the CR's owned metrics resources
are removed; keep the existing ensureExternalDNSMetricsService and
ensureExternalDNSServiceMonitor logic for the creation path.
In `@pkg/operator/controller/externaldns/deployment.go`:
- Around line 301-310: The code adds a managed metrics-cert volume when
cfg.kubeRBACProxyImage is set but does not remove it when the image is later
unset, leaving the pod template referencing a secret that may not exist; update
the reconciliation logic around cfg.kubeRBACProxyImage (the block that calls
kubeRBACProxyContainer and metricsCertVolume and appends to
depl.Spec.Template.Spec.Volumes) to also prune any operator-managed metrics
volume when the proxy is disabled: detect when cfg.kubeRBACProxyImage == "" and
remove from depl.Spec.Template.Spec.Volumes any Volume whose Name or Secret.Name
matches the managed metrics volume (the one returned by metricsCertVolume and/or
controller.ExternalDNSMetricsSecretName(cfg.externalDNS)), and likewise ensure
any proxy sidecar containers (created by kubeRBACProxyContainer) are removed
when disabling the proxy.
In `@pkg/operator/controller/externaldns/service.go`:
- Around line 129-133: metricsServiceChanged currently compares port Name, Port,
and TargetPort but omits ServicePort.Protocol so protocol drift (set in
desiredMetricsService) won't be detected; update the metricsServiceChanged
function to also compare current.Spec.Ports[i].Protocol !=
desired.Spec.Ports[i].Protocol when iterating desired.Spec.Ports (and ensure the
loop bounds use the same index set), returning true if they differ so
reconciliation will update the Protocol field.
---
Outside diff comments:
In `@pkg/operator/operator.go`:
- Around line 45-57: The aggregated kubebuilder RBAC annotation block in
operator.go is missing the proxy auth permissions (tokenreviews and
subjectaccessreviews) that are present in the CSV; add RBAC annotations to
include those resources so future manifest regenerations don't drop them.
Specifically, add a +kubebuilder:rbac line for
group=authentication.k8s.io,resources=tokenreviews,verbs=create and a
+kubebuilder:rbac line for
group=authorization.k8s.io,resources=subjectaccessreviews,verbs=create within
the existing annotation block (the block containing the existing
+kubebuilder:rbac comments) so the operator has the same proxy auth verbs as the
CSV.
---
Nitpick comments:
In `@pkg/operator/controller/externaldns/deployment_test.go`:
- Line 6002: Add a positive test case in deployment_test.go that calls
ensureExternalDNSDeployment with a non-empty kube-rbac-proxy image string
(instead of ""), set the test's extDNS/OpenShift-enabled conditions
appropriately, and verify the returned Deployment (gotDepl) contains the
kube-rbac-proxy container and expected ports/volume mounts; update the
table-driven tests to include this scenario and assert gotExist and no error as
part of the new case, referencing ensureExternalDNSDeployment,
test.OperandNamespace, test.OperandImage, serviceAccount, tc.credSecret,
tc.trustCAConfigMap and &tc.extDNS to locate the callsite for modification.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 78bfc9c0-0a8d-427a-9118-c5a7267ed0a8
📒 Files selected for processing (18)
- bundle/manifests/external-dns-operator.clusterserviceversion.yaml
- bundle/manifests/external-dns_rbac.authorization.k8s.io_v1_clusterrole.yaml
- config/manager/manager.yaml
- config/rbac/operand_role.yaml
- config/rbac/role.yaml
- main.go
- pkg/operator/config/config.go
- pkg/operator/controller/externaldns/controller.go
- pkg/operator/controller/externaldns/deployment.go
- pkg/operator/controller/externaldns/deployment_test.go
- pkg/operator/controller/externaldns/pod.go
- pkg/operator/controller/externaldns/pod_test.go
- pkg/operator/controller/externaldns/service.go
- pkg/operator/controller/externaldns/service_test.go
- pkg/operator/controller/externaldns/servicemonitor.go
- pkg/operator/controller/externaldns/servicemonitor_test.go
- pkg/operator/controller/names.go
- pkg/operator/operator.go
// Ensure metrics service and service monitor for Prometheus scraping.
// Owner references on these resources ensure cascade deletion when the ExternalDNS CR is removed.
if kubeRBACProxyImage != "" {
	if err := r.ensureExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS metrics service: %w", err)
	}
	if err := r.ensureExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS service monitor: %w", err)
	}
}
Delete the owned metrics resources when metrics exposure is turned off.
When kubeRBACProxyImage becomes empty, this branch is skipped entirely, so any previously created metrics Service and ServiceMonitor remain behind. That leaves stale scrape targets pointing at dead ports, and the disabled watches mean nothing reconciles them away.
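A minimal sketch of the core of the suggested deletion helpers, assuming a controller-runtime client (the named wrappers would build the concrete Service/ServiceMonitor objects and call something like this):

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// deleteIgnoreNotFound issues a delete and treats NotFound as success, so the
// disable path stays idempotent across repeated reconciles.
func deleteIgnoreNotFound(ctx context.Context, c client.Client, obj client.Object) error {
	if err := c.Delete(ctx, obj); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}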
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/operator/controller/externaldns/controller.go` around lines 239 - 248,
When kubeRBACProxyImage becomes empty you must delete any previously created
metrics resources; add an else branch for the kubeRBACProxyImage check that
calls deletion helpers to remove the owned Service and ServiceMonitor (e.g.
implement and call r.deleteExternalDNSMetricsService(ctx, r.config.Namespace,
externalDNS) and r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace,
externalDNS)), have those helpers delete the resources and treat NotFound as
success, and ensure owner-label/ownerRef checks are used so only the CR's owned
metrics resources are removed; keep the existing ensureExternalDNSMetricsService
and ensureExternalDNSServiceMonitor logic for the creation path.
This seems reasonable. kube-rbac-proxy image is configured for the operand in CSV, therefore it's not possible to override it without scaling down OLM operator (unsupported path). I think that we should add the kube-rbac-proxy operand support unconditionally. If it would need to be disabled we need to think about a new API field for this.
@alebedev87, that's a fair point — since RELATED_IMAGE_KUBE_RBAC_PROXY is baked into the CSV and can only change via an OLM operator update (which restarts the operator anyway), the runtime scenario where kubeRBACProxyImage transitions from a non-empty value back to empty is effectively unreachable. The stale-resource concern doesn't apply in practice given that deployment model.
The idea of a dedicated API field for disabling metrics exposure in the future sounds like the right approach if that need arises. Would you like me to open a follow-up GitHub issue to track that?
📝 Learnings were identified and not saved (knowledge base disabled).
// Add kube-rbac-proxy sidecar(s) and metrics cert volume for secure metrics exposure.
// One sidecar per zone container, each proxying the corresponding metrics port.
if cfg.kubeRBACProxyImage != "" {
	for i := 0; i < cbld.counter; i++ {
		proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
		depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
	}
	certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
	depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
}
Prune the managed metrics-cert volume when the proxy is disabled.
This block adds a new operator-managed secret volume, but the update path never removes unexpected volumes. If --kube-rbac-proxy-image is later unset, the sidecars disappear while the pod template can still reference metrics-cert, leaving the rollout dependent on a secret that is no longer created.
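A sketch of the pruning step this comment asks for; the literal volume name is an assumption about what metricsCertVolume produces:

import corev1 "k8s.io/api/core/v1"

// pruneMetricsCertVolume drops the operator-managed cert volume when the
// proxy is disabled, matching either the assumed volume name or the managed
// secret it points at.
func pruneMetricsCertVolume(volumes []corev1.Volume, managedSecretName string) []corev1.Volume {
	kept := make([]corev1.Volume, 0, len(volumes))
	for _, v := range volumes {
		managed := v.Name == "metrics-cert" || // assumed name from metricsCertVolume
			(v.Secret != nil && v.Secret.SecretName == managedSecretName)
		if !managed {
			kept = append(kept, v)
		}
	}
	return kept
}

Called with controller.ExternalDNSMetricsSecretName(cfg.externalDNS) on the disable path, this would keep the pod template from referencing a secret that is no longer created.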
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/operator/controller/externaldns/deployment.go` around lines 301 - 310,
The code adds a managed metrics-cert volume when cfg.kubeRBACProxyImage is set
but does not remove it when the image is later unset, leaving the pod template
referencing a secret that may not exist; update the reconciliation logic around
cfg.kubeRBACProxyImage (the block that calls kubeRBACProxyContainer and
metricsCertVolume and appends to depl.Spec.Template.Spec.Volumes) to also prune
any operator-managed metrics volume when the proxy is disabled: detect when
cfg.kubeRBACProxyImage == "" and remove from depl.Spec.Template.Spec.Volumes any
Volume whose Name or Secret.Name matches the managed metrics volume (the one
returned by metricsCertVolume and/or
controller.ExternalDNSMetricsSecretName(cfg.externalDNS)), and likewise ensure
any proxy sidecar containers (created by kubeRBACProxyContainer) are removed
when disabling the proxy.
for i := range desired.Spec.Ports {
	if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
		current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
		current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort {
		return true
Include ServicePort.Protocol in drift detection.
desiredMetricsService explicitly sets Protocol (Line 48), but metricsServiceChanged never compares it. If protocol drifts, reconcile won’t repair it.
Suggested patch
for i := range desired.Spec.Ports {
if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
- current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort {
+ current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort ||
+ current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol {
return true
}
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
for i := range desired.Spec.Ports {
	if current.Spec.Ports[i].Name != desired.Spec.Ports[i].Name ||
		current.Spec.Ports[i].Port != desired.Spec.Ports[i].Port ||
		current.Spec.Ports[i].TargetPort != desired.Spec.Ports[i].TargetPort ||
		current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol {
		return true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/operator/controller/externaldns/service.go` around lines 129 - 133,
metricsServiceChanged currently compares port Name, Port, and TargetPort but
omits ServicePort.Protocol so protocol drift (set in desiredMetricsService)
won't be detected; update the metricsServiceChanged function to also compare
current.Spec.Ports[i].Protocol != desired.Spec.Ports[i].Protocol when iterating
desired.Spec.Ports (and ensure the loop bounds use the same index set),
returning true if they differ so reconciliation will update the Protocol field.
♻️ Duplicate comments (1)
pkg/operator/controller/externaldns/controller.go (1)
230-239: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win: Delete the metrics `Service` and `ServiceMonitor` when proxying is disabled.
If `KubeRBACProxyImage` is cleared, Line 225 stops adding the sidecar to the deployment, but this block also stops reconciling the previously created metrics resources. That leaves stale scrape targets behind until the `ExternalDNS` CR itself is deleted.
Suggested fix:
 if r.config.KubeRBACProxyImage != "" {
 	if err := r.ensureExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
 		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS metrics service: %w", err)
 	}
 	if err := r.ensureExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
 		return reconcile.Result{}, fmt.Errorf("failed to ensure externalDNS service monitor: %w", err)
 	}
+} else {
+	if err := r.deleteExternalDNSMetricsService(ctx, r.config.Namespace, externalDNS); err != nil {
+		return reconcile.Result{}, fmt.Errorf("failed to delete externalDNS metrics service: %w", err)
+	}
+	if err := r.deleteExternalDNSServiceMonitor(ctx, r.config.Namespace, externalDNS); err != nil {
+		return reconcile.Result{}, fmt.Errorf("failed to delete externalDNS service monitor: %w", err)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/operator/controller/externaldns/controller.go` around lines 230 - 239, When r.config.KubeRBACProxyImage is empty we must remove any previously-created metrics resources instead of skipping reconciliation; update the controller logic around ensureExternalDNSMetricsService and ensureExternalDNSServiceMonitor to call deletion paths when r.config.KubeRBACProxyImage == "" (e.g., invoke functions to delete the Service and ServiceMonitor corresponding to externalDNS) so that stale scrape targets are removed; locate the block using r.config.KubeRBACProxyImage, externalDNS, ensureExternalDNSMetricsService and ensureExternalDNSServiceMonitor and implement/ call corresponding deleteExternalDNSMetricsService and deleteExternalDNSServiceMonitor (or add an ensureDeleted flag to those methods) and handle errors consistently (return wrapped errors if deletion fails).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/operator/controller/externaldns/controller.go`:
- Around line 230-239: When r.config.KubeRBACProxyImage is empty we must remove
any previously-created metrics resources instead of skipping reconciliation;
update the controller logic around ensureExternalDNSMetricsService and
ensureExternalDNSServiceMonitor to call deletion paths when
r.config.KubeRBACProxyImage == "" (e.g., invoke functions to delete the Service
and ServiceMonitor corresponding to externalDNS) so that stale scrape targets
are removed; locate the block using r.config.KubeRBACProxyImage, externalDNS,
ensureExternalDNSMetricsService and ensureExternalDNSServiceMonitor and
implement/ call corresponding deleteExternalDNSMetricsService and
deleteExternalDNSServiceMonitor (or add an ensureDeleted flag to those methods)
and handle errors consistently (return wrapped errors if deletion fails).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 860161f6-d6cd-45d3-999c-a823ea5e85b2
📒 Files selected for processing (1)
pkg/operator/controller/externaldns/controller.go
metadata:
  labels:
    name: external-dns-operator
    openshift.io/cluster-monitoring: "true"
Note that this manifest is not added to the bundle. The creation of the namespace is taken care of either by the OLM operator (web console) or by the user.
Did you add it to make sure metrics are scraped by the built-in Prometheus stack? Operator metrics worked without this label. Can you double check that everything is scraped well if the namespace is created via the web console or manually?
Did we resolve and answer all of Andrey's comments, and are you happy now? @coderabbitai
🧩 Analysis chain
Let me fetch the review comments from the PR to check the status of Andrey's feedback.
🏁 Script executed (length of output: 34815)
Let me dig deeper into the specific inline comments from Andrey's latest review round to get the full picture.
🏁 Script executed (length of output: 13931)
✅ Andrey's Resolved Threads (10/12)
❌ Still Open Threads
Two of Andrey's threads remain unresolved:
Three of my (CodeRabbit) threads are also still open:
The key items needing your attention before approval are likely Andrey's two unresolved threads.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/operator/controller/externaldns/deployment.go`:
- Around line 301-308: The code currently skips adding kube-rbac-proxy sidecars
when cfg.kubeRBACProxyImage == "" but still creates metrics
Service/ServiceMonitor; change this to fail fast: check cfg.kubeRBACProxyImage
at the start of the deployment creation/reconcile path and return an error if
it's empty (e.g., fmt.Errorf("kubeRBACProxyImage must be set")), instead of
silently omitting the proxy; update the code surrounding the
kubeRBACProxyContainer(...) and
metricsCertVolume(controller.ExternalDNSMetricsSecretName(...)) calls so they
only run after the non-empty validation and ensure the calling reconcile loop
surfaces/handles the returned error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d998c4dc-57be-4b48-8f4a-b7509fa7e180
📒 Files selected for processing (4)
- config/manager/manager.yaml
- docs/openshift.md
- pkg/operator/controller/externaldns/controller.go
- pkg/operator/controller/externaldns/deployment.go
✅ Files skipped from review due to trivial changes (1)
- docs/openshift.md
if cfg.kubeRBACProxyImage != "" {
	for i := 0; i < cbld.counter; i++ {
		proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
		depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
	}
	certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
	depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
}
Fail fast when the proxy image is unset.
Lines 301-307 silently skip the kube-rbac-proxy sidecars, but the reconcile path now always creates the metrics Service and ServiceMonitor. That leaves broken scrape targets instead of a clear config error. Either make the proxy unconditional here as well, or return an error when cfg.kubeRBACProxyImage is empty.
Proposed fix
- if cfg.kubeRBACProxyImage != "" {
- for i := 0; i < cbld.counter; i++ {
- proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
- depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
- }
- certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
- depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
- }
+ if cfg.kubeRBACProxyImage == "" {
+ return nil, fmt.Errorf("kube-rbac-proxy image must be configured")
+ }
+ for i := 0; i < cbld.counter; i++ {
+ proxyContainer := kubeRBACProxyContainer(cfg.kubeRBACProxyImage, i)
+ depl.Spec.Template.Spec.Containers = append(depl.Spec.Template.Spec.Containers, proxyContainer)
+ }
+ certVolume := metricsCertVolume(controller.ExternalDNSMetricsSecretName(cfg.externalDNS))
+ depl.Spec.Template.Spec.Volumes = append(depl.Spec.Template.Spec.Volumes, certVolume)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/operator/controller/externaldns/deployment.go` around lines 301 - 308,
The code currently skips adding kube-rbac-proxy sidecars when
cfg.kubeRBACProxyImage == "" but still creates metrics Service/ServiceMonitor;
change this to fail fast: check cfg.kubeRBACProxyImage at the start of the
deployment creation/reconcile path and return an error if it's empty (e.g.,
fmt.Errorf("kubeRBACProxyImage must be set")), instead of silently omitting the
proxy; update the code surrounding the kubeRBACProxyContainer(...) and
metricsCertVolume(controller.ExternalDNSMetricsSecretName(...)) calls so they
only run after the non-empty validation and ensure the calling reconcile loop
surfaces/handles the returned error.
ExternalDNS operand pods bind metrics to 127.0.0.1, making them inaccessible from outside the pod. Prometheus cannot scrape them. This adds a kube-rbac-proxy sidecar to each operand deployment that proxies the localhost metrics over HTTPS, along with a Service (annotated for OpenShift serving cert) and ServiceMonitor for automatic Prometheus scraping.

Resolves: OCPBUGS-58102
Assisted with Claude
- Gate metrics sidecar/service/servicemonitor behind IsOpenShift (serving-cert dependency)
- Add Service and ServiceMonitor watches with KubeRBACProxyImage+IsOpenShift guard
- Switch ServiceMonitor ownerRef to controllerutil.SetControllerReference
- Remove manual delete logic; owner references handle cascade deletion
- Add kubeRBACProxyContainerNameForSeq for naming consistency
- Rename metricsService to desiredMetricsService
- Add labels/annotations comparison to metricsServiceChanged and serviceMonitorChanged
- Sync labels in ServiceMonitor update flow
- Preserve immutable Service fields (ClusterIPs, IPFamilies, IPFamilyPolicy) on update
- Clarify kube-rbac-proxy flag description for ExternalDNS operand
- Add manager.yaml env var and namespace label for cluster monitoring
- Sync bundle manifests via make bundle

Assisted-by: Claude <noreply@anthropic.com>
This operator exclusively targets OpenShift, so gating the metrics sidecar, Service, and ServiceMonitor behind IsOpenShift was redundant. They are now gated only on KubeRBACProxyImage being set.

Assisted-by: Claude <noreply@anthropic.com>
- Remove KubeRBACProxyImage guards from controller watches and reconcile; the sidecar, Service, and ServiceMonitor are now always created
- Remove openshift.io/cluster-monitoring label from manager.yaml since this manifest is not in the bundle
- Add operand metrics documentation to docs/openshift.md covering multi-zone metric differentiation via ports and instance labels

Assisted-by: Claude <noreply@anthropic.com>
The controller unconditionally creates metrics Service and ServiceMonitor, so a missing kube-rbac-proxy image would produce broken scrape targets. Validate at operator startup to catch this misconfiguration before any reconciliation runs.

Assisted-by: Claude <noreply@anthropic.com>
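A sketch of what that startup validation could look like; the Config shape and the Validate hook name are assumptions, not the commit's actual code:

import "errors"

// Config mirrors only the operator config field relevant here (assumed shape).
type Config struct {
	KubeRBACProxyImage string
}

// Validate fails fast on a missing kube-rbac-proxy image, since the metrics
// Service and ServiceMonitor are created unconditionally and would otherwise
// become broken scrape targets.
func (c *Config) Validate() error {
	if c.KubeRBACProxyImage == "" {
		return errors.New("kube-rbac-proxy image must be set")
	}
	return nil
}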
Force-pushed 10a004a to 8848be6
@Thealisyed: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Did we resolve/answer Andrey's comments, and are you happy now? @coderabbitai
🧩 Analysis chain
🏁 Script executed (length of output: 9540)
Andrey's Threads: 10/12 Resolved ✅
Two threads are still unresolved:
My (CodeRabbit) Open Threads: 4
The two outdated CodeRabbit threads
In regards to thread #2
Posted this question in our Slack channel, let's see what they say @coderabbitai:
Working on a PR for EDO (adding a ServiceMonitor for ExternalDNS operand metrics). The namespace needs openshift.io/cluster-monitoring: "true" for cluster Prometheus to discover it. Other OLM operators use the operatorframework.io/cluster-monitoring: "true" CSV annotation whereas our CSV doesn't have it. Is this something to consider for our add-on operators?
How the Mechanism Works for OLM Add-on Operators
Crucially, OLM itself will not be responsible for fulfilling the Namespace and RBAC requirements — instead, OLM defines how operator authors can provide the OpenShift Console with the information required to fulfill these requirements.
It's an Opt-In Checkbox in the Console
One important nuance: operator developers can suggest that admins allow the namespace to be monitored, but this option is presented deselected by default so that the admin can deliberately opt in. So even with the CSV annotation, it's not fully automatic during console install — the admin must check the box. This is consistent with what the OpenShift Logging docs say: the console option sets the
The Regulation Aspect
If OLM allows any operator to report metrics to the OpenShift Monitoring Prometheus instance, there is a chance the instance could be overloaded or the integrity of the data could be jeopardized. To minimize this risk, the number of operators granted permission to include metrics is highly regulated — operators must not be added to officially supported CatalogSources without first being reviewed. This is worth keeping in mind — adding this annotation to the EDO CSV may need to go through a review/approval process within Red Hat before it can be included in the supported catalog.
Summary / Recommended Path
Your Slack question has it right. The recommended approach for OLM add-on operators (EDO, ALBO, etc.) is:
This is all separate from the current PR's scope. Would you like me to open a follow-up GitHub issue to track the
🌐 Web query:
💡 Result: To have an OLM-installed Operator opt into OpenShift cluster monitoring (the
metadata:
  annotations:
    operatorframework.io/cluster-monitoring: "true"

This annotation is consumed by the OpenShift Console during install. It indicates the Operator wants the install namespace to be made “cluster-monitorable.” [1], [2]

oc label namespace <ns> openshift.io/cluster-monitoring=true

OpenShift’s enhancement design also calls out that the namespace must be an
Notes:
Sources:
🌐 Web query:
💡 Result: Best practice (OpenShift OLM add-on Operators + monitoring namespace labels):
Rule of thumb:
The ExternalDNS operand binds its Prometheus metrics to localhost (127.0.0.1:7979), making them invisible to the cluster monitoring stack. This adds a kube-rbac-proxy sidecar to proxy metrics over HTTPS, along with a Service and ServiceMonitor for Prometheus discovery.
Assisted-by: Claude
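For reference, a sketch of the Service side of this design. The serving-cert annotation is the standard OpenShift mechanism for minting the TLS secret the sidecar mounts; the function name, port, and selector here are illustrative, not the PR's actual code:

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// desiredMetricsServiceSketch shows a metrics Service targeting the proxy's
// HTTPS port rather than the operand's localhost-only port.
func desiredMetricsServiceSketch(name, namespace, certSecret string) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Annotations: map[string]string{
				// asks the OpenShift service CA to create certSecret
				"service.beta.openshift.io/serving-cert-secret-name": certSecret,
			},
		},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app": name}, // illustrative selector
			Ports: []corev1.ServicePort{{
				Name:       "metrics",
				Port:       8443,
				TargetPort: intstr.FromInt(8443),
				Protocol:   corev1.ProtocolTCP,
			}},
		},
	}
}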