Skip to content

MON-4500: Migrate Prometheus targets discovering from Endpoints to EndpointSlices#460

Open
machine424 wants to merge 2 commits into
openshift:masterfrom
machine424:endpoinmig
Open

MON-4500: Migrate Prometheus targets discovering from Endpoints to EndpointSlices#460
machine424 wants to merge 2 commits into
openshift:masterfrom
machine424:endpoinmig

Conversation

@machine424
Copy link
Copy Markdown

This PR migrates Prometheus service discovery from the deprecated Endpoints API to the EndpointSlices API, by:

  • Setting serviceDiscoveryRole: EndpointSlice on ServiceMonitors.
  • Granting Prometheus endpointslices permissions.

We're taking a conservative approach by keeping the existing endpoints permissions alongside the new endpointslices ones. This provides a safety net in case any ServiceMonitors, whether deployed from this repo or from another source, still rely on the same Role and were missed during the migration.

That said, since both resources provide essentially the same data, keeping both isn't meaningfully more permissive from a security standpoint.

These changes target OpenShift 4.22+ and should not be backported to earlier releases.

Due to the scope of changes across multiple repositories, these modifications were generated with Claude assistance.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Feb 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@machine424 machine424 marked this pull request as ready for review February 9, 2026 12:18
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 9, 2026
@openshift-ci openshift-ci Bot requested review from bentito and knobunc February 9, 2026 12:28
@machine424 machine424 changed the title Migrate Prometheus targets discovering from Endpoints to EndpointSlices MON-4500: Migrate Prometheus targets discovering from Endpoints to EndpointSlices Feb 9, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 9, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Feb 9, 2026

@machine424: This pull request references MON-4500 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR migrates Prometheus service discovery from the deprecated Endpoints API to the EndpointSlices API, by:

  • Setting serviceDiscoveryRole: EndpointSlice on ServiceMonitors.
  • Granting Prometheus endpointslices permissions.

We're taking a conservative approach by keeping the existing endpoints permissions alongside the new endpointslices ones. This provides a safety net in case any ServiceMonitors, whether deployed from this repo or from another source, still rely on the same Role and were missed during the migration.

That said, since both resources provide essentially the same data, keeping both isn't meaningfully more permissive from a security standpoint.

These changes target OpenShift 4.22+ and should not be backported to earlier releases.

Due to the scope of changes across multiple repositories, these modifications were generated with Claude assistance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@machine424
Copy link
Copy Markdown
Author

/verified by existing tests
/jira refresh

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 9, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@machine424: This PR has been marked as verified by existing tests.

Details

In response to this:

/verified by existing tests
/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Mar 9, 2026

@machine424: This pull request references MON-4500 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

/verified by existing tests
/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@machine424
Copy link
Copy Markdown
Author

/verified by existing tests
/jira refresh

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@machine424: This PR has been marked as verified by existing tests.

Details

In response to this:

/verified by existing tests
/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

openshift-ci-robot commented Mar 9, 2026

@machine424: This pull request references MON-4500 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

/verified by existing tests
/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@alebedev87
Copy link
Copy Markdown
Contributor

/assign @rikatz

@rikatz
Copy link
Copy Markdown
Member

rikatz commented Mar 11, 2026

@machine424 you need to apply some extra changes:

Please add unit tests for these changes as well.

Thanks!

@openshift-ci-robot openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Mar 18, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a1740a5-ae07-43aa-a97d-978f7da5b990

📥 Commits

Reviewing files that changed from the base of the PR and between 90d93ab and 454182e.

📒 Files selected for processing (9)
  • manifests/0000_70_dns-operator_00-cluster-role.yaml
  • manifests/0000_90_dns-operator_00_prometheusrole.yaml
  • manifests/0000_90_dns-operator_02_servicemonitor.yaml
  • pkg/manifests/assets/dns/metrics/role.yaml
  • pkg/operator/controller/controller.go
  • pkg/operator/controller/controller_metrics_role.go
  • pkg/operator/controller/controller_metrics_role_test.go
  • pkg/operator/controller/controller_service_monitor.go
  • pkg/operator/controller/controller_service_monitor_test.go

Walkthrough

The pull request adds EndpointSlice monitoring support to the DNS operator. Changes include extending RBAC permissions across cluster-level and role-based rules to grant discovery access to EndpointSlices, implementing metrics role reconciliation logic in the controller, and updating ServiceMonitor configuration to specify EndpointSlice service discovery.

Changes

Cohort / File(s) Summary
RBAC Rules for EndpointSlices
manifests/0000_70_dns-operator_00-cluster-role.yaml, manifests/0000_90_dns-operator_00_prometheusrole.yaml, pkg/manifests/assets/dns/metrics/role.yaml
Adds permissions to discovery.k8s.io resources (endpointslices) with get, list, and watch verbs across cluster role, Prometheus role, and metrics role.
Metrics Role Reconciliation
pkg/operator/controller/controller.go, pkg/operator/controller/controller_metrics_role.go, pkg/operator/controller/controller_metrics_role_test.go
Introduces four functions (ensureDNSMetricsRole, currentDNSMetricsRole, updateDNSMetricsRole, dnsMetricsRoleChanged) to manage metrics role creation and updates with diff-based comparison; refactors controller.go to delegate role provisioning to the new helper.
ServiceMonitor Configuration
manifests/0000_90_dns-operator_02_servicemonitor.yaml, pkg/operator/controller/controller_service_monitor.go, pkg/operator/controller/controller_service_monitor_test.go
Adds serviceDiscoveryRole field set to "EndpointSlice" in ServiceMonitor spec and extends test coverage to validate detection of this field change.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~18 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 18, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rikatz. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…rviceDiscoveryRole: EndpointSlice in ServiceMonitors

This PR migrates Prometheus service discovery from the deprecated Endpoints API to the EndpointSlices API, by:

    Setting serviceDiscoveryRole: EndpointSlice on ServiceMonitors.
    Granting Prometheus endpointslices permissions.

We're taking a conservative approach by keeping the existing endpoints permissions alongside the new endpointslices ones. This provides a safety net in case any ServiceMonitors, whether deployed from this repo or from another source, still rely on the same Role and were missed during the migration.

That said, since both resources provide essentially the same data, keeping both isn't meaningfully more permissive from a security standpoint.

These changes target OpenShift 4.22+ and should not be backported to earlier releases.

Due to the scope of changes across multiple repositories, these modifications were generated with Claude assistance.
- rbac.authorization.k8s.io
resources:
- clusterroles
- roles
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allows the operator to update pkg/manifests/assets/dns/metrics/role.yaml on the cluster

@machine424
Copy link
Copy Markdown
Author

Pushed some changes, keeping an eye on the CI.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 18, 2026

@machine424: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

rbacv1 "k8s.io/api/rbac/v1"
)

func TestDNSggMetricsRoleChanged(t *testing.T) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DNSgg?

@@ -0,0 +1,26 @@
package controller
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- get
- list
- watch
- apiGroups:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to add these same permissions to the ClusterRole, otherwise the user running openshift-dns-operator won´t be able to give this permission.

See:

	ERROR	Reconciler error	{"controller": "dns_controller", "object": {"name":"default"}, "namespace": "", "name": "default", "reconcileID": "edd6c3b1-ee05-40eb-a566-6a83471fb643", "error": "failed to ensure dns default: failed to integrate metrics with openshift-monitoring for dns default: failed to ensure dns metrics role for default: failed to update dns metrics role openshift-dns/prometheus-k8s: roles.rbac.authorization.k8s.io \"prometheus-k8s\" is forbidden: user \"system:serviceaccount:openshift-dns-operator:dns-operator\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:openshift-dns-operator\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"discovery.k8s.io\"], Resources:[\"endpointslices\"], Verbs:[\"get\"]}", "errorCauses": [{"error": "failed to ensure dns default: failed to integrate metrics with openshift-monitoring for dns default: failed to ensure dns metrics role for default: failed to update dns metrics role openshift-dns/prometheus-k8s: roles.rbac.authorization.k8s.io \"prometheus-k8s\" is forbidden: user \"system:serviceaccount:openshift-dns-operator:dns-operator\" (groups=[\"system:serviceaccounts\" \"system:serviceaccounts:openshift-dns-operator\" \"system:authenticated\"]) is attempting to grant RBAC permissions not currently held:\n{APIGroups:[\"discovery.k8s.io\"], Resources:[\"endpointslices\"], Verbs:[\"get\"]}"}]}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants