OCPBUGS-55179: Set internalTrafficPolicy to Local on dns-default service #474

pkhedeka wants to merge 3 commits
Conversation
@pkhedeka: This pull request references Jira Issue OCPBUGS-55179, which is invalid.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files. The full list of commands accepted by this bot can be found here.
Hi @pkhedeka. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/jira refresh
@pkhedeka: This pull request references Jira Issue OCPBUGS-55179, which is invalid.
/jira refresh
@pkhedeka: This pull request references Jira Issue OCPBUGS-55179, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug. Requesting review from QA contact.
DNS queries from pods on primary user-defined networks (UDNs) scatter randomly across dns-default pods on all nodes instead of being handled by the local node's dns-default pod.

UDN pods reach the dns-default service (172.30.0.10) through a path that crosses network boundaries: UDN pod -> UDN cluster router -> management port -> default network -> OVN load balancer -> dns-default pod. The OVN load balancer on the default network treats dns-default as a standard ClusterIP service and distributes traffic across all backend pods cluster-wide.

Setting internalTrafficPolicy to Local restricts the EndpointSlice to contain only the node-local dns-default backend. Since dns-default runs as a DaemonSet with a pod on every node, this is safe and guarantees that DNS queries are always handled by the local pod.

The existing trafficDistribution: PreferSameNode (added in PR openshift#457) provides a soft hint for same-node preference but does not guarantee locality for cross-network UDN traffic. internalTrafficPolicy: Local provides the hard constraint needed.

Jira: https://issues.redhat.com/browse/OCPBUGS-55179

Signed-off-by: Parikshit Khedekar <pkhedeka@redhat.com>
@pkhedeka: This pull request references Jira Issue OCPBUGS-55179, which is invalid.
This is a false assumption. There are situations in which the dns-default pods do not necessarily run on every node, for example if a node is tainted, if ...
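The reviewer's concern can be illustrated with a small, self-contained sketch (the node and pod names are hypothetical): with `internalTrafficPolicy: Local`, the dataplane on each node only sees backends running on that node, so a node without a dns-default pod (for example, a tainted node the DaemonSet does not tolerate) would have no eligible backend and its DNS queries would be dropped rather than falling back to another node.

```go
package main

import "fmt"

// localEndpoints models what internalTrafficPolicy: Local does: the
// dataplane on a node only sees backends running on that same node.
func localEndpoints(podsByNode map[string][]string, node string) []string {
	return podsByNode[node]
}

func main() {
	// Hypothetical cluster where node-3 has no dns-default pod, e.g.
	// because it is tainted and the DaemonSet has no matching toleration.
	podsByNode := map[string][]string{
		"node-1": {"dns-default-aaa"},
		"node-2": {"dns-default-bbb"},
		// node-3: no local dns-default pod
	}
	for _, node := range []string{"node-1", "node-3"} {
		eps := localEndpoints(podsByNode, node)
		if len(eps) == 0 {
			fmt.Printf("%s: no local backend, DNS queries are dropped\n", node)
			continue
		}
		fmt.Printf("%s: served by %v\n", node, eps)
	}
}
```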
```yaml
spec:
  # clusterIP will be automatically managed.
  # selector is set at runtime
  internalTrafficPolicy: Local
```
You would also need to update daemonsetConfigChanged in pkg/operator/controller/controller_dns_daemonset.go to ensure an existing daemon set gets updated.
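The point of the changed-detection update can be sketched in isolation. The helper name `internalTrafficPolicyChanged` and the `TrafficPolicy` type below are hypothetical stand-ins (the real operator code would compare the `*corev1.ServiceInternalTrafficPolicy` fields inside its existing changed-detection function); the sketch only shows the nil-safe pointer comparison that makes the operator push the new field onto services created before this change, not just newly created ones.

```go
package main

import "fmt"

// TrafficPolicy stands in for corev1.ServiceInternalTrafficPolicy.
type TrafficPolicy string

// internalTrafficPolicyChanged reports whether the current service's
// internalTrafficPolicy differs from the expected one, treating a nil
// pointer as "unset". Without a check like this, an existing service
// would keep its old (unset) value forever.
func internalTrafficPolicyChanged(current, expected *TrafficPolicy) bool {
	if current == nil || expected == nil {
		// Changed unless both are unset.
		return current != expected
	}
	return *current != *expected
}

func main() {
	local := TrafficPolicy("Local")
	cluster := TrafficPolicy("Cluster")
	fmt.Println(internalTrafficPolicyChanged(nil, &local))      // unset -> Local: needs update
	fmt.Println(internalTrafficPolicyChanged(&cluster, &local)) // Cluster -> Local: needs update
	fmt.Println(internalTrafficPolicyChanged(&local, &local))   // already Local: up to date
}
```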
A bit confused here; I assume this will help me keep iTP set to Local.
```go
if !assert.NotNil(t, svc.Spec.InternalTrafficPolicy, "expected InternalTrafficPolicy to be set") {
	return
}
assert.Equal(t, corev1.ServiceInternalTrafficPolicyLocal, *svc.Spec.InternalTrafficPolicy,
	"dns-default service must use InternalTrafficPolicy Local to ensure node-local DNS resolution")
```
Is there a reason to define a new test rather than putting the assertions in TestDesiredDNSDaemonset?
I didn't see an existing test for the service; I likely missed it, so I thought of creating a new one instead. I can try moving the assertions, though given the suggestions you shared above, I feel it would need more hands than mine.
Thank you for the suggestions, I agree: without a fallback, tainting, upgrades, or even node replacement would turn into issues. Initially I thought of an OVN-K based check/fix but couldn't really conclude on that. Now, even though I see trafficDistribution: PreferSameNode is much needed, I feel I shouldn't be looking at the DNS operator; rather, I would like to hear your suggestion. This is around a UDN traffic issue, so maybe I can learn something more based on your suggestions.
Co-authored-by: Miciah Dashiel Butler Masters <mmasters@redhat.com>
Summary
DNS queries from pods on primary user-defined networks (UDNs) scatter randomly across dns-default pods on all nodes instead of being handled by the local node's dns-default pod.
UDN pods reach the dns-default service (172.30.0.10) through a path that crosses network boundaries: UDN pod → UDN cluster router → management port → default network → OVN load balancer → dns-default pod. The OVN load balancer on the default network treats dns-default as a standard ClusterIP service and distributes traffic across all backend pods cluster-wide.
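The cluster-wide distribution described above can be sketched as a small simulation (the pod and node names are hypothetical): under the default `internalTrafficPolicy: Cluster`, the load balancer may pick any backend for a query regardless of which node the client is on, which is exactly the scatter this PR addresses.

```go
package main

import (
	"fmt"
	"math/rand"
)

// A hypothetical three-node cluster with one dns-default pod per node.
var backends = map[string]string{ // pod -> node
	"dns-default-aaa": "node-1",
	"dns-default-bbb": "node-2",
	"dns-default-ccc": "node-3",
}

// clusterPick models internalTrafficPolicy: Cluster, where the load
// balancer may send a query from any client node to any backend pod.
func clusterPick() string {
	pods := make([]string, 0, len(backends))
	for pod := range backends {
		pods = append(pods, pod)
	}
	return pods[rand.Intn(len(pods))]
}

func main() {
	// 100 DNS queries from a pod on node-1 scatter across the nodes.
	hits := map[string]int{}
	for i := 0; i < 100; i++ {
		hits[backends[clusterPick()]]++
	}
	fmt.Println("per-node hit counts under Cluster policy:", hits)
	// With internalTrafficPolicy: Local, the node-1 dataplane would see
	// only dns-default-aaa, so all 100 queries would stay on node-1.
}
```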
Setting `internalTrafficPolicy: Local` restricts the EndpointSlice to contain only the node-local dns-default backend. Since dns-default runs as a DaemonSet with a pod on every node, this is safe and guarantees that DNS queries are always handled by the local pod.

The existing `trafficDistribution: PreferSameNode` (added in PR #457) provides a soft hint for same-node preference but does not guarantee locality for cross-network UDN traffic. `internalTrafficPolicy: Local` provides the hard constraint needed.

Why DNS operator, not OVN?
The DNS query scatter from primary UDN pods is not caused by an OVN routing bug. In OCP 4.14+ (interconnect mode), each node has its own zone and its own UDN cluster router with a single route, so no ECMP happens at the UDN router level. The scatter occurs downstream: once UDN traffic crosses into the default network via the management port, the OVN load balancer distributes dns-default traffic across all backend pods because the service has `internalTrafficPolicy: Cluster` (the default). The fix belongs in the DNS operator, which owns the dns-default service spec.

Verification
- `make test`: passed
- `make verify`: passed

Jira: https://issues.redhat.com/browse/OCPBUGS-55179