OCPBUGS-77532: Fix premature node schedulability marking in taint checking#470
OCPBUGS-77532: Fix premature node schedulability marking in taint checking#470davidesalerno wants to merge 2 commits into
Conversation
…lity OCPBUGS-77532: The tolerationsTolerateTaints function incorrectly returned true when any single taint was tolerated by any toleration. This caused nodes with multiple taints to be marked as schedulable even when only one taint was tolerated, which could lead to DNS pods being scheduled on nodes where they cannot actually run. The fix ensures every taint on a node must be tolerated by at least one toleration before the node is considered schedulable, matching standard Kubernetes scheduling semantics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Davide Salerno <dsalerno@redhat.com>
Cover the key scenarios for taint-toleration matching: no taints, single taint matched/unmatched, multiple taints with all/some/none tolerated, and wildcard toleration. These tests verify the fix for OCPBUGS-77532 and prevent regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Davide Salerno <dsalerno@redhat.com>
|
Skipping CI for Draft Pull Request. |
|
@davidesalerno: This pull request references Jira Issue OCPBUGS-77532, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughUpdated the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 9 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (9 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@davidesalerno: This pull request references Jira Issue OCPBUGS-77532, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/operator/controller/controller_dns_daemonset_test.go (1)
891-899: Optional: add an explicitPreferNoSchedulecase to lock intended behavior.Current cases focus on
NoSchedule/NoExecute. Adding onePreferNoSchedulecase would document whether this helper should treat it as strictly untolerated or still acceptable for “safe update” purposes.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/operator/controller/controller_dns_daemonset_test.go` around lines 891 - 899, Add a new test case in controller_dns_daemonset_test.go alongside the existing cases (the table that includes the case named "wildcard toleration tolerates all taints") that exercises a taint with Effect: corev1.TaintEffectPreferNoSchedule and a toleration set that should indicate how the helper treats PreferNoSchedule for safe updates; specify the tolerations and taints explicitly and add the expected boolean (true if PreferNoSchedule should be considered tolerated for safe updates, or false if it should be treated as untolerated), so the intended behavior for PreferNoSchedule is documented and enforced by the test harness that uses the same test helpers and assertions as the surrounding cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/operator/controller/controller_dns_daemonset_test.go`:
- Around line 891-899: Add a new test case in controller_dns_daemonset_test.go
alongside the existing cases (the table that includes the case named "wildcard
toleration tolerates all taints") that exercises a taint with Effect:
corev1.TaintEffectPreferNoSchedule and a toleration set that should indicate how
the helper treats PreferNoSchedule for safe updates; specify the tolerations and
taints explicitly and add the expected boolean (true if PreferNoSchedule should
be considered tolerated for safe updates, or false if it should be treated as
untolerated), so the intended behavior for PreferNoSchedule is documented and
enforced by the test harness that uses the same test helpers and assertions as
the surrounding cases.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 12ac8b68-358c-4189-a12f-696ea67649b5
📒 Files selected for processing (2)
pkg/operator/controller/controller_dns_daemonset.gopkg/operator/controller/controller_dns_daemonset_test.go
|
@davidesalerno: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/assign |
|
/assign @grzpiotrowski |
Summary
tolerationsTolerateTaints()to require all taints on a node to be tolerated (not just any one) before considering the node schedulableContext
The
tolerationsTolerateTaintsfunction indaemonsetUpdateIsSafewas returningtrueas soon as it found any single taint tolerated by any toleration. This meant a node with multiple taints (e.g.node-role.kubernetes.io/masterandnode.kubernetes.io/disk-pressure) would be considered schedulable even if only the master taint was tolerated, violating standard Kubernetes scheduling semantics.The fix ensures every taint on a node must be tolerated by at least one toleration before returning
true.See also: #459 (review)
Test plan
tolerationsTolerateTaintscovering:make buildpassesmake testpassesmake verifypasses🤖 Assisted by Claude Code via
/jira:solve OCPBUGS-77532Summary by CodeRabbit
Release Notes
Bug Fixes
Tests