Skip to content

[WIP] MG-108: evals for plan_mustgather#162

Draft
swghosh wants to merge 1 commit intoopenshift:mainfrom
swghosh:mg-evals
Draft

[WIP] MG-108: evals for plan_mustgather#162
swghosh wants to merge 1 commit intoopenshift:mainfrom
swghosh:mg-evals

Conversation

@swghosh
Copy link
Copy Markdown
Member

@swghosh swghosh commented Feb 27, 2026

The openshift toolset contains plan_mustgather ServerPrompt,
add evals for various arguments that the prompt inputs.

Summary by CodeRabbit

  • Tests
    • Added seven new OpenShift must-gather evaluation tasks covering various scenarios including audit log collection, custom images, custom namespaces, host network configuration, node selection, and timeout configurations.
    • Enhanced evaluation configuration with OpenShift-focused task set to validate must-gather planning and resource creation capabilities.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Feb 27, 2026

@swghosh: This pull request references MG-108 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

The openshift toolset contains plan_mustgather ServerPrompt,
add evals for various arguments that the prompt inputs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Feb 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 27, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Feb 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: swghosh
Once this PR has been reviewed and has the lgtm label, please assign manusa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

This pull request introduces OpenShift evaluation tasks for must-gather operations and updates the evaluation configuration. Seven new task manifests are added to test various must-gather configurations and constraints, with corresponding eval.yaml configuration to register the new task set.

Changes

Cohort / File(s) Summary
Evaluation Configuration
evals/claude-code/eval.yaml
Added OpenShift task set targeting ../tasks/openshift/*/*.yaml with assertions for plan_mustgather prompt and resource_create_or_update tool, constrained to 1-15 tool calls.
OpenShift Must-Gather Tasks
evals/tasks/openshift/plan-mustgather-*/plan-mustgather-*.yaml
Added seven new Task manifests for must-gather evaluation: audit-logs, custom-images, custom-namespace, default, host-network, node-selector, and timeout-since. Each task follows setup-verify-cleanup pattern with namespace/pod verification and assertions for prompt and tool usage (1-15 calls).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Eight new paths for OpenShift's must-gather dance,
Setup, verify, cleanup—giving clusters a chance,
With audits, with images, with nodes all in place,
This rabbit has built a most methodical space! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding evaluation tasks for the plan_mustgather functionality in OpenShift. It is clear, specific, and directly related to the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
@swghosh swghosh marked this pull request as ready for review April 23, 2026 07:47
@openshift-ci openshift-ci Bot requested review from Kaustubh-pande and manusa April 23, 2026 07:47
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 23, 2026

@swghosh: This pull request references MG-108 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

The openshift toolset contains plan_mustgather ServerPrompt,
add evals for various arguments that the prompt inputs.

Summary by CodeRabbit

  • Tests
  • Added seven new OpenShift must-gather evaluation tasks covering various scenarios including audit log collection, custom images, custom namespaces, host network configuration, node selection, and timeout configurations.
  • Enhanced evaluation configuration with OpenShift-focused task set to validate must-gather planning and resource creation capabilities.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@evals/tasks/openshift/plan-mustgather-custom-images/plan-mustgather-custom-images.yaml`:
- Around line 27-28: The verification only checks for
"cluster-logging-rhel9-operator" and misses "ose-must-gather:v4.15"; update the
pod inspection for the commands that use kubectl get pod "$POD" -n "$NS" -o yaml
| grep -q "cluster-logging-rhel9-operator" (and the duplicate at the later
occurrence) to also verify the presence of the ose-must-gather:v4.15
image—either by adding a second grep check for "ose-must-gather:v4.15" after the
existing command or replacing the single grep with a combined check/regex that
ensures both "cluster-logging-rhel9-operator:latest" and "ose-must-gather:v4.15"
are present.

In
`@evals/tasks/openshift/plan-mustgather-node-selector/plan-mustgather-node-selector.yaml`:
- Around line 20-26: The current extraction stores possibly multiple pod JSON
objects into POD_JSON which concatenates objects and later pipelines into jq for
NS and POD causing parse errors; change the jq pipeline that sets POD_JSON to
produce a single JSON array (use jq -s or wrap items[] selection into an array)
or restrict the selection to the first matching item so subsequent jq calls that
read POD_JSON (used to set NS and POD) receive valid JSON; update the POD_JSON
assignment and/or the NS and POD extraction logic that references POD_JSON to
use jq -s or array indexing to safely handle multiple matches.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 70354532-44cb-4719-910d-f7f17cd3eed9

📥 Commits

Reviewing files that changed from the base of the PR and between 3984b5b and aafae6a.

📒 Files selected for processing (8)
  • evals/claude-code/eval.yaml
  • evals/tasks/openshift/plan-mustgather-audit-logs/plan-mustgather-audit-logs.yaml
  • evals/tasks/openshift/plan-mustgather-custom-images/plan-mustgather-custom-images.yaml
  • evals/tasks/openshift/plan-mustgather-custom-namespace/plan-mustgather-custom-namespace.yaml
  • evals/tasks/openshift/plan-mustgather-default/plan-mustgather-default.yaml
  • evals/tasks/openshift/plan-mustgather-host-network/plan-mustgather-host-network.yaml
  • evals/tasks/openshift/plan-mustgather-node-selector/plan-mustgather-node-selector.yaml
  • evals/tasks/openshift/plan-mustgather-timeout-since/plan-mustgather-timeout-since.yaml

Comment on lines +27 to +28
# Verify pod uses the logging operator image
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "cluster-logging-rhel9-operator"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verification only checks one of the two required images.

The prompt specifies two images should be configured (ose-must-gather:v4.15 and cluster-logging-rhel9-operator:latest), but the verification only checks for the logging operator image. This could lead to false positives if only one image is configured.

Consider verifying both images
       # Verify pod uses the logging operator image
-      kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "cluster-logging-rhel9-operator"
+      POD_YAML=$(kubectl get pod "$POD" -n "$NS" -o yaml)
+      # Verify pod uses both the platform and logging operator images
+      echo "$POD_YAML" | grep -q "ose-must-gather"
+      echo "$POD_YAML" | grep -q "cluster-logging-rhel9-operator"

Also applies to: 36-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@evals/tasks/openshift/plan-mustgather-custom-images/plan-mustgather-custom-images.yaml`
around lines 27 - 28, The verification only checks for
"cluster-logging-rhel9-operator" and misses "ose-must-gather:v4.15"; update the
pod inspection for the commands that use kubectl get pod "$POD" -n "$NS" -o yaml
| grep -q "cluster-logging-rhel9-operator" (and the duplicate at the later
occurrence) to also verify the presence of the ose-must-gather:v4.15
image—either by adding a second grep check for "ose-must-gather:v4.15" after the
existing command or replacing the single grep with a combined check/regex that
ensures both "cluster-logging-rhel9-operator:latest" and "ose-must-gather:v4.15"
are present.

Comment on lines +20 to +26
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential JSON parsing failure when multiple pods match.

When multiple pods match the namespace prefix filter, jq -r '.items[] | select(...)' outputs multiple JSON objects concatenated (not as an array). Piping $POD_JSON back into jq on lines 25-26 will fail with a parse error because it's not valid JSON when there are multiple objects.

Consider using jq -s (slurp) to handle multiple objects, or filter to an array from the start:

Proposed fix
-      POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
+      POD_JSON=$(kubectl get pods -A -o json | jq '[.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))] | first')
       if [ -z "$POD_JSON" ]; then
         echo "No must-gather pod found"
         exit 1
       fi
-      NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
-      POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
+      NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace')
+      POD=$(echo "$POD_JSON" | jq -r '.metadata.name')

Note: This same pattern appears in all other task files in this PR.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
POD_JSON=$(kubectl get pods -A -o json | jq '[.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))] | first')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace')
POD=$(echo "$POD_JSON" | jq -r '.metadata.name')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@evals/tasks/openshift/plan-mustgather-node-selector/plan-mustgather-node-selector.yaml`
around lines 20 - 26, The current extraction stores possibly multiple pod JSON
objects into POD_JSON which concatenates objects and later pipelines into jq for
NS and POD causing parse errors; change the jq pipeline that sets POD_JSON to
produce a single JSON array (use jq -s or wrap items[] selection into an array)
or restrict the selection to the first matching item so subsequent jq calls that
read POD_JSON (used to set NS and POD) receive valid JSON; update the POD_JSON
assignment and/or the NS and POD extraction logic that references POD_JSON to
use jq -s or array indexing to safely handle multiple matches.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 23, 2026

@swghosh: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@swghosh swghosh marked this pull request as draft April 24, 2026 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants