Add OADP toolset for managing Velero backups and restores#122
Add OADP toolset for managing Velero backups and restores#122shubham-pampattiwar wants to merge 5 commits intoopenshift:mainfrom
Conversation
|
/assign matzew |
|
woot! thanks @shubham-pampattiwar |
|
/assign @Cali0707 |
|
@shubham-pampattiwar thanks for contributing a new toolset! A few thoughts:
Our experience with the core toolset and others so far is that in general we want to try to minimize the number of tools in the server, so that we don't overwhelm the model's context. E.g. cursor will return a warning to users when somewhere in the realm of 20-60 tools are loaded (and this is providing 90!) IMO we should be working to drastically reduce the number of tools provided by this toolset. A few things that can likely help:
For example, in the kubevirt toolset they made the action to be done to a VM a parameter of the tool, rather than having one tool for each action: openshift-mcp-server/pkg/toolsets/kubevirt/vm/lifecycle/tool.go Lines 40 to 44 in f13d475 Happy to discuss further on how to reduce the count of the tools while still ensuring agents can take all the actions they need to! |
Hey @Cali0707, thanks for the feedback! Makes sense - 90 tools is definitely too many. Here's my plan to reduce to ~9 tools using the action parameter pattern like kubevirt: Less common CRDs (DeleteBackupRequest, DownloadRequest, ServerStatusRequest, PodVolumeBackup/Restore) can use generic resources_* tools. Let me know if this approach works! |
|
Reducing the number of tools sounds like a good idea, as well leveraging core tools (e.g. A good idea is to also look at providing evals, like other toolsets (kubevirt for instance) We use mcpchecker, which now has also quickstarts for get going! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: shubham-pampattiwar The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Addressed review feedback - refactored from 90 individual tools to 8 consolidated tools using the action parameter pattern (like kubevirt). Changes:
Also added 8 eval tasks - all passing (8/8 tasks, 24/24 assertions). |
|
@shubham-pampattiwar the toolset looks much better now! I left a few comments around the code, but overall this is looking great |
e852e86 to
9be67bd
Compare
|
Addressed review comments: Removed unused files:
Code changes:
Rebased on main to resolve merge conflict with netedge toolset. All tests passing. |
9be67bd to
abb9103
Compare
|
@shubham-pampattiwar I'm curious - did you run the evals with just the core toolset enabled vs. this new toolset? Most of the tools seem to be CRUD tools, and on frontier models they can often succeed with the generic resource tools for many domains (but not all, which is why evals are so useful) |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new OADP toolset: dynamic-client helpers, server tools and a troubleshooting prompt for Velero/OADP resources; preflight discovery; many eval tasks and scripts; documentation, README registration, and comprehensive unit tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant MCPServer as "MCP Server (oadp tool)"
participant K8s as "Kubernetes API (dynamic client)"
participant Controller as "OADP/Velero Controller"
Client->>MCPServer: Request (e.g., create backup with params)
MCPServer->>K8s: discoverBackupDefaults (list DPA, list BSL)
K8s-->>MCPServer: DPA/BSL results
MCPServer->>K8s: Create Backup (BackupGVR) unstructured
K8s-->>MCPServer: Created Backup object
Controller->>K8s: Process Backup (async) -> update status
K8s-->>MCPServer: Backup status updated
MCPServer-->>Client: Return created object + preflight warnings/status
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
44473ae to
5631a1c
Compare
5631a1c to
e0bcc07
Compare
e0bcc07 to
0b98130
Compare
0b98130 to
f228b5c
Compare
|
@shubham-pampattiwar do the evals need any specific set up on the cluster they run on? Is there any install steps for OADP that we need to do? If so, can you add a make target for those? |
|
@Cali0707 yes, the evals need Velero + object storage on the cluster. I've added make targets for this: Quick setup (Kind cluster + OADP): make local-env-setup-oadpOr on an existing cluster: make oadp-installThis installs Velero with MinIO as the S3 backend in the Running the evals: # Start server (note: --read-only=false is needed since the downstream default hides mutating tools)
./kubernetes-mcp-server --toolsets core,config,oadp --port 8080 --read-only=false
# Run OADP evals
mcpchecker check evals/claude-code/eval.yaml --run "oadp" --verboseOther targets: I tested end-to-end on a Kind cluster with mcpchecker — 11/11 tasks passed, 33/33 assertions passed. |
|
PR has a merge conflict! Please address and fix |
|
In We should not ignore those potential errors? |
|
@shubham-pampattiwar when I run More detailed error is: |
Add build/oadp.mk with targets to install Velero + MinIO for running OADP evals on Kind clusters. Includes local-env-setup-oadp for one-command setup. Targets: oadp-install, oadp-uninstall, oadp-status, velero-cli, local-env-setup-oadp
Rate limiting config, Entra ID example, and Entra ID link were accidentally dropped from docs/configuration.md and README.md during rebase conflict resolution. Restored from main and regenerated toolset tables.
… MinIO images
- Handle (value, found, error) returns from unstructured.NestedString/NestedInt64
instead of discarding them in backup.go, restore.go, and oadp_troubleshoot.go
- Use fully qualified docker.io/minio/minio and docker.io/minio/mc image names
to fix ImageInspectError on clusters with CRI-O short-name enforcement
- Replace interface{} with any in backup.go
4206bd6 to
590b156
Compare
@matzew Good catch, fixed. GetBackupStatus and GetRestoreStatus now return errors instead of silently using zero values. The troubleshoot prompt handlers default gracefully since they're building diagnostic output. |
@Cali0707 Fixed the MinIO images now use fully qualified names (docker.io/minio/minio:latest, docker.io/minio/mc:latest) to avoid CRI-O short-name ambiguity. Should work on your cluster now. |
|
@shubham-pampattiwar do you have any red hat/ quay registry images for minio? @mvinkler has run into issues with docker images in our QE environment |
- Bump Velero to v1.16.2 and AWS plugin to v1.12.2 to match OADP 1.5 - Pin OADP CRDs to oadp-1.5 branch instead of oadp-dev - Switch MinIO images from docker.io to quay.io to avoid CRI-O short-name enforcement issues in QE environments
@Cali0707 Switched to |
|
@shubham-pampattiwar: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| cleanup: | ||
| file: cleanup.sh | ||
| prompt: | ||
| inline: Check the status of the OADP DataProtectionApplication (DPA) in the openshift-adp namespace and tell me if it's ready |
There was a problem hiding this comment.
would this just work with the core toolset that we have? like crud against your resources - using the core toolset ?
| cleanup: | ||
| file: cleanup.sh | ||
| prompt: | ||
| inline: Create an OADP backup schedule named 'nightly-backup' that runs every night at 3 AM (cron '0 3 * * *') and backs up the 'production' namespace with a 30-day (720h) TTL |
There was a problem hiding this comment.
Here too - would this just work w/ the tools from the core toolset ?
|
It would be interesting to check if the evals actually pass, using the core tools (E.g. we have CRUD operations for resources). So wondering if we really need to add the toolset - looks like, e.g. via |
Summary
Adds OADP (OpenShift API for Data Protection) toolset to the kubernetes-mcp-server, enabling AI agents to create, monitor, and manage data protection workflows with domain-specific intelligence.
Fixes https://redhat.atlassian.net/browse/OADP-7194
Tools (8 consolidated action-based tools)
Following the kubevirt pattern, each tool uses an
actionparameter to select the operation:oadp_backupoadp_restoreoadp_scheduleoadp_storage_locationoadp_dpaoadp_repositoryoadp_data_moveroadp_data_protection_testPrompt
oadp-troubleshootDomain Intelligence (beyond generic CRUD)
defaultVolumesToFsBackupwhen NodeAgent is enabledDeleteBackupRequestCRD (proper Velero cleanup) instead of direct CR deletion which orphans data in object storageincludedNamespacesdon't exist before backup creationExample Usage
Test plan
go test -v ./pkg/oadp/... ./pkg/toolsets/oadp/...)make build && make lint)TestDeleteBackupCreatesDeleteBackupRequestproves proper Velero deletion patternChanges in this PR
oadp-troubleshootprompt for guided diagnostics (same pattern as kubevirtvm-troubleshoot)Summary by CodeRabbit
New Features
Documentation
Tests