
MG-216: add openshift/mustgather toolset #159

Open
swghosh wants to merge 2 commits into openshift:main from swghosh:wg-must-gather-mcp-tools

Conversation

@swghosh
Member

@swghosh swghosh commented Feb 25, 2026

Offline debugging and analysis use cases require data parsed from a must-gather archive rather than from a live cluster's kube-apiserver endpoint. Some tools mirror the core toolset, but there are additional tools beyond resources_get, notably journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features

    • Added openshift/mustgather toolset enabling offline analysis of Kubernetes/OpenShift must-gather archives.
    • Introduced tools to query ETCD data, cluster events, pod logs, node diagnostics, and monitoring metrics (Prometheus/AlertManager).
    • Added resource registration capability to MCP servers for dynamic resource management at runtime.
  • Tests

    • Added comprehensive test suites for must-gather toolset and resource registrar functionality.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2026
@openshift-ci

openshift-ci Bot commented Feb 25, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

// FindContainerDir locates the must-gather container directory
// (e.g., quay-io-okd-scos-content-sha256-...)
func FindContainerDir(basePath string) (string, error) {
	entries, err := os.ReadDir(basePath)

there are a few considerations regarding where the must-gather can be stored:

  • if the server is running locally in a dev setup, it can access the must-gather from the filesystem
  • if it is running on the target cluster, we should probably make sure the must-gather can be read from a PV
  • the last case is where the must-gather needs to be accessed remotely; this will require further discussion on where must-gathers from customer cases are stored and how they can be accessed

@Cali0707

/cc @bentito

IIRC you were also interested in the offline bits

@openshift-ci openshift-ci Bot requested a review from bentito February 26, 2026 20:55
@swghosh swghosh changed the title WIP: add "openshift/mustgather" toolset [WIP] MG-216: add "openshift/mustgather" toolset Feb 27, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 27, 2026
@openshift-ci-robot

openshift-ci-robot commented Feb 27, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Offline debugging and analysis use cases require data parsed from a must-gather archive rather than from a live cluster's kube-apiserver endpoint. Some tools mirror the core toolset, but there are additional tools beyond resources_get, notably journal logs, kubelet logs, etc.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 4, 2026
@openshift-merge-robot

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from f10d877 to c3cd32c on April 6, 2026 21:25
@coderabbitai

coderabbitai Bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This pull request introduces the mustgather toolset for offline analysis of OpenShift must-gather archives. It adds infrastructure for dynamic MCP resource registration, archive loading and indexing capabilities, and implements 23 tools covering ETCD, events, monitoring, pod logs, nodes, and generic resource queries.

Changes

  • Resource Registration Infrastructure
    Files: pkg/api/toolsets.go, pkg/mcp/resource.go, pkg/mcp/gosdk.go, pkg/mcp/mcp.go
    Added ResourceRegistrar and ResourceTemplateRegistrar interfaces to api.ToolHandlerParams, implemented registrars wrapping mcp.Server, and adjusted ServerCapabilities initialization to omit the explicit Resources: nil assignment.

  • Module and Import Registration
    Files: pkg/mcp/modules.go, internal/tools/update-readme/main.go
    Added blank imports for pkg/toolsets/mustgather to register the mustgather toolset at initialization.

  • Must-Gather Data Types
    Files: pkg/ocp/mustgather/types.go
    New file defining 30+ exported Go types for must-gather operations: log options, metadata, ETCD health, Prometheus/AlertManager monitoring payloads, node diagnostics, and resource filtering options.

  • Must-Gather Archive Operations
    Files: pkg/ocp/mustgather/loader.go, pkg/ocp/mustgather/index.go, pkg/ocp/mustgather/provider.go
    New files implementing archive loading (resolver, YAML parsing, namespace- and cluster-scoped resource ingestion), in-memory resource indexing with label/field selector support, and a provider interface exposing resource, log, and diagnostics queries.

  • Must-Gather Backend Utilities
    Files: pkg/ocp/mustgather/etcd.go, pkg/ocp/mustgather/monitoring.go
    New files adding archive-backed accessors for ETCD health/members/object counts and Prometheus/AlertManager status, configuration, rules, targets, and TSDB metrics.

  • Mustgather Toolset Implementation
    Files: pkg/toolsets/mustgather/toolset.go, pkg/toolsets/mustgather/use.go, pkg/toolsets/mustgather/resources.go, pkg/toolsets/mustgather/namespaces.go, pkg/toolsets/mustgather/pod_logs.go, pkg/toolsets/mustgather/events.go, pkg/toolsets/mustgather/nodes.go, pkg/toolsets/mustgather/etcd.go, pkg/toolsets/mustgather/monitoring.go, pkg/toolsets/mustgather/helpers.go
    New toolset implementing 23 server tools: mustgather_use (archive loader/registrar), resource queries, namespace listing, pod log retrieval with filtering/tailing/grep, event queries by resource/time, node diagnostics, kubelet logs, ETCD queries, and Prometheus/AlertManager metrics and status. Includes a provider holder, typed extractors, and formatting utilities.

  • Test Support
    Files: internal/test/mcp.go, pkg/mcp/resource_test.go, pkg/mcp/toolsets_test.go
    Added McpClient resource listing/reading/template methods; introduced ResourceRegistrarSuite to verify resource and resource-template registrar behavior; added TestOpenShiftMustGatherToolset to validate tool metadata against a JSON snapshot.

  • Test Data
    Files: pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
    New 622-line JSON file defining 23 mustgather tool schemas with required/optional parameters, enums, and descriptions.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Client
    participant MCP Server
    participant Toolset
    participant Provider
    participant Archive

    User->>Client: Call mustgather_use(path)
    Client->>MCP Server: ToolCall mustgather_use
    MCP Server->>Toolset: Execute handler with ResourceRegistrar
    Toolset->>Archive: Load archive from path
    Archive-->>Provider: Create Provider with loaded resources
    Toolset->>Provider: GetMetadata()
    Provider-->>Toolset: metadata (version, timestamp, counts)
    Toolset->>MCP Server: ResourceRegistrar.AddResource(must-gather://...)
    MCP Server->>MCP Server: Store resource with handler
    Toolset-->>Client: Success response with archive info

    User->>Client: Call mustgather_pod_logs_get(namespace, pod)
    Client->>MCP Server: ToolCall mustgather_pod_logs_get
    MCP Server->>Toolset: Execute handler
    Toolset->>Provider: GetPodLog(namespace, pod)
    Provider->>Archive: Read log file from pod directory
    Archive-->>Provider: Log content
    Provider-->>Toolset: Log content string
    Toolset-->>Client: Formatted log output

    User->>Client: ReadResource(must-gather://...)
    Client->>MCP Server: ReadResource request
    MCP Server->>MCP Server: Invoke resource handler
    MCP Server-->>Client: Archive metadata content

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A rabbit hops through archives deep,
Where ETCD secrets safely sleep,
Pod logs and events, nodes and more,
All indexed fast, impossible to ignore!
With must-gather tools, analysis flies—
No OpenShift data escapes my eyes! 🔍

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 34.72%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the PR title clearly identifies the main change: adding an openshift/mustgather toolset. The title is concise, directly references the Jira issue (MG-216), and accurately summarizes the primary objective.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2026
@swghosh swghosh marked this pull request as ready for review April 6, 2026 21:26
@swghosh swghosh changed the title [WIP] MG-216: add "openshift/mustgather" toolset MG-216: add "openshift/mustgather" toolset Apr 6, 2026
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 6, 2026
@openshift-ci openshift-ci Bot requested review from Kaustubh-pande and matzew April 6, 2026 21:26
@swghosh swghosh changed the title MG-216: add "openshift/mustgather" toolset MG-216: add openshift/mustgather toolset Apr 6, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 6, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Offline debugging and analysis use cases require data parsed from a must-gather archive rather than from a live cluster's kube-apiserver endpoint. Some tools mirror the core toolset, but there are additional tools beyond resources_get, notably journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features
  • Added must-gather archive analysis toolset for offline OpenShift cluster inspection with ETCD, event, monitoring, node, and pod log tools.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (7)
pkg/toolsets/mustgather/etcd.go (1)

173-175: Add deterministic tie-break for equal counts.

Equal counts currently follow map iteration order. Adding a secondary sort key (resource) makes output stable across runs.

♻️ Proposed refinement
 	sort.Slice(entries, func(i, j int) bool {
-		return entries[i].count > entries[j].count
+		if entries[i].count == entries[j].count {
+			return entries[i].resource < entries[j].resource
+		}
+		return entries[i].count > entries[j].count
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/etcd.go` around lines 173 - 175, Current sort of
entries uses only count so ties are non-deterministic; update the comparator
passed to sort.Slice (the anonymous func used with entries) to break ties by the
resource field: return entries[i].count > entries[j].count || (entries[i].count
== entries[j].count && entries[i].resource < entries[j].resource). This ensures
deterministic, stable ordering for equal counts while keeping primary sort by
count.
pkg/toolsets/mustgather/use.go (1)

56-73: Consider removing previously registered must-gather resource before adding a new one.

When mustgather_use is called multiple times with different paths, the old must-gather://<old-path> resource remains registered alongside the new one. This could lead to stale resources accumulating in the MCP server.

♻️ Proposed fix to clean up old resource
 	// Register MCP resource
 	if params.ResourceRegistrar != nil {
 		resourceURI := "must-gather://" + path
+		// Remove any previously registered must-gather resource
+		// (simple approach: remove the exact URI being registered)
+		params.ResourceRegistrar.RemoveResources(resourceURI)
 		content := fmt.Sprintf("Must-Gather Archive\n"+

Alternatively, track the previously registered URI in package state and remove it before registering the new one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/use.go` around lines 56 - 73, The package currently
registers a new must-gather resource via params.ResourceRegistrar.AddResource
without removing any previously registered must-gather URI, causing stale
entries; add logic in mustgather_use to remove the prior registered resource
before adding the new one by tracking the last registered URI in package state
(e.g., a package-level variable like lastMustGatherURI) and call
params.ResourceRegistrar.RemoveResource(lastMustGatherURI) if non-empty, then
update lastMustGatherURI to the new resourceURI after successful AddResource; if
concurrent calls are possible, protect the package-level tracker with a mutex to
avoid races.
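
For reference, a minimal sketch of the package-state approach this prompt describes. The resourceRegistrar interface below is a hypothetical stand-in; the PR's actual api.ResourceRegistrar method names and signatures may differ.

package mustgather

import "sync"

// resourceRegistrar is a minimal stand-in for the real registrar interface.
type resourceRegistrar interface {
	AddResource(uri, content string)
	RemoveResource(uri string)
}

var (
	lastMustGatherMu  sync.Mutex
	lastMustGatherURI string
)

// registerArchiveResource swaps in the new must-gather resource, removing the
// previously registered URI so stale archives do not accumulate across
// repeated mustgather_use calls. The mutex guards concurrent tool calls.
func registerArchiveResource(reg resourceRegistrar, uri, content string) {
	lastMustGatherMu.Lock()
	defer lastMustGatherMu.Unlock()
	if lastMustGatherURI != "" && lastMustGatherURI != uri {
		reg.RemoveResource(lastMustGatherURI)
	}
	reg.AddResource(uri, content)
	lastMustGatherURI = uri
}
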
pkg/toolsets/mustgather/events.go (1)

110-115: String comparison for timestamps works but is fragile.

The sorting relies on string comparison of lastTimestamp (e.g., "2024-01-15T10:30:00Z" > "2024-01-14T10:30:00Z"). This works for ISO 8601 / RFC 3339 timestamps, which Kubernetes uses, but adding a comment would clarify this assumption.

📝 Suggested documentation
 	// Sort by lastTimestamp descending
+	// String comparison works correctly for RFC 3339 timestamps (lexicographic order = chronological order)
 	sort.Slice(filtered, func(i, j int) bool {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/events.go` around lines 110 - 115, The sort block
that uses sort.Slice(filtered, ...) and compares ti and tj from
unstructured.NestedString(filtered[i].Object, "lastTimestamp") relies on
RFC3339/ISO‑8601 lexical ordering; add a concise comment above this sort
explaining that Kubernetes timestamps are RFC3339 and therefore string
comparison yields correct chronological ordering, and optionally note that if
non‑RFC3339 input is expected the code should parse to time.Time
(time.Parse(time.RFC3339)) before comparing for robustness.
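
If non-RFC 3339 input ever needs to be tolerated, parsing before comparing is the robust route. A self-contained sketch (standalone example, not the PR's code):

package main

import (
	"fmt"
	"sort"
	"time"
)

// sortByTimestampDesc orders RFC 3339 timestamps newest-first, parsing to
// time.Time and falling back to lexical order (which is also chronological
// for well-formed RFC 3339 strings) when parsing fails.
func sortByTimestampDesc(timestamps []string) {
	sort.SliceStable(timestamps, func(i, j int) bool {
		ti, errI := time.Parse(time.RFC3339, timestamps[i])
		tj, errJ := time.Parse(time.RFC3339, timestamps[j])
		if errI != nil || errJ != nil {
			return timestamps[i] > timestamps[j]
		}
		return ti.After(tj)
	})
}

func main() {
	ts := []string{"2024-01-14T10:30:00Z", "2024-01-15T10:30:00Z"}
	sortByTimestampDesc(ts)
	fmt.Println(ts) // [2024-01-15T10:30:00Z 2024-01-14T10:30:00Z]
}
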
pkg/toolsets/mustgather/helpers.go (1)

125-135: Mapping "firing" to [OK] may be semantically misleading.

In healthSymbol, the status "firing" (typically indicating an active alert) is mapped to [OK], which could be confusing when displaying alert states. Consider whether "firing" should map to [WARN] or [FAIL] instead, or document this intentional mapping.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/helpers.go` around lines 125 - 135, The healthSymbol
function maps "firing" to "[OK]", which is misleading; update the switch in
healthSymbol so the "firing" string is removed from the OK case and instead
handled as a warning (map "firing" to "[WARN]"), and update any related unit
tests or callers that expect the old behavior; locate the healthSymbol function
and change the case membership accordingly so "firing" falls through to the WARN
branch (or add an explicit case returning "[WARN]").
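
A sketch of the remapping this suggests; the exact set of status strings the PR's healthSymbol handles is an assumption here.

// healthSymbol with "firing" reported as a warning rather than OK.
func healthSymbol(status string) string {
	switch status {
	case "up", "healthy", "ok":
		return "[OK]"
	case "firing", "pending", "degraded":
		return "[WARN]" // an active alert is no longer reported as OK
	case "down", "unhealthy", "error":
		return "[FAIL]"
	default:
		return "[?]"
	}
}
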
pkg/toolsets/mustgather/monitoring.go (2)

395-398: Fragile string manipulation for inserting summary.

Using strings.Replace to insert the summary after the first "\n\n" is brittle and could break if the header format changes. Consider building the output with the summary in the correct position from the start.

♻️ Suggested approach
-	output := "## Prometheus Rules\n\n"
-	if typeFilter != "" {
-		output += fmt.Sprintf("Filter: %s rules only\n\n", typeFilter)
-	}
+	var headerParts []string
+	headerParts = append(headerParts, "## Prometheus Rules")
 
 	// ... counting logic ...
 
-	summary := fmt.Sprintf("Total: %d groups, %d rules\n\n", totalGroups, totalRules)
-	// Insert summary after header
-	output = strings.Replace(output, "\n\n", "\n\n"+summary, 1)
+	output := "## Prometheus Rules\n\n"
+	output += fmt.Sprintf("Total: %d groups, %d rules\n\n", totalGroups, totalRules)
+	if typeFilter != "" {
+		output += fmt.Sprintf("Filter: %s rules only\n\n", typeFilter)
+	}
+	// ... rest of output building

This requires restructuring to count first, then build output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/monitoring.go` around lines 395 - 398, The current
insertion of summary using strings.Replace on output is brittle; instead compute
totalGroups/totalRules first and construct the final output explicitly rather
than mutating with Replace: build the header, then append the summary (using the
summary variable you already format), then append the remainder of the formatted
rules/body into output so you no longer rely on finding "\n\n" in output (update
the code paths that set summary and output around the variables summary,
totalGroups, totalRules and remove strings.Replace usage).
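
A compact sketch of the count-first restructuring; countRules is a hypothetical helper standing in for the counting logic already present in the file.

// Count first, then build header, summary, and filter note in order,
// with no post-hoc string surgery on the assembled output.
totalGroups, totalRules := countRules(groups) // hypothetical helper
var b strings.Builder
b.WriteString("## Prometheus Rules\n\n")
b.WriteString(fmt.Sprintf("Total: %d groups, %d rules\n\n", totalGroups, totalRules))
if typeFilter != "" {
	b.WriteString(fmt.Sprintf("Filter: %s rules only\n\n", typeFilter))
}
// ... append the formatted rule groups to b, then use b.String() as before ...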

494-526: Consider consolidating duplicate formatting functions.

formatNameValueSection and formatNameValueBytesSection have nearly identical logic, differing only in the value formatter (formatNumber vs formatBytes). Consider a single function with a formatter parameter.

♻️ Proposed consolidation
+// formatNameValueSectionWith formats a list of name-value pairs using the provided value formatter
+func formatNameValueSectionWith(title string, items []mg.NameValue, limit int, formatValue func(int64) string) string {
+	output := fmt.Sprintf("#### %s\n\n", title)
+	if len(items) == 0 {
+		return output + "No data available.\n\n"
+	}
+	count := len(items)
+	if limit > 0 && count > limit {
+		count = limit
+	}
+	for i := 0; i < count; i++ {
+		output += fmt.Sprintf("%-60s %s\n", truncate(items[i].Name, 60), formatValue(items[i].Value))
+	}
+	output += "\n"
+	return output
+}
+
 // formatNameValueSection formats a list of name-value pairs as a section
 func formatNameValueSection(title string, items []mg.NameValue, limit int) string {
-	// ... current implementation ...
+	return formatNameValueSectionWith(title, items, limit, formatNumber)
 }
 
 // formatNameValueBytesSection formats a list of name-value pairs with byte formatting
 func formatNameValueBytesSection(title string, items []mg.NameValue, limit int) string {
-	// ... current implementation ...
+	return formatNameValueSectionWith(title, items, limit, formatBytes)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/monitoring.go` around lines 494 - 526, The two nearly
identical functions formatNameValueSection and formatNameValueBytesSection
should be consolidated into a single function that accepts a value formatter
parameter; implement a new function (e.g.,
formatNameValueSectionWithFormatter(title string, items []mg.NameValue, limit
int, formatValue func(int64) string)) that performs the shared
header/empty-check/limit/truncate loop and calls formatValue(items[i].Value) for
formatting, then replace calls to formatNameValueSection and
formatNameValueBytesSection to use this new function with formatNumber and
formatBytes respectively and remove the duplicated functions.
pkg/ocp/mustgather/loader.go (1)

132-144: Stop traversing pod log trees during resource discovery.

The pods/... guard currently runs only for files, because directories return on Line 136 before that check executes. On large must-gathers this still walks every pod log path under namespaces/*/pods, which can dominate load time even though none of those files will ever be parsed as resources.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/loader.go` around lines 132 - 144, The walker callback in
the filepath.Walk call is currently returning on directories before checking for
the "pods/..." prefix, causing the walker to descend into all pod log subtrees;
update the anonymous callback used in filepath.Walk (the func(path string, info
os.FileInfo, err error) error) to compute rel := filepath.Rel(nsDir, path) early
and, if info.IsDir() && strings.HasPrefix(rel,
"pods"+string(filepath.Separator)), return filepath.SkipDir to prevent
descending into pod log directories; keep the existing file-suffix checks for
.yaml/.yml for regular files.
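
A self-contained sketch of the pruning walker this prompt describes, assuming resources are YAML files and pod logs live under a pods/ subtree of each namespace directory:

package mustgather

import (
	"os"
	"path/filepath"
	"strings"
)

// walkNamespaceDir visits resource YAML files under nsDir while skipping
// entire pod log subtrees, so large must-gathers load without traversing
// every log file.
func walkNamespaceDir(nsDir string, visitYAML func(path string) error) error {
	return filepath.Walk(nsDir, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		rel, relErr := filepath.Rel(nsDir, path)
		if relErr != nil {
			return relErr
		}
		if info.IsDir() {
			// Prune the whole pods/ subtree: log files are never resources.
			if rel == "pods" || strings.HasPrefix(rel, "pods"+string(filepath.Separator)) {
				return filepath.SkipDir
			}
			return nil
		}
		if strings.HasSuffix(path, ".yaml") || strings.HasSuffix(path, ".yml") {
			return visitYAML(path)
		}
		return nil
	})
}
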
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mcp/mcp.go`:
- Around line 87-90: The Initialize response doesn't advertise the Resources
capability so clients can't discover resources added later via AddResource();
update the initialization capability block (where Prompts, Tools, Logging are
set) to include Resources by adding a Resources field using
mcp.ResourceCapabilities (e.g., set ListChanged based on configuration.Stateless
similar to Prompts/Tools) so the server pre-advertises resource support;
reference the Initialize handling logic that relies on s.resources.len() and the
AddResource() method to ensure clients will know Resources can be registered
dynamically.

In `@pkg/ocp/mustgather/etcd.go`:
- Around line 78-83: ReadETCDFile currently joins filename into
p.metadata.ContainerDir + "etcd_info" directly which allows path traversal;
validate and sanitize filename before reading: reject absolute paths
(filepath.IsAbs), reject any filename that after filepath.Clean contains ".."
path segments or path separators, or alternatively construct the absolute path
and verify it is inside the etcd_info base using filepath.Rel to ensure the
resulting filePath is within the base (p.metadata.ContainerDir + "etcd_info")
before calling os.ReadFile; update ReadETCDFile to perform these checks and
return an error for invalid traversal attempts.

In `@pkg/ocp/mustgather/index.go`:
- Around line 171-217: matchesLabelSelector and matchesFieldSelector currently
ignore malformed clauses and unsupported fields which widens matches; change
them to reject invalid selectors by returning false on any malformed part (e.g.,
kv len != 2 or empty key/value) and in matchesFieldSelector treat unknown keys
(not metadata.name/metadata.namespace) as invalid and return false instead of
continuing; update both functions (matchesLabelSelector and
matchesFieldSelector) to validate each clause strictly and fail-fast when a
clause is malformed or unsupported so callers receive no-match for bad
selectors.
- Around line 105-139: The List method in ResourceIndex builds result.Items and
truncates by opts.Limit without a deterministic ordering, causing
non-reproducible subsets; after populating result.Items (but before applying the
Limit) sort it deterministically (e.g. stable sort by GetNamespace(), then
GetName(), then GetUID() or creation timestamp) so repeated calls return the
same first N items; update ResourceIndex.List to perform this stable sort on
result.Items prior to the existing limit logic referenced by symbols
ResourceIndex.List, result.Items, matchesListOptions, and opts.Limit.

In `@pkg/ocp/mustgather/loader.go`:
- Around line 21-47: Load currently masks IO errors: in Load, do not treat all
FindContainerDir errors as ignorable—only tolerate the specific "container
directory not found" case; for any other error returned by FindContainerDir
return it immediately. Also stop swallowing errors from
loadClusterScopedResources and loadNamespacedResources: if either returns a
non-nil error, propagate that error from Load (do not append nil result or
continue). Update the logic in Load (referencing FindContainerDir, Load,
loadClusterScopedResources, loadNamespacedResources, and loadMetadata) so only
the explicit "container dir not found" condition is ignored and all other errors
are returned.

In `@pkg/ocp/mustgather/monitoring.go`:
- Around line 119-127: GetReplicaNumbers currently treats any non-exact match as
"all" which hides bad input; change the switch on replicaParam in
GetReplicaNumbers to only return []int{0} for "prometheus-k8s-0" or "0" and
[]int{1} for "prometheus-k8s-1" or "1", and make the default branch return nil
(or an empty slice) to signal invalid input instead of returning both replicas;
also update any callers of GetReplicaNumbers to detect len==0 (or nil) and
surface/return a clear validation error indicating the replicaParam was unknown.

In `@pkg/ocp/mustgather/provider.go`:
- Around line 176-201: The readGzipFile function currently swallows read errors
and always returns nil error; update read loop in readGzipFile to treat io.EOF
as normal (break) but return any other non-nil readErr to the caller (after
writing any bytes read), so decompression/truncation failures propagate; locate
the function readGzipFile and modify the loop to check readErr (if readErr ==
io.EOF break else return "", readErr).
- Around line 205-210: TailLines currently returns an empty string when content
ends with a newline because Split produces a trailing empty element; update
TailLines to trim trailing newline characters from content (e.g., use
strings.TrimRight(content, "\n")) before splitting or otherwise ignore the final
empty element so that TailLines("a\nb\n", 1) returns "b"; apply the change
inside the TailLines function to preserve existing behavior for
non-newline-terminated input.

In `@pkg/toolsets/mustgather/etcd.go`:
- Around line 155-197: The code accepts negative limit values from getInt which
can lead to misleading output (e.g., "top -1"); clamp and normalize limit
immediately after it's read: capture the raw value from getInt into a variable
(limit), then if limit < 0 set limit = 0 (treat negative as "show all") or
alternatively set limit = len(counts) to avoid negative slicing; use the
normalized limit when truncating entries and when composing the final summary
message (the "... showing top %d of %d" line) so it never prints a negative
number. Ensure you reference and update uses of limit in the slicing block, the
limit > 0 checks, and the final output message.

In `@pkg/toolsets/mustgather/monitoring.go`:
- Around line 192-197: The code currently calls mg.GetReplicaNumbers(replica)
then always uses only replicas[0], so passing "all" is ignored; update the logic
that uses replicas (the replicas variable and r usage) to handle multiple
entries: loop over the slice returned by mg.GetReplicaNumbers(replica) and
execute the monitoring/target logic for each replica, aggregating or merging
results appropriately (or, if intended, validate and reject "all" earlier and
adjust the tool description), ensuring references to params.GetArguments and
getString remain unchanged and any downstream variables that expected a single
replica are updated to accept per-replica processing.

In `@pkg/toolsets/mustgather/resources.go`:
- Around line 66-70: The current mustgatherResourcesGet uses the package-global
provider via getProvider(), which allows mustgather_use to change a global
`provider` and leak archives across sessions; instead, make the active
must-gather scoped to the session by retrieving the provider from the
request/session context in api.ToolHandlerParams (or adding a session-scoped
provider field) and update mustgatherResourcesGet to use that session-scoped
provider; also update mustgather_use to write the selected archive into the
session-scoped storage rather than the package-global `provider`, and propagate
this change to all mustgather_resources_* handlers so they read the provider
from the session (not getProvider/global `provider`).

---

Nitpick comments:
In `@pkg/ocp/mustgather/loader.go`:
- Around line 132-144: The walker callback in the filepath.Walk call is
currently returning on directories before checking for the "pods/..." prefix,
causing the walker to descend into all pod log subtrees; update the anonymous
callback used in filepath.Walk (the func(path string, info os.FileInfo, err
error) error) to compute rel := filepath.Rel(nsDir, path) early and, if
info.IsDir() && strings.HasPrefix(rel, "pods"+string(filepath.Separator)),
return filepath.SkipDir to prevent descending into pod log directories; keep the
existing file-suffix checks for .yaml/.yml for regular files.

In `@pkg/toolsets/mustgather/etcd.go`:
- Around line 173-175: Current sort of entries uses only count so ties are
non-deterministic; update the comparator passed to sort.Slice (the anonymous
func used with entries) to break ties by the resource field: return
entries[i].count > entries[j].count || (entries[i].count == entries[j].count &&
entries[i].resource < entries[j].resource). This ensures deterministic, stable
ordering for equal counts while keeping primary sort by count.

In `@pkg/toolsets/mustgather/events.go`:
- Around line 110-115: The sort block that uses sort.Slice(filtered, ...) and
compares ti and tj from unstructured.NestedString(filtered[i].Object,
"lastTimestamp") relies on RFC3339/ISO‑8601 lexical ordering; add a concise
comment above this sort explaining that Kubernetes timestamps are RFC3339 and
therefore string comparison yields correct chronological ordering, and
optionally note that if non‑RFC3339 input is expected the code should parse to
time.Time (time.Parse(time.RFC3339)) before comparing for robustness.

In `@pkg/toolsets/mustgather/helpers.go`:
- Around line 125-135: The healthSymbol function maps "firing" to "[OK]", which
is misleading; update the switch in healthSymbol so the "firing" string is
removed from the OK case and instead handled as a warning (map "firing" to
"[WARN]"), and update any related unit tests or callers that expect the old
behavior; locate the healthSymbol function and change the case membership
accordingly so "firing" falls through to the WARN branch (or add an explicit
case returning "[WARN]").

In `@pkg/toolsets/mustgather/monitoring.go`:
- Around line 395-398: The current insertion of summary using strings.Replace on
output is brittle; instead compute totalGroups/totalRules first and construct
the final output explicitly rather than mutating with Replace: build the header,
then append the summary (using the summary variable you already format), then
append the remainder of the formatted rules/body into output so you no longer
rely on finding "\n\n" in output (update the code paths that set summary and
output around the variables summary, totalGroups, totalRules and remove
strings.Replace usage).
- Around line 494-526: The two nearly identical functions formatNameValueSection
and formatNameValueBytesSection should be consolidated into a single function
that accepts a value formatter parameter; implement a new function (e.g.,
formatNameValueSectionWithFormatter(title string, items []mg.NameValue, limit
int, formatValue func(int64) string)) that performs the shared
header/empty-check/limit/truncate loop and calls formatValue(items[i].Value) for
formatting, then replace calls to formatNameValueSection and
formatNameValueBytesSection to use this new function with formatNumber and
formatBytes respectively and remove the duplicated functions.

In `@pkg/toolsets/mustgather/use.go`:
- Around line 56-73: The package currently registers a new must-gather resource
via params.ResourceRegistrar.AddResource without removing any previously
registered must-gather URI, causing stale entries; add logic in mustgather_use
to remove the prior registered resource before adding the new one by tracking
the last registered URI in package state (e.g., a package-level variable like
lastMustGatherURI) and call
params.ResourceRegistrar.RemoveResource(lastMustGatherURI) if non-empty, then
update lastMustGatherURI to the new resourceURI after successful AddResource; if
concurrent calls are possible, protect the package-level tracker with a mutex to
avoid races.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a666ab20-563f-4424-9df4-9b31e2b85c53

📥 Commits

Reviewing files that changed from the base of the PR and between 46c45a9 and c3cd32c.

📒 Files selected for processing (24)
  • go.mod
  • internal/tools/update-readme/main.go
  • pkg/api/toolsets.go
  • pkg/mcp/gosdk.go
  • pkg/mcp/mcp.go
  • pkg/mcp/modules.go
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/ocp/mustgather/etcd.go
  • pkg/ocp/mustgather/index.go
  • pkg/ocp/mustgather/loader.go
  • pkg/ocp/mustgather/monitoring.go
  • pkg/ocp/mustgather/provider.go
  • pkg/ocp/mustgather/types.go
  • pkg/toolsets/mustgather/etcd.go
  • pkg/toolsets/mustgather/events.go
  • pkg/toolsets/mustgather/helpers.go
  • pkg/toolsets/mustgather/monitoring.go
  • pkg/toolsets/mustgather/namespaces.go
  • pkg/toolsets/mustgather/nodes.go
  • pkg/toolsets/mustgather/pod_logs.go
  • pkg/toolsets/mustgather/resources.go
  • pkg/toolsets/mustgather/toolset.go
  • pkg/toolsets/mustgather/use.go

Comment thread pkg/mcp/mcp.go (Outdated)

Comment thread pkg/ocp/mustgather/etcd.go
Comment on lines +78 to +83
func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
	filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)
	data, err := os.ReadFile(filePath)
	if err != nil {
		return nil, fmt.Errorf("ETCD file %s not found: %w", filename, err)
	}

⚠️ Potential issue | 🟠 Major

Harden ReadETCDFile against path traversal.

At Line 79, filename is joined directly into the archive path. If a non-constant/untrusted filename reaches this method later, ../ traversal could read files outside etcd_info.

🔒 Proposed fix
 import (
 	"encoding/json"
 	"fmt"
 	"os"
 	"path/filepath"
+	"strings"
 )
@@
 func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
-	filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)
+	clean := filepath.Clean(filename)
+	if filepath.IsAbs(clean) || clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
+		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
+	}
+
+	baseDir := filepath.Join(p.metadata.ContainerDir, "etcd_info")
+	filePath := filepath.Join(baseDir, clean)
+	rel, err := filepath.Rel(baseDir, filePath)
+	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
+		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
+	}
+
 	data, err := os.ReadFile(filePath)
 	if err != nil {
 		return nil, fmt.Errorf("ETCD file %s not found: %w", filename, err)
 	}
 	return data, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (revised function; the original snippet appears above)

func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
	clean := filepath.Clean(filename)
	if filepath.IsAbs(clean) || clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
	}

	baseDir := filepath.Join(p.metadata.ContainerDir, "etcd_info")
	filePath := filepath.Join(baseDir, clean)
	rel, err := filepath.Rel(baseDir, filePath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
	}

	data, err := os.ReadFile(filePath)
	if err != nil {
		return nil, fmt.Errorf("ETCD file %s not found: %w", filename, err)
	}
	return data, nil
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/etcd.go` around lines 78 - 83, ReadETCDFile currently
joins filename into p.metadata.ContainerDir + "etcd_info" directly which allows
path traversal; validate and sanitize filename before reading: reject absolute
paths (filepath.IsAbs), reject any filename that after filepath.Clean contains
".." path segments or path separators, or alternatively construct the absolute
path and verify it is inside the etcd_info base using filepath.Rel to ensure the
resulting filePath is within the base (p.metadata.ContainerDir + "etcd_info")
before calling os.ReadFile; update ReadETCDFile to perform these checks and
return an error for invalid traversal attempts.

Comment on lines +105 to +139
func (idx *ResourceIndex) List(_ context.Context, gvk schema.GroupVersionKind, namespace string, opts ListOptions) *unstructured.UnstructuredList {
	result := &unstructured.UnstructuredList{}

	if namespace != "" {
		// Namespace-scoped query
		nsMap, ok := idx.byNamespace[namespace]
		if !ok {
			return result
		}
		gvkMap, ok := nsMap[gvk]
		if !ok {
			return result
		}
		for _, obj := range gvkMap {
			if matchesListOptions(obj, opts) {
				result.Items = append(result.Items, *obj.DeepCopy())
			}
		}
	} else {
		// Cluster-wide query
		gvkMap, ok := idx.byGVK[gvk]
		if !ok {
			return result
		}
		for _, obj := range gvkMap {
			if matchesListOptions(obj, opts) {
				result.Items = append(result.Items, *obj.DeepCopy())
			}
		}
	}

	// Apply limit
	if opts.Limit > 0 && len(result.Items) > opts.Limit {
		result.Items = result.Items[:opts.Limit]
	}

⚠️ Potential issue | 🟠 Major

Sort before applying limit.

mustgather_resources_list uses this path for user-visible results, but there is no stable ordering before opts.Limit is applied. Two identical calls can therefore return different subsets, which makes archive analysis hard to reproduce when users work with a small limit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/index.go` around lines 105 - 139, The List method in
ResourceIndex builds result.Items and truncates by opts.Limit without a
deterministic ordering, causing non-reproducible subsets; after populating
result.Items (but before applying the Limit) sort it deterministically (e.g.
stable sort by GetNamespace(), then GetName(), then GetUID() or creation
timestamp) so repeated calls return the same first N items; update
ResourceIndex.List to perform this stable sort on result.Items prior to the
existing limit logic referenced by symbols ResourceIndex.List, result.Items,
matchesListOptions, and opts.Limit.
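
One way to make the ordering deterministic, run just before the existing limit logic; a sketch, not the PR's code:

package mustgather

import (
	"sort"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// sortItemsDeterministically orders list items by namespace, name, then UID
// so that applying opts.Limit afterwards always returns the same subset.
func sortItemsDeterministically(items []unstructured.Unstructured) {
	sort.SliceStable(items, func(i, j int) bool {
		a, b := &items[i], &items[j]
		if a.GetNamespace() != b.GetNamespace() {
			return a.GetNamespace() < b.GetNamespace()
		}
		if a.GetName() != b.GetName() {
			return a.GetName() < b.GetName()
		}
		return a.GetUID() < b.GetUID()
	})
}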

Comment on lines +171 to +217
func matchesLabelSelector(labels map[string]string, selector string) bool {
	parts := strings.Split(selector, ",")
	for _, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		value := strings.TrimSpace(kv[1])
		if labels[key] != value {
			return false
		}
	}
	return true
}

func matchesFieldSelector(obj *unstructured.Unstructured, selector string) bool {
	parts := strings.Split(selector, ",")
	for _, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		value := strings.TrimSpace(kv[1])

		switch key {
		case "metadata.name":
			if obj.GetName() != value {
				return false
			}
		case "metadata.namespace":
			if obj.GetNamespace() != value {
				return false
			}
		}
	}
	return true
}

⚠️ Potential issue | 🟠 Major

Reject malformed selectors instead of matching everything.

pkg/toolsets/mustgather/resources.go (Lines 106-118) forwards raw selector strings here. Right now malformed label clauses and unsupported field names are skipped with continue, so typos like app or spec.nodeName=... can silently broaden the result set instead of failing fast.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/index.go` around lines 171 - 217, matchesLabelSelector and
matchesFieldSelector currently ignore malformed clauses and unsupported fields
which widens matches; change them to reject invalid selectors by returning false
on any malformed part (e.g., kv len != 2 or empty key/value) and in
matchesFieldSelector treat unknown keys (not metadata.name/metadata.namespace)
as invalid and return false instead of continuing; update both functions
(matchesLabelSelector and matchesFieldSelector) to validate each clause strictly
and fail-fast when a clause is malformed or unsupported so callers receive
no-match for bad selectors.
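
A fail-fast sketch of the label matcher (the field matcher would change analogously); equality-only, like the original:

package mustgather

import "strings"

// matchesLabelSelectorStrict rejects malformed clauses (missing '=' or empty
// key) by matching nothing, instead of silently widening the result set.
func matchesLabelSelectorStrict(labels map[string]string, selector string) bool {
	for _, part := range strings.Split(selector, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 || strings.TrimSpace(kv[0]) == "" {
			return false // malformed clause: no match rather than match-all
		}
		if labels[strings.TrimSpace(kv[0])] != strings.TrimSpace(kv[1]) {
			return false
		}
	}
	return true
}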

Comment on lines +21 to +47
func Load(path string) (*LoadResult, error) {
	containerDir, err := FindContainerDir(path)
	if err != nil {
		// If no container dir found, use the path directly
		containerDir = path
	}

	metadata := MustGatherMetadata{
		Path:         path,
		ContainerDir: containerDir,
	}

	// Load metadata files
	loadMetadata(containerDir, &metadata)

	// Load all resources
	var resources []*unstructured.Unstructured

	clusterResources, err := loadClusterScopedResources(containerDir)
	if err == nil {
		resources = append(resources, clusterResources...)
	}

	namespacedResources, err := loadNamespacedResources(containerDir)
	if err == nil {
		resources = append(resources, namespacedResources...)
	}

⚠️ Potential issue | 🟠 Major

Don't treat unreadable archives as a successful load.

Load falls back to containerDir = path for any FindContainerDir error, and then drops the resource-loader errors on Lines 39-47. A nonexistent path, file path, or permission failure can therefore "load" with zero resources and only fail later as false not found results. Please only ignore the "container directory not found" case and return other I/O errors immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/loader.go` around lines 21 - 47, Load currently masks IO
errors: in Load, do not treat all FindContainerDir errors as ignorable—only
tolerate the specific "container directory not found" case; for any other error
returned by FindContainerDir return it immediately. Also stop swallowing errors
from loadClusterScopedResources and loadNamespacedResources: if either returns a
non-nil error, propagate that error from Load (do not append nil result or
continue). Update the logic in Load (referencing FindContainerDir, Load,
loadClusterScopedResources, loadNamespacedResources, and loadMetadata) so only
the explicit "container dir not found" condition is ignored and all other errors
are returned.
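
A sketch of the narrowed error handling inside Load. ErrContainerDirNotFound is a hypothetical sentinel that FindContainerDir would need to return for errors.Is to distinguish the tolerable case:

containerDir, err := FindContainerDir(path)
if err != nil {
	if !errors.Is(err, ErrContainerDirNotFound) {
		return nil, err // real I/O failure: do not mask it
	}
	containerDir = path // only the "not found" case falls back
}

clusterResources, err := loadClusterScopedResources(containerDir)
if err != nil {
	return nil, fmt.Errorf("loading cluster-scoped resources: %w", err)
}
namespacedResources, err := loadNamespacedResources(containerDir)
if err != nil {
	return nil, fmt.Errorf("loading namespaced resources: %w", err)
}
resources := append(clusterResources, namespacedResources...)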

Comment on lines +176 to +201
func readGzipFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer func() { _ = f.Close() }()

	reader, err := gzip.NewReader(f)
	if err != nil {
		return "", err
	}
	defer func() { _ = reader.Close() }()

	var sb strings.Builder
	buf := make([]byte, 32*1024)
	for {
		n, readErr := reader.Read(buf)
		if n > 0 {
			sb.Write(buf[:n])
		}
		if readErr != nil {
			break
		}
	}

	return sb.String(), nil

⚠️ Potential issue | 🟠 Major

Return decompression failures to the caller.

Any non-nil readErr just exits the loop, and the function then returns sb.String(), nil. A truncated or corrupt kubelet log will therefore look like a successful read with missing tail data, which is very hard to spot during troubleshooting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/provider.go` around lines 176 - 201, The readGzipFile
function currently swallows read errors and always returns nil error; update
read loop in readGzipFile to treat io.EOF as normal (break) but return any other
non-nil readErr to the caller (after writing any bytes read), so
decompression/truncation failures propagate; locate the function readGzipFile
and modify the loop to check readErr (if readErr == io.EOF break else return "",
readErr).
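
One way to apply the fix is to replace the manual read loop with io.Copy, which propagates decompression errors by construction. A sketch:

package mustgather

import (
	"compress/gzip"
	"io"
	"os"
	"strings"
)

// readGzipFile decompresses path into a string, surfacing any read or
// decompression error instead of silently returning partial content.
func readGzipFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer func() { _ = f.Close() }()

	reader, err := gzip.NewReader(f)
	if err != nil {
		return "", err
	}
	defer func() { _ = reader.Close() }()

	var sb strings.Builder
	if _, err := io.Copy(&sb, reader); err != nil {
		return "", err // corrupt or truncated gzip data surfaces here
	}
	return sb.String(), nil
}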

Comment on lines +205 to +210
func TailLines(content string, n int) string {
	lines := strings.Split(content, "\n")
	if len(lines) <= n {
		return content
	}
	return strings.Join(lines[len(lines)-n:], "\n")

⚠️ Potential issue | 🟠 Major

Trim trailing newlines before tailing.

Newline-terminated logs currently grow an empty final element, so tail=1 on a\nb\n returns the blank trailer instead of b. That affects both pod log and kubelet log tailing.

Suggested fix
 func TailLines(content string, n int) string {
-	lines := strings.Split(content, "\n")
+	hadTrailingNewline := strings.HasSuffix(content, "\n")
+	trimmed := strings.TrimRight(content, "\n")
+	lines := strings.Split(trimmed, "\n")
 	if len(lines) <= n {
 		return content
 	}
-	return strings.Join(lines[len(lines)-n:], "\n")
+	tail := strings.Join(lines[len(lines)-n:], "\n")
+	if hadTrailingNewline {
+		tail += "\n"
+	}
+	return tail
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/ocp/mustgather/provider.go` around lines 205 - 210, TailLines currently
returns an empty string when content ends with a newline because Split produces
a trailing empty element; update TailLines to trim trailing newline characters
from content (e.g., use strings.TrimRight(content, "\n")) before splitting or
otherwise ignore the final empty element so that TailLines("a\nb\n", 1) returns
"b"; apply the change inside the TailLines function to preserve existing
behavior for non-newline-terminated input.

Comment on lines +155 to +197
limit := getInt(params.GetArguments(), "limit", 50)

counts, err := p.GetETCDObjectCount()
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get ETCD object counts: %w", err)), nil
}

// Sort by count descending
type entry struct {
	resource string
	count    int64
}
entries := make([]entry, 0, len(counts))
var total int64
for k, v := range counts {
	entries = append(entries, entry{resource: k, count: v})
	total += v
}
sort.Slice(entries, func(i, j int) bool {
	return entries[i].count > entries[j].count
})

if limit > 0 && len(entries) > limit {
	entries = entries[:limit]
}

output := fmt.Sprintf("## ETCD Object Counts\n\nTotal objects: %s across %d resource types\n\n", formatNumber(total), len(counts))

// Find max resource name length for alignment
maxLen := 0
for _, e := range entries {
	if len(e.resource) > maxLen {
		maxLen = len(e.resource)
	}
}

for _, e := range entries {
	output += fmt.Sprintf("%-*s %s\n", maxLen, e.resource, formatNumber(e.count))
}

if limit > 0 && len(counts) > limit {
	output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
}

⚠️ Potential issue | 🟡 Minor

Validate limit to avoid invalid output states.

At Line 155, non-positive values are accepted; e.g., negative input can produce misleading text at Line 196 (top -1).

🛠 Proposed fix
 	limit := getInt(params.GetArguments(), "limit", 50)
+	if limit <= 0 {
+		return api.NewToolCallResult("", fmt.Errorf("limit must be greater than 0")), nil
+	}
@@
-	if limit > 0 && len(entries) > limit {
+	if len(entries) > limit {
 		entries = entries[:limit]
 	}
@@
-	if limit > 0 && len(counts) > limit {
+	if len(counts) > limit {
 		output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (revised version; the original snippet appears above)

limit := getInt(params.GetArguments(), "limit", 50)
if limit <= 0 {
	return api.NewToolCallResult("", fmt.Errorf("limit must be greater than 0")), nil
}

counts, err := p.GetETCDObjectCount()
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get ETCD object counts: %w", err)), nil
}

// Sort by count descending
type entry struct {
	resource string
	count    int64
}
entries := make([]entry, 0, len(counts))
var total int64
for k, v := range counts {
	entries = append(entries, entry{resource: k, count: v})
	total += v
}
sort.Slice(entries, func(i, j int) bool {
	return entries[i].count > entries[j].count
})

if len(entries) > limit {
	entries = entries[:limit]
}

output := fmt.Sprintf("## ETCD Object Counts\n\nTotal objects: %s across %d resource types\n\n", formatNumber(total), len(counts))

// Find max resource name length for alignment
maxLen := 0
for _, e := range entries {
	if len(e.resource) > maxLen {
		maxLen = len(e.resource)
	}
}

for _, e := range entries {
	output += fmt.Sprintf("%-*s %s\n", maxLen, e.resource, formatNumber(e.count))
}

if len(counts) > limit {
	output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/etcd.go` around lines 155 - 197, The code accepts
negative limit values from getInt which can lead to misleading output (e.g.,
"top -1"); clamp and normalize limit immediately after it's read: capture the
raw value from getInt into a variable (limit), then if limit < 0 set limit = 0
(treat negative as "show all") or alternatively set limit = len(counts) to avoid
negative slicing; use the normalized limit when truncating entries and when
composing the final summary message (the "... showing top %d of %d" line) so it
never prints a negative number. Ensure you reference and update uses of limit in
the slicing block, the limit > 0 checks, and the final output message.

Comment on lines +192 to +197
args := params.GetArguments()
replica := getString(args, "replica", "0")
healthFilter := getString(args, "health", "")

replicas := mg.GetReplicaNumbers(replica)
r := replicas[0]

⚠️ Potential issue | 🟡 Minor

Replica handling ignores "all" option for targets tool.

The tool description says replica can be "0, 1, or all" with default "0", but the implementation always uses only replicas[0] regardless of input. If "all" is passed, it will return [0, 1] but only the first element is used.

🐛 Proposed fix to honor "all" or clarify description

Option 1: Update description to clarify only one replica is used:

-						"replica": {Type: "string", Description: "Prometheus replica (0, 1, or all). Default: 0"},
+						"replica": {Type: "string", Description: "Prometheus replica (0 or 1). Default: 0"},

Option 2: Support multiple replicas (more complex):

 	replicas := mg.GetReplicaNumbers(replica)
-	r := replicas[0]
 
-	targets, err := p.GetPrometheusActiveTargets(r)
+	output := ""
+	for _, r := range replicas {
+		targets, err := p.GetPrometheusActiveTargets(r)
+		// ... handle each replica
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/monitoring.go` around lines 192 - 197, The code
currently calls mg.GetReplicaNumbers(replica) then always uses only replicas[0],
so passing "all" is ignored; update the logic that uses replicas (the replicas
variable and r usage) to handle multiple entries: loop over the slice returned
by mg.GetReplicaNumbers(replica) and execute the monitoring/target logic for
each replica, aggregating or merging results appropriately (or, if intended,
validate and reject "all" earlier and adjust the tool description), ensuring
references to params.GetArguments and getString remain unchanged and any
downstream variables that expected a single replica are updated to accept
per-replica processing.
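
A sketch of option 2; GetPrometheusActiveTargets follows the snippet above, while formatTargets is a hypothetical formatter:

var out strings.Builder
for _, r := range mg.GetReplicaNumbers(replica) {
	targets, err := p.GetPrometheusActiveTargets(r)
	if err != nil {
		return api.NewToolCallResult("", fmt.Errorf("replica %d: %w", r, err)), nil
	}
	out.WriteString(fmt.Sprintf("## prometheus-k8s-%d targets\n\n", r))
	out.WriteString(formatTargets(targets, healthFilter)) // hypothetical helper
}
return api.NewToolCallResult(out.String(), nil), nil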

Comment thread pkg/toolsets/mustgather/resources.go Outdated
Comment on lines +66 to +70
func mustgatherResourcesGet(params api.ToolHandlerParams) (*api.ToolCallResult, error) {
p, err := getProvider()
if err != nil {
return api.NewToolCallResult("", err), nil
}

⚠️ Potential issue | 🟠 Major

Scope the active must-gather per session, not process-wide.

The lookup here is backed by the package-global provider in pkg/toolsets/mustgather/helpers.go (Lines 11-31). Once another client calls mustgather_use, subsequent mustgather_resources_* calls can start reading that other archive, which breaks isolation and can leak customer data between concurrent sessions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/resources.go` around lines 66 - 70, The current
mustgatherResourcesGet uses the package-global provider via getProvider(), which
allows mustgather_use to change a global `provider` and leak archives across
sessions; instead, make the active must-gather scoped to the session by
retrieving the provider from the request/session context in
api.ToolHandlerParams (or adding a session-scoped provider field) and update
mustgatherResourcesGet to use that session-scoped provider; also update
mustgather_use to write the selected archive into the session-scoped storage
rather than the package-global `provider`, and propagate this change to all
mustgather_resources_* handlers so they read the provider from the session (not
getProvider/global `provider`).
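
A minimal sketch of what session scoping could look like, assuming a stable session ID can be derived from api.ToolHandlerParams; the accessor and the import path below are assumptions, not the PR's actual API:

package mustgather

import (
	"fmt"
	"sync"

	mg "github.com/openshift/kubernetes-mcp-server/pkg/ocp/mustgather" // assumed import path
)

// sessionProviders maps an MCP session ID to that session's active
// must-gather provider, so concurrent clients never read each other's archives.
var (
	sessionMu        sync.RWMutex
	sessionProviders = map[string]*mg.Provider{}
)

// setSessionProvider is what mustgather_use would call after loading an archive.
func setSessionProvider(sessionID string, p *mg.Provider) {
	sessionMu.Lock()
	defer sessionMu.Unlock()
	sessionProviders[sessionID] = p
}

// sessionProvider resolves the provider for the calling session only.
func sessionProvider(sessionID string) (*mg.Provider, error) {
	sessionMu.RLock()
	defer sessionMu.RUnlock()
	p, ok := sessionProviders[sessionID]
	if !ok {
		return nil, fmt.Errorf("no must-gather loaded in this session; call mustgather_use first")
	}
	return p, nil
}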

@openshift-ci-robot

openshift-ci-robot commented Apr 6, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Offline debugging analysis use cases require us to get data parsed from a must-gather archive rather than online cluster's kapiserver endpoint. Some tools are similar to the core toolset but there are more tools beyond resources_get especially journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features
  • Added must-gather archive analysis toolset for offline OpenShift inspection: archive loading/indexing, resource listing/getting, namespaces, ETCD checks, events queries, monitoring (Prometheus/Alertmanager), node diagnostics, and pod log retrieval/grep.
  • Documentation
  • README tooling now includes the must-gather toolset in generated tool listings.
  • Tests
  • Added test to verify must-gather toolset is listed and matches the expected snapshot.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/toolsets/mustgather/events.go`:
- Around line 36-37: The limit parameter across mustgather event endpoints is
currently uncapped/treated as unlimited; update the validation and defaulting so
every "limit" schema and parser enforces a sane default and an upper cap (e.g.,
default=100, max=1000) and treats non-positive values as the default;
specifically modify the limit schema entries and the parsing logic used by
mustgather_events_by_resource and other event handlers in
pkg/toolsets/mustgather/events.go so they clamp incoming limit to the range [1,
MAX_LIMIT] (use a named constant MAX_LIMIT) and consistently apply the default
when absent or <=0.
- Around line 134-139: The sort/filter is currently comparing raw timestamp
strings from unstructured.NestedString for "lastTimestamp", which misorders
events and drops ones missing or with unparsable strings; update the sort and
any filter logic (the sort.Slice call operating on variable filtered and other
similar blocks) to extract "lastTimestamp", "eventTime", and "firstTimestamp"
via unstructured.NestedString, parse them with time.Parse(time.RFC3339) (or try
multiple Kubernetes timestamp layouts), fall back to zero time when parsing
fails, and sort by the resulting time.Time values (newest first); apply the same
parsing-and-fallback approach to the other occurrences noted (the blocks around
the other sort/filter usages at the referenced locations) so all comparisons use
parsed times rather than raw strings.
- Around line 233-240: The code parses untilStr into untilTime but doesn't
reject inverted ranges; add a validation after parsing that when hasUntil is
true and sinceStr is non-empty (or sinceTime was parsed) you compare untilTime
and sinceTime and return an api.NewToolCallResult error if untilTime is before
sinceTime (e.g., "invalid time range: 'until' must not be before 'since'"),
updating the same error-return path used in this file; reference the untilTime,
hasUntil, untilStr and sinceStr/sinceTime variables to locate where to add this
check.

In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 141-151: The current code calls p.GetPodLog(...) and then does
strings.Split(logs, "\n"), which materializes the entire container log; change
the implementation to a streaming/reader-based path (e.g., add or use a
GetPodLogStream/ReadCloser variant on the provider that accepts mg.PodLogOptions
and returns an io.ReadCloser or stream) and replace the strings.Split usage with
a bufio.Scanner (or equivalent) to scan lines incrementally; while scanning,
apply the grep/filter logic and keep only the needed tail matches using a
fixed-size ring buffer or deque to avoid buffering the whole log, ensure you
close the stream, and propagate errors the same way as before (references:
GetPodLog, mg.PodLogOptions, and the function performing the current
split/grep).
- Around line 77-94: The code currently accepts any integer for tail (assigned
via tail := getInt(args, "tail", 0)) and forwards negative values to the
provider; add explicit validation after retrieving tail to reject negative
values: if tail < 0 return api.NewToolCallResult("", fmt.Errorf("tail must be
non-negative")), nil. Apply the same check in the other handler that reads tail
(the similar block around where tail is assigned for the grep/mustgather grep
path) so both mustgatherPodLogsGet/code paths consistently return a validation
error for negative tail values rather than forwarding or treating them as
unlimited.
- Around line 168-185: Compute and store the total number of matches before you
truncate matchingLines for tailing, e.g. capture totalMatches :=
len(matchingLines) immediately before the block that slices matchingLines when
tail > 0, then use totalMatches in the header’s "Found %d matching line(s)"
message while still slicing matchingLines for display; update the code around
the matchingLines/tail logic and the header construction (symbols:
matchingLines, tail, totalMatches, header, namespace, pod, filter, container,
previous, caseInsensitive) so the summary shows the full match count but the
output only contains the tailed subset.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cabd9a7d-7a1f-4b77-bd60-5d401b3e8d88

📥 Commits

Reviewing files that changed from the base of the PR and between c3cd32c and 99db57c.

📒 Files selected for processing (3)
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
  • pkg/toolsets/mustgather/events.go
  • pkg/toolsets/mustgather/pod_logs.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json

Comment on lines +36 to +37
"limit": {Type: "integer", Description: "Maximum number of events to return (default: 100)"},
},

⚠️ Potential issue | 🟠 Major

Normalize and cap result limits across all event tools.

limit is currently uncapped (and effectively unlimited for non-positive values), and mustgather_events_by_resource has no limit at all. Large archives can produce oversized responses.

Proposed fix
+const (
+	defaultEventsListLimit   = 100
+	defaultEventsByTimeLimit = 200
+	defaultEventsByResLimit  = 200
+	maxEventsLimit           = 1000
+)
+
+func normalizeLimit(v, def int) int {
+	if v <= 0 {
+		return def
+	}
+	if v > maxEventsLimit {
+		return maxEventsLimit
+	}
+	return v
+}
+
 // schema: mustgather_events_by_resource
 					Properties: map[string]*jsonschema.Schema{
 						"name":      {Type: "string", Description: "Resource name"},
 						"namespace": {Type: "string", Description: "Resource namespace"},
 						"kind":      {Type: "string", Description: "Resource kind (optional, narrows search)"},
+						"limit":     {Type: "integer", Description: "Maximum number of events to return (default: 200, max: 1000)"},
 					},
 
 // mustgatherEventsList
-	limit := getInt(args, "limit", 100)
+	limit := normalizeLimit(getInt(args, "limit", defaultEventsListLimit), defaultEventsListLimit)
 
 // mustgatherEventsByResource
+	limit := normalizeLimit(getInt(args, "limit", defaultEventsByResLimit), defaultEventsByResLimit)
+	if len(matched) > limit {
+		matched = matched[:limit]
+	}
 
 // mustgatherEventsByTime
-	limit := getInt(args, "limit", 200)
+	limit := normalizeLimit(getInt(args, "limit", defaultEventsByTimeLimit), defaultEventsByTimeLimit)

Also applies to: 53-57, 79-80, 101-104, 142-144, 203-206, 222-223, 282-284

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/events.go` around lines 36 - 37, The limit parameter
across mustgather event endpoints is currently uncapped/treated as unlimited;
update the validation and defaulting so every "limit" schema and parser enforces
a sane default and an upper cap (e.g., default=100, max=1000) and treats
non-positive values as the default; specifically modify the limit schema entries
and the parsing logic used by mustgather_events_by_resource and other event
handlers in pkg/toolsets/mustgather/events.go so they clamp incoming limit to
the range [1, MAX_LIMIT] (use a named constant MAX_LIMIT) and consistently apply
the default when absent or <=0.

Comment on lines +134 to +139
// Sort by lastTimestamp descending
sort.Slice(filtered, func(i, j int) bool {
ti, _, _ := unstructured.NestedString(filtered[i].Object, "lastTimestamp")
tj, _, _ := unstructured.NestedString(filtered[j].Object, "lastTimestamp")
return ti > tj
})

⚠️ Potential issue | 🟠 Major

Sort/filter events using parsed timestamps, not raw strings.

Current logic compares timestamp strings and only uses lastTimestamp. That can misorder results and silently exclude valid events when lastTimestamp is absent or unparseable.

Proposed fix
+func parseEventTime(event *unstructured.Unstructured) (time.Time, bool) {
+	for _, path := range [][]string{
+		{"lastTimestamp"},
+		{"eventTime"},
+		{"metadata", "creationTimestamp"},
+	} {
+		ts, found, _ := unstructured.NestedString(event.Object, path...)
+		if !found || ts == "" {
+			continue
+		}
+		if t, err := time.Parse(time.RFC3339, ts); err == nil {
+			return t, true
+		}
+	}
+	return time.Time{}, false
+}
+
 // Example in mustgatherEventsByTime loop
-	lastTimestamp, _, _ := unstructured.NestedString(event.Object, "lastTimestamp")
-	if lastTimestamp == "" {
-		continue
-	}
-	eventTime, err := time.Parse(time.RFC3339, lastTimestamp)
-	if err != nil {
+	eventTime, ok := parseEventTime(event)
+	if !ok {
 		continue
 	}
 
 // Example sort (descending)
-	return ti > tj
+	ti, oki := parseEventTime(&filtered[i])
+	tj, okj := parseEventTime(&filtered[j])
+	if !oki {
+		return false
+	}
+	if !okj {
+		return true
+	}
+	return ti.After(tj)

Also applies to: 193-197, 248-256, 275-280

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/events.go` around lines 134 - 139, The sort/filter is
currently comparing raw timestamp strings from unstructured.NestedString for
"lastTimestamp", which misorders events and drops ones missing or with
unparsable strings; update the sort and any filter logic (the sort.Slice call
operating on variable filtered and other similar blocks) to extract
"lastTimestamp", "eventTime", and "firstTimestamp" via
unstructured.NestedString, parse them with time.Parse(time.RFC3339) (or try
multiple Kubernetes timestamp layouts), fall back to zero time when parsing
fails, and sort by the resulting time.Time values (newest first); apply the same
parsing-and-fallback approach to the other occurrences noted (the blocks around
the other sort/filter usages at the referenced locations) so all comparisons use
parsed times rather than raw strings.

Comment on lines +233 to +240
var untilTime time.Time
hasUntil := untilStr != ""
if hasUntil {
untilTime, err = time.Parse(time.RFC3339, untilStr)
if err != nil {
return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
}
}

⚠️ Potential issue | 🟡 Minor

Reject inverted time ranges (until before since).

Without this check, invalid time windows return a misleading “No events found” instead of an error.

Proposed fix
 	if hasUntil {
 		untilTime, err = time.Parse(time.RFC3339, untilStr)
 		if err != nil {
 			return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
 		}
+		if untilTime.Before(sinceTime) {
+			return api.NewToolCallResult("", fmt.Errorf("until must be greater than or equal to since")), nil
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/events.go` around lines 233 - 240, The code parses
untilStr into untilTime but doesn't reject inverted ranges; add a validation
after parsing that when hasUntil is true and sinceStr is non-empty (or sinceTime
was parsed) you compare untilTime and sinceTime and return an
api.NewToolCallResult error if untilTime is before sinceTime (e.g., "invalid
time range: 'until' must not be before 'since'"), updating the same error-return
path used in this file; reference the untilTime, hasUntil, untilStr and
sinceStr/sinceTime variables to locate where to add this check.

Comment on lines +77 to +94
tail := getInt(args, "tail", 0)

if namespace == "" || pod == "" {
return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
}

logType := mg.LogTypeCurrent
if previous {
logType = mg.LogTypePrevious
}

logs, err := p.GetPodLog(mg.PodLogOptions{
Namespace: namespace,
Pod: pod,
Container: container,
LogType: logType,
TailLines: tail,
})

⚠️ Potential issue | 🟡 Minor

Reject negative tail values explicitly.

Line 77 and Line 126 accept any integer, but the same invalid input currently behaves differently: mustgatherPodLogsGet forwards it to the provider, while grep silently treats it as unlimited. Returning a validation error here will make both tools deterministic.

🛠️ Suggested fix
 	tail := getInt(args, "tail", 0)
+	if tail < 0 {
+		return api.NewToolCallResult("", fmt.Errorf("tail must be >= 0")), nil
+	}
 
 	if namespace == "" || pod == "" {
 		return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
 	}
 	tail := getInt(args, "tail", 0)
 	caseInsensitive := getBool(args, "caseInsensitive", false)
+	if tail < 0 {
+		return api.NewToolCallResult("", fmt.Errorf("tail must be >= 0")), nil
+	}
 
 	if namespace == "" || pod == "" {
 		return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
 	}

Also applies to: 126-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 77 - 94, The code currently
accepts any integer for tail (assigned via tail := getInt(args, "tail", 0)) and
forwards negative values to the provider; add explicit validation after
retrieving tail to reject negative values: if tail < 0 return
api.NewToolCallResult("", fmt.Errorf("tail must be non-negative")), nil. Apply
the same check in the other handler that reads tail (the similar block around
where tail is assigned for the grep/mustgather grep path) so both
mustgatherPodLogsGet/code paths consistently return a validation error for
negative tail values rather than forwarding or treating them as unlimited.

Comment on lines +141 to +151
logs, err := p.GetPodLog(mg.PodLogOptions{
Namespace: namespace,
Pod: pod,
Container: container,
LogType: logType,
})
if err != nil {
return api.NewToolCallResult("", fmt.Errorf("failed to get pod logs: %w", err)), nil
}

lines := strings.Split(logs, "\n")

⚠️ Potential issue | 🟠 Major

Avoid full-log materialization in the grep path.

Even when the caller only wants the last few matches, Line 141 loads the entire container log and Line 151 splits the whole string before filtering. On large must-gather archives this makes mustgather_pod_logs_grep scale with total log size in both memory and latency, which is risky for the offline-debugging use case this toolset is targeting. A streaming/provider-side grep path would be much safer here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 141 - 151, The current code
calls p.GetPodLog(...) and then does strings.Split(logs, "\n"), which
materializes the entire container log; change the implementation to a
streaming/reader-based path (e.g., add or use a GetPodLogStream/ReadCloser
variant on the provider that accepts mg.PodLogOptions and returns an
io.ReadCloser or stream) and replace the strings.Split usage with a
bufio.Scanner (or equivalent) to scan lines incrementally; while scanning, apply
the grep/filter logic and keep only the needed tail matches using a fixed-size
ring buffer or deque to avoid buffering the whole log, ensure you close the
stream, and propagate errors the same way as before (references: GetPodLog,
mg.PodLogOptions, and the function performing the current split/grep).

Comment on lines +168 to +185
if tail > 0 && len(matchingLines) > tail {
matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Logs for pod %s/%s filtered by '%s'", namespace, pod, filter)
if container != "" {
header += fmt.Sprintf(", container %s", container)
}
if previous {
header += " (previous)"
}
if caseInsensitive {
header += " (case-insensitive)"
}
if tail > 0 {
header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))

⚠️ Potential issue | 🟡 Minor

Keep the total match count separate from the tailed result.

Line 185 reports len(matchingLines) after Lines 168-170 may already have dropped earlier hits. With tail=10, 100 matches will be reported as Found 10 matching line(s), which makes the summary inaccurate.

🛠️ Suggested fix
-	var matchingLines []string
+	var matchingLines []string
+	totalMatches := 0
 	for _, line := range lines {
 		compareLine := line
 		if caseInsensitive {
 			compareLine = strings.ToLower(line)
 		}
 		if strings.Contains(compareLine, searchFilter) {
+			totalMatches++
 			matchingLines = append(matchingLines, line)
 		}
 	}
@@
-	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
+	if tail > 0 && totalMatches > len(matchingLines) {
+		header += fmt.Sprintf(", returning last %d", len(matchingLines))
+	}
+	header += "\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 168 - 185, Compute and
store the total number of matches before you truncate matchingLines for tailing,
e.g. capture totalMatches := len(matchingLines) immediately before the block
that slices matchingLines when tail > 0, then use totalMatches in the header’s
"Found %d matching line(s)" message while still slicing matchingLines for
display; update the code around the matchingLines/tail logic and the header
construction (symbols: matchingLines, tail, totalMatches, header, namespace,
pod, filter, container, previous, caseInsensitive) so the summary shows the full
match count but the output only contains the tailed subset.

@openshift-ci-robot

openshift-ci-robot commented Apr 6, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Offline debugging analysis use cases require us to get data parsed from a must-gather archive rather than online cluster's kapiserver endpoint. Some tools are similar to the core toolset but there are more tools beyond resources_get especially journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features
  • Added a must-gather archive analysis toolset for offline OpenShift inspection: archive loading/indexing, resource listing/getting, namespaces, ETCD checks, events queries, monitoring (Prometheus/Alertmanager), node diagnostics, and pod log retrieval/grep. Selecting an archive registers it as an available resource and surfaces metadata/counts.
  • Documentation
  • README generation now includes the must-gather toolset in tool listings.
  • Tests
  • Added test verifying the must-gather toolset is listed and matches the expected snapshot.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/toolsets/mustgather/pod_logs.go (1)

81-81: Clarify that limit returns the first N matching lines.

The current description "Maximum number of lines to return" doesn't indicate whether it's the first or last N. This differs from tail in the grep tool, which returns the last N matches. Consider updating for clarity.

📝 Suggested description update
-						"limit":     {Type: "integer", Description: "Maximum number of lines to return (default: 500)"},
+						"limit":     {Type: "integer", Description: "Maximum number of lines to return from start of range (default: 500)"},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` at line 81, Update the JSON schema
description for the "limit" parameter to explicitly state it returns the first N
matching lines (not the last), so callers understand it behaves differently from
grep's tail behavior; locate the "limit" schema entry in
pkg/toolsets/mustgather/pod_logs.go (the `"limit": {Type: "integer",
Description: ...}` entry) and change the Description to something like "Maximum
number of matching lines to return (first N matches, default: 500)" to clearly
indicate it returns the first matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 273-291: The loop in PodLogs that currently silently drops lines
without parseable RFC3339Nano timestamps should count them and surface that
metric: add a skippedLines int counter and increment it whenever you continue
due to spaceIdx == -1 or time.Parse error (the variables in this block are
lines, spaceIdx, lineTime, sinceTime, hasUntil, untilTime). After processing,
include skippedLines in the log header (only when >0) so callers know how many
lines were omitted; update the header-building code that emits the pod log
metadata to append a "(skipped X non-timestamped lines)" note when skippedLines
> 0. Ensure behavior otherwise unchanged.

---

Nitpick comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Line 81: Update the JSON schema description for the "limit" parameter to
explicitly state it returns the first N matching lines (not the last), so
callers understand it behaves differently from grep's tail behavior; locate the
"limit" schema entry in pkg/toolsets/mustgather/pod_logs.go (the `"limit":
{Type: "integer", Description: ...}` entry) and change the Description to
something like "Maximum number of matching lines to return (first N matches,
default: 500)" to clearly indicate it returns the first matches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0e9fc2f6-fb76-4956-a795-f523a3d7dacb

📥 Commits

Reviewing files that changed from the base of the PR and between 99db57c and 6a66cc7.

📒 Files selected for processing (1)
  • pkg/toolsets/mustgather/pod_logs.go

Comment on lines +273 to +291
for _, line := range lines {
if line == "" {
continue
}
// Each line starts with an RFC3339Nano timestamp followed by a space
spaceIdx := strings.IndexByte(line, ' ')
if spaceIdx == -1 {
continue
}
lineTime, err := time.Parse(time.RFC3339Nano, line[:spaceIdx])
if err != nil {
continue
}
if lineTime.Before(sinceTime) {
continue
}
if hasUntil && lineTime.After(untilTime) {
continue
}

⚠️ Potential issue | 🟡 Minor

Silently skipping lines without parseable timestamps may hide relevant data.

Lines without a valid RFC3339Nano prefix (e.g., multi-line stack traces, malformed entries, or logs collected without --timestamps) are silently dropped. Users won't know if data was lost. Consider either including non-timestamped lines when they immediately follow a matching timestamped line, or at minimum reporting how many lines were skipped.

💡 Suggested improvement to track skipped lines
 	lines := strings.Split(logs, "\n")
 	var matchingLines []string
+	skippedLines := 0
 	for _, line := range lines {
 		if line == "" {
 			continue
 		}
 		// Each line starts with an RFC3339Nano timestamp followed by a space
 		spaceIdx := strings.IndexByte(line, ' ')
 		if spaceIdx == -1 {
+			skippedLines++
 			continue
 		}
 		lineTime, err := time.Parse(time.RFC3339Nano, line[:spaceIdx])
 		if err != nil {
+			skippedLines++
 			continue
 		}

Then include skippedLines in the header if non-zero:

 	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	if skippedLines > 0 {
+		header = strings.TrimSuffix(header, "\n\n")
+		header += fmt.Sprintf(" (%d lines skipped due to missing/invalid timestamps)\n\n", skippedLines)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 273 - 291, The loop in
PodLogs that currently silently drops lines without parseable RFC3339Nano
timestamps should count them and surface that metric: add a skippedLines int
counter and increment it whenever you continue due to spaceIdx == -1 or
time.Parse error (the variables in this block are lines, spaceIdx, lineTime,
sinceTime, hasUntil, untilTime). After processing, include skippedLines in the
log header (only when >0) so callers know how many lines were omitted; update
the header-building code that emits the pod log metadata to append a "(skipped X
non-timestamped lines)" note when skippedLines > 0. Ensure behavior otherwise
unchanged.

@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from 6a66cc7 to 2ab4bf2 Compare April 7, 2026 05:50
@openshift-ci-robot

openshift-ci-robot commented Apr 7, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Offline debugging analysis use cases require us to get data parsed from a must-gather archive rather than online cluster's kapiserver endpoint. Some tools are similar to the core toolset but there are more tools beyond resources_get especially journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features
  • Added a must-gather archive analysis toolset for offline OpenShift inspection: archive loading/indexing, resource listing/getting, namespaces, ETCD checks, events queries, monitoring (Prometheus/Alertmanager), node diagnostics, and pod log retrieval/grep. Selecting an archive registers it as an available resource and surfaces metadata and counts.
  • Documentation
  • README generation now includes the must-gather toolset in tool listings.
  • Tests
  • Added test verifying the must-gather toolset is listed and matches the expected snapshot.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
pkg/toolsets/mustgather/pod_logs.go (1)

167-177: ⚠️ Potential issue | 🟠 Major

Stream the filtered log paths instead of buffering the whole file.

Both filtering handlers call GetPodLog and then strings.Split the full string before applying tail / limit, so a request for a handful of matches still scales with the entire log size. A reader/scanner-based provider path would be much safer for large must-gather archives.

Also applies to: 261-271

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 167 - 177, The code
currently calls GetPodLog and then does strings.Split(logs, "\n") (see GetPodLog
and the lines variable) which buffers the entire log into memory before applying
tail/limit; change to a streaming approach: add or use a reader-returning
variant (e.g., GetPodLogStream or modify GetPodLog to return an
io.ReadCloser/reader from PodLogOptions) and replace the strings.Split-based
logic with a bufio.Scanner/reader that applies the filter, tail and limit while
scanning; update both occurrences (the block around the GetPodLog call and the
duplicate at lines 261-271) so they consume and filter lines incrementally and
close the reader when done.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 233-253: Validate the input early: after reading limit with getInt
(and before parsing times) return an error via api.NewToolCallResult when limit
is negative (require limit >= 0) and when both untilStr and sinceStr are
provided ensure untilTime >= sinceTime (i.e., after parsing sinceTime and
untilTime, if hasUntil and untilTime.Before(sinceTime) return
api.NewToolCallResult with a clear error like "until must be equal or after
since"); update the same validation later where limit and until are used (same
checks around the code paths that reference limit, sinceStr, untilStr,
sinceTime, untilTime, and hasUntil) so callers get a fast, descriptive error
instead of empty results.
- Around line 114-137: The header can be ambiguous when the caller passed an
empty container and the provider resolved it to the first container; update the
code so the actual container name used is surfaced in the header. Modify
GetPodLog (or its caller contract) to return the resolved container name
alongside logs (or expose the resolved container via mg.PodLogOptions/result),
then when building the header in pod_logs.go use that resolved container value
(not the original empty input) so the header prints "container <resolved-name>"
on multi-container pods; apply the same change to the other similar handlers
referenced (the blocks around 167-217 and 261-317) so all logs include the
actual container chosen.
- Around line 295-311: Record the total number of matches before truncation and
use that count in the header instead of the post-truncate slice length: store
totalMatches := len(matchingLines) before the limit logic in the pod_logs.go
function (where matchingLines and limit are handled), then apply the truncation
(matchingLines = matchingLines[:limit] if needed) and build header using
totalMatches for the "Found %d matching line(s)" message while keeping the
returned matchingLines slice limited.

---

Duplicate comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 167-177: The code currently calls GetPodLog and then does
strings.Split(logs, "\n") (see GetPodLog and the lines variable) which buffers
the entire log into memory before applying tail/limit; change to a streaming
approach: add or use a reader-returning variant (e.g., GetPodLogStream or modify
GetPodLog to return an io.ReadCloser/reader from PodLogOptions) and replace the
strings.Split-based logic with a bufio.Scanner/reader that applies the filter,
tail and limit while scanning; update both occurrences (the block around the
GetPodLog call and the duplicate at lines 261-271) so they consume and filter
lines incrementally and close the reader when done.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80d44601-e66d-434d-bea0-19b88272f14e

📥 Commits

Reviewing files that changed from the base of the PR and between 6a66cc7 and 2ab4bf2.

📒 Files selected for processing (2)
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
  • pkg/toolsets/mustgather/pod_logs.go

Comment on lines +114 to +137
logs, err := p.GetPodLog(mg.PodLogOptions{
Namespace: namespace,
Pod: pod,
Container: container,
LogType: logType,
TailLines: tail,
})
if err != nil {
return api.NewToolCallResult("", fmt.Errorf("failed to get pod logs: %w", err)), nil
}

header := fmt.Sprintf("Logs for pod %s/%s", namespace, pod)
if container != "" {
header += fmt.Sprintf(", container %s", container)
}
if previous {
header += " (previous)"
}
if tail > 0 {
header += fmt.Sprintf(" (last %d lines)", tail)
}
header += ":\n\n"

return api.NewToolCallResult(header+logs, nil), nil

⚠️ Potential issue | 🟠 Major

Report the actual container when defaulting to one.

pkg/ocp/mustgather/provider.go:57-70 resolves an omitted container to the first available container, but these handlers do not surface which container was actually used. On multi-container pods that makes the response ambiguous and easy to misread.

💡 Suggested fix
+	if container == "" {
+		containers, err := p.ListPodContainers(namespace, pod)
+		if err != nil {
+			return api.NewToolCallResult("", fmt.Errorf("failed to resolve pod containers: %w", err)), nil
+		}
+		if len(containers) == 0 {
+			return api.NewToolCallResult("", fmt.Errorf("no containers found for pod %s/%s", namespace, pod)), nil
+		}
+		container = containers[0]
+	}
+
 	logs, err := p.GetPodLog(mg.PodLogOptions{
 		Namespace: namespace,
 		Pod:       pod,
 		Container: container,
@@
-	if container != "" {
-		header += fmt.Sprintf(", container %s", container)
-	}
+	header += fmt.Sprintf(", container %s", container)

Also applies to: 167-217, 261-317

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 114 - 137, The header can
be ambiguous when the caller passed an empty container and the provider resolved
it to the first container; update the code so the actual container name used is
surfaced in the header. Modify GetPodLog (or its caller contract) to return the
resolved container name alongside logs (or expose the resolved container via
mg.PodLogOptions/result), then when building the header in pod_logs.go use that
resolved container value (not the original empty input) so the header prints
"container <resolved-name>" on multi-container pods; apply the same change to
the other similar handlers referenced (the blocks around 167-217 and 261-317) so
all logs include the actual container chosen.

Comment on lines +233 to +253
limit := getInt(args, "limit", 500)

if namespace == "" || pod == "" {
return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
}
if sinceStr == "" {
return api.NewToolCallResult("", fmt.Errorf("since is required (RFC3339 format)")), nil
}

sinceTime, err := time.Parse(time.RFC3339, sinceStr)
if err != nil {
return api.NewToolCallResult("", fmt.Errorf("invalid since time format, expected RFC3339 (e.g. 2026-01-15T10:00:00Z): %w", err)), nil
}

var untilTime time.Time
hasUntil := untilStr != ""
if hasUntil {
untilTime, err = time.Parse(time.RFC3339, untilStr)
if err != nil {
return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
}

⚠️ Potential issue | 🟡 Minor

Reject invalid by-time inputs up front.

limit < 0 currently disables truncation, and until before since is reported as “no matching lines” instead of a caller error. Failing fast here keeps empty results meaningful.

🛠️ Suggested fix
 	limit := getInt(args, "limit", 500)
+	if limit < 0 {
+		return api.NewToolCallResult("", fmt.Errorf("limit must be >= 0")), nil
+	}
@@
 	if hasUntil {
 		untilTime, err = time.Parse(time.RFC3339, untilStr)
 		if err != nil {
 			return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
 		}
+		if untilTime.Before(sinceTime) {
+			return api.NewToolCallResult("", fmt.Errorf("until must be greater than or equal to since")), nil
+		}
 	}

Also applies to: 295-296

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 233 - 253, Validate the
input early: after reading limit with getInt (and before parsing times) return
an error via api.NewToolCallResult when limit is negative (require limit >= 0)
and when both untilStr and sinceStr are provided ensure untilTime >= sinceTime
(i.e., after parsing sinceTime and untilTime, if hasUntil and
untilTime.Before(sinceTime) return api.NewToolCallResult with a clear error like
"until must be equal or after since"); update the same validation later where
limit and until are used (same checks around the code paths that reference
limit, sinceStr, untilStr, sinceTime, untilTime, and hasUntil) so callers get a
fast, descriptive error instead of empty results.

Comment on lines +295 to +311
if limit > 0 && len(matchingLines) > limit {
matchingLines = matchingLines[:limit]
}

header := fmt.Sprintf("Logs for pod %s/%s between %s and ", namespace, pod, sinceStr)
if hasUntil {
header += untilStr
} else {
header += "latest"
}
if container != "" {
header += fmt.Sprintf(", container %s", container)
}
if previous {
header += " (previous)"
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))

⚠️ Potential issue | 🟡 Minor

Keep the total match count separate from the limited output.

After limit truncates matchingLines, the header reports only the returned subset. With 5,000 matches and limit=500, callers are told there were 500 matches.

🛠️ Suggested fix
-	if limit > 0 && len(matchingLines) > limit {
+	totalMatches := len(matchingLines)
+	if limit > 0 && totalMatches > limit {
 		matchingLines = matchingLines[:limit]
 	}
@@
-	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
+	if limit > 0 && totalMatches > len(matchingLines) {
+		header += fmt.Sprintf(", returning first %d", len(matchingLines))
+	}
+	header += "\n\n"
📝 Committable suggestion

Suggested change
	totalMatches := len(matchingLines)
	if limit > 0 && totalMatches > limit {
		matchingLines = matchingLines[:limit]
	}

	header := fmt.Sprintf("Logs for pod %s/%s between %s and ", namespace, pod, sinceStr)
	if hasUntil {
		header += untilStr
	} else {
		header += "latest"
	}
	if container != "" {
		header += fmt.Sprintf(", container %s", container)
	}
	if previous {
		header += " (previous)"
	}
	header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
	if limit > 0 && totalMatches > len(matchingLines) {
		header += fmt.Sprintf(", returning first %d", len(matchingLines))
	}
	header += "\n\n"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/pod_logs.go` around lines 295 - 311, Record the total
number of matches before truncation and use that count in the header instead of
the post-truncate slice length: store totalMatches := len(matchingLines) before
the limit logic in the pod_logs.go function (where matchingLines and limit are
handled), then apply the truncation (matchingLines = matchingLines[:limit] if
needed) and build header using totalMatches for the "Found %d matching line(s)"
message while keeping the returned matchingLines slice limited.


// ReadETCDFile reads a raw ETCD info JSON file and returns its content
func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)

@swghosh can we add a little defensive check on the constructed filePath here, to ensure that nothing can escape the etcd_info path on the container?

Because this is an exported method, it is causing the security scan action to fail even though all current call sites use hard-coded paths that are safe.
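
Something like the following could work — a sketch only, reusing the field names from the snippet above and rejecting anything that resolves outside etcd_info (imports assumed: fmt, os, path/filepath, strings):

// ReadETCDFile reads a raw ETCD info JSON file, refusing filenames that
// would escape the etcd_info directory (e.g. "../secrets.json").
func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
	baseDir := filepath.Join(p.metadata.ContainerDir, "etcd_info")
	filePath := filepath.Join(baseDir, filename) // Join also runs filepath.Clean
	rel, err := filepath.Rel(baseDir, filePath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return nil, fmt.Errorf("invalid etcd_info filename %q: path escapes base directory", filename)
	}
	return os.ReadFile(filePath)
}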

Comment on lines +12 to +13
providerMu sync.RWMutex
provider *mg.Provider

@swghosh how will this work when more than one user connects to the MCP server and tries to check different must-gathers?

Is that a valid use case? If not, can we guard against it somehow?

I'm worried about two users calling mustgather_use with different paths.

Maybe we can cache these per session (should be on the ToolHandlerParams context iirc)

Member Author

Yup, that's a very valid use case.
We'd need to cache the must-gather path per session.

Comment on lines +33 to +45
// Load metadata files
loadMetadata(containerDir, &metadata)

// Load all resources
var resources []*unstructured.Unstructured

clusterResources, err := loadClusterScopedResources(containerDir)
if err == nil {
resources = append(resources, clusterResources...)
}

namespacedResources, err := loadNamespacedResources(containerDir)
if err == nil {

Do we maybe want to estimate how much memory loading the resources will consume before doing so, and compare that against how much memory the pod has? Some must-gathers can be quite large - this may crash the pod with an OOM.


Also (or instead), is it possible to lazy-load this, or does that not make sense?


Lazy loading makes sense - also I'm wondering why we need to pre-load at all, @swghosh. Have you noticed any issues getting data on demand?
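
For illustration, one possible lazy shape — index only file paths during the initial walk and decode YAML on first access. All names here are illustrative, not the PR's actual types:

import (
	"fmt"
	"os"
	"sync"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

// lazyIndex maps a resource key to its file path, so the archive walk stays
// cheap and YAML is only decoded when a tool actually asks for the object.
type lazyIndex struct {
	mu    sync.Mutex
	paths map[string]string                     // resource key -> file path
	cache map[string]*unstructured.Unstructured // decoded objects, filled on demand
}

func newLazyIndex() *lazyIndex {
	return &lazyIndex{
		paths: map[string]string{},
		cache: map[string]*unstructured.Unstructured{},
	}
}

func (idx *lazyIndex) Get(key string) (*unstructured.Unstructured, error) {
	idx.mu.Lock()
	defer idx.mu.Unlock()
	if obj, ok := idx.cache[key]; ok {
		return obj, nil // already decoded earlier
	}
	path, ok := idx.paths[key]
	if !ok {
		return nil, fmt.Errorf("resource %q not found in archive", key)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	obj := &unstructured.Unstructured{}
	if err := yaml.Unmarshal(data, &obj.Object); err != nil {
		return nil, fmt.Errorf("decoding %s: %w", path, err)
	}
	idx.cache[key] = obj
	return obj, nil
}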

if n > 0 {
sb.Write(buf[:n])
}
if readErr != nil {

Maybe we want to verify that this is actually an io.EOF error rather than some other kind of error, so that we don't silently swallow real failures.
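
e.g., something along these lines, assuming the enclosing function returns (string, error) and errors/io are imported:

	if readErr != nil {
		if errors.Is(readErr, io.EOF) {
			break // clean end of stream
		}
		return "", fmt.Errorf("reading must-gather file: %w", readErr) // surface real failures
	}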

Comment on lines +74 to +81
nsSet := make(map[string]struct{})
for ns := range idx.byNamespace {
nsSet[ns] = struct{}{}
}
idx.namespaces = make([]string, 0, len(nsSet))
for ns := range nsSet {
idx.namespaces = append(idx.namespaces, ns)
}

Can't we simplify this to just iterate over the keys in idx.byNamespace? As far as I can tell, nsSet has exactly the same keys as idx.byNamespace, so I'm not sure why we're building it here.
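
i.e., something like (the sort.Strings call is an optional extra, since map iteration order is random):

	idx.namespaces = make([]string, 0, len(idx.byNamespace))
	for ns := range idx.byNamespace {
		idx.namespaces = append(idx.namespaces, ns)
	}
	sort.Strings(idx.namespaces) // optional: deterministic ordering for output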

// healthSymbol returns a symbol for health status
func healthSymbol(health string) string {
switch strings.ToLower(health) {
case "up", "healthy", "ok", "true", "firing":

@swghosh you're the expert, but maybe we want to move "firing" to "[WARN]"?

Comment thread pkg/mcp/mcp.go Outdated
Prompts: &mcp.PromptCapabilities{ListChanged: !configuration.Stateless},
Tools: &mcp.ToolCapabilities{ListChanged: !configuration.Stateless},
Logging: &mcp.LoggingCapabilities{},
Prompts: &mcp.PromptCapabilities{ListChanged: !configuration.Stateless},

Let's not change this line if it's not needed.

Comment thread pkg/api/toolsets.go Outdated
Comment on lines +105 to +112
// ResourceRegistrar allows tools to register MCP resources at runtime.
// For example, the mustgather_use tool registers a must-gather:// resource
// after loading an archive.
type ResourceRegistrar interface {
AddResource(uri, name, description, mimeType, content string)
RemoveResources(uris ...string)
}


This type of change will need to be considered upstream first; we don't want to change this downstream.

Member Author

Added an upstream PR for the MCP resource and template registration boilerplate:
containers#1091

"path/filepath"
"strings"

"gopkg.in/yaml.v3"

Maybe sigs.k8s.io/yaml, like we use in other places? This seems to promote the other yaml dependency from indirect to direct.

Member Author

Updated to use sigs.k8s.io/yaml instead.

@swghosh
Member Author

swghosh commented Apr 22, 2026

re: #159 (comment) and #159 (comment)

@Cali0707 whenever you get a chance to look at this, PTAL at another proof-of-concept MCP resources server I've been playing around with: https://github.com/swghosh/mcp-x-mg/tree/main/mustgather-resource-mcp-server#mcp-resources.
It aligns better with code-mode-style approaches; please feel free to drop your feedback here or over Slack.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2026
@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from 2ab4bf2 to 7604f45 Compare April 22, 2026 19:37
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 22, 2026

@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Offline debugging analysis use cases require us to get data parsed from a must-gather archive rather than online cluster's kapiserver endpoint. Some tools are similar to the core toolset but there are more tools beyond resources_get especially journal logs, kubelet logs, etc.

Summary by CodeRabbit

  • New Features

  • Added openshift/mustgather toolset enabling offline analysis of Kubernetes/OpenShift must-gather archives.

  • Introduced tools to query ETCD data, cluster events, pod logs, node diagnostics, and monitoring metrics (Prometheus/AlertManager).

  • Added resource registration capability to MCP servers for dynamic resource management at runtime.

  • Tests

  • Added comprehensive test suites for must-gather toolset and resource registrar functionality.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from 7604f45 to 2d6467c Compare April 22, 2026 19:43
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/mcp/resource_test.go (1)

62-65: Prefer fail-fast assertions for captured registrars.

After Line 63/Line 64, consider asserting s.resourceReg and s.templateReg are non-nil to make wiring failures explicit before subtests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/mcp/resource_test.go` around lines 62 - 65, The test calls
s.CallTool("test_init", ...) to populate registrars but doesn't assert they were
actually set; add fail-fast assertions immediately after that call to ensure
s.resourceReg and s.templateReg are non-nil so wiring failures fail the test
early. Locate the initialization call (s.CallTool("test_init",
map[string]any{})) and add assertions checking s.resourceReg != nil and
s.templateReg != nil (use the test suite's Require/Assert helpers) before
proceeding to subtests.
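
A minimal sketch of the fail-fast wiring check, assuming the suite embeds testify's suite.Suite so the Require helpers are available:

	s.CallTool("test_init", map[string]any{})
	s.Require().NotNil(s.resourceReg, "resource registrar should be captured by test_init")
	s.Require().NotNil(s.templateReg, "template registrar should be captured by test_init")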
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/mcp/resource.go`:
- Around line 10-15: There are two conflicting declarations of type
resourceRegistrar (used to implement api.ResourceRegistrar) in this package;
remove one to avoid duplicate-type compile errors and keep the implementation
that preserves the safer closure capture pattern (resourceContent := content)
used in the gosdk.go variant. Delete either the resourceRegistrar block in
pkg/mcp/resource.go or in pkg/mcp/gosdk.go, then ensure the remaining
resourceRegistrar struct references server *mcp.Server and implements the same
methods, and that any handler/closure code uses the resourceContent := content
pattern to avoid closure capture bugs.

In `@pkg/toolsets/mustgather/nodes.go`:
- Around line 230-242: The header currently uses len(matchingLines) after the
tail truncation, so it shows the tailed count instead of the total matches;
capture the total match count before you apply the tail (e.g. totalMatches :=
len(matchingLines) immediately before the tail logic) and then use totalMatches
in the header formatting instead of len(matchingLines); adjust the existing
block around matchingLines, tail, header, node, filter, and caseInsensitive in
nodes.go so the "Found X matching line(s)" reflects the original total while the
displayed matchingLines remain tailed.
- Around line 125-137: The sections slice defines two entries with the same key
"pods", making the second entry (title "Pod Resources", value diag.PodResources)
unreachable when filtering by shouldInclude; update the key for the Pod
Resources entry to a unique identifier such as "podresources" (or another
distinct string) and update any related schema/description to reflect the new
key so that shouldInclude("podresources") will correctly include
diag.PodResources.

---

Nitpick comments:
In `@pkg/mcp/resource_test.go`:
- Around line 62-65: The test calls s.CallTool("test_init", ...) to populate
registrars but doesn't assert they were actually set; add fail-fast assertions
immediately after that call to ensure s.resourceReg and s.templateReg are
non-nil so wiring failures fail the test early. Locate the initialization call
(s.CallTool("test_init", map[string]any{})) and add assertions checking
s.resourceReg != nil and s.templateReg != nil (use the test suite's
Require/Assert helpers) before proceeding to subtests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b0c259ea-bea9-4a6a-acd1-a13998bcfbd9

📥 Commits

Reviewing files that changed from the base of the PR and between 2ab4bf2 and 7604f45.

📒 Files selected for processing (26)
  • internal/test/mcp.go
  • internal/tools/update-readme/main.go
  • pkg/api/toolsets.go
  • pkg/mcp/gosdk.go
  • pkg/mcp/mcp.go
  • pkg/mcp/modules.go
  • pkg/mcp/resource.go
  • pkg/mcp/resource_test.go
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
  • pkg/mcp/toolsets_test.go
  • pkg/ocp/mustgather/etcd.go
  • pkg/ocp/mustgather/index.go
  • pkg/ocp/mustgather/loader.go
  • pkg/ocp/mustgather/monitoring.go
  • pkg/ocp/mustgather/provider.go
  • pkg/ocp/mustgather/types.go
  • pkg/toolsets/mustgather/etcd.go
  • pkg/toolsets/mustgather/events.go
  • pkg/toolsets/mustgather/helpers.go
  • pkg/toolsets/mustgather/monitoring.go
  • pkg/toolsets/mustgather/namespaces.go
  • pkg/toolsets/mustgather/nodes.go
  • pkg/toolsets/mustgather/pod_logs.go
  • pkg/toolsets/mustgather/resources.go
  • pkg/toolsets/mustgather/toolset.go
  • pkg/toolsets/mustgather/use.go
✅ Files skipped from review due to trivial changes (5)
  • pkg/mcp/toolsets_test.go
  • pkg/toolsets/mustgather/toolset.go
  • pkg/mcp/modules.go
  • pkg/ocp/mustgather/types.go
  • pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
🚧 Files skipped from review as they are similar to previous changes (10)
  • pkg/toolsets/mustgather/namespaces.go
  • pkg/toolsets/mustgather/use.go
  • internal/tools/update-readme/main.go
  • pkg/mcp/mcp.go
  • pkg/ocp/mustgather/monitoring.go
  • pkg/toolsets/mustgather/helpers.go
  • pkg/toolsets/mustgather/monitoring.go
  • pkg/toolsets/mustgather/events.go
  • pkg/ocp/mustgather/etcd.go
  • pkg/toolsets/mustgather/resources.go

Comment thread pkg/toolsets/mustgather/nodes.go Outdated
Comment on lines +125 to +137
sections := []struct {
	key, title, content string
}{
	{"sysinfo", "System Info", diag.SysInfo},
	{"lscpu", "CPU Info (lscpu)", diag.Lscpu},
	{"cpu", "CPU Affinities", diag.CPUAffinities},
	{"irq", "IRQ Affinities", diag.IRQAffinities},
	{"lspci", "PCI Devices (lspci)", diag.Lspci},
	{"dmesg", "Kernel Messages (dmesg)", diag.Dmesg},
	{"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
	{"pods", "Pods Info", diag.PodsInfo},
	{"pods", "Pod Resources", diag.PodResources},
}

⚠️ Potential issue | 🟡 Minor

Duplicate key "pods" causes one section to be unreachable.

Lines 135 and 136 both use "pods" as the key. When a user passes include=pods, only the first matching section (Pods Info) is included because shouldInclude("pods") returns the same result for both. Either Pod Resources should use a different key (e.g., "podresources"), or the description should clarify this behavior.

 		{"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
 		{"pods", "Pods Info", diag.PodsInfo},
-		{"pods", "Pod Resources", diag.PodResources},
+		{"podresources", "Pod Resources", diag.PodResources},
 	}

Update the schema description accordingly:

-						"include":     {Type: "string", Description: "Comma-separated diagnostics to include: kubelet,sysinfo,cpu,irq,pods,lscpu,lspci,dmesg,cmdline (default: all)"},
+						"include":     {Type: "string", Description: "Comma-separated diagnostics to include: kubelet,sysinfo,cpu,irq,pods,podresources,lscpu,lspci,dmesg,cmdline (default: all)"},
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sections := []struct {
	key, title, content string
}{
	{"sysinfo", "System Info", diag.SysInfo},
	{"lscpu", "CPU Info (lscpu)", diag.Lscpu},
	{"cpu", "CPU Affinities", diag.CPUAffinities},
	{"irq", "IRQ Affinities", diag.IRQAffinities},
	{"lspci", "PCI Devices (lspci)", diag.Lspci},
	{"dmesg", "Kernel Messages (dmesg)", diag.Dmesg},
	{"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
	{"pods", "Pods Info", diag.PodsInfo},
	{"pods", "Pod Resources", diag.PodResources},
}

sections := []struct {
	key, title, content string
}{
	{"sysinfo", "System Info", diag.SysInfo},
	{"lscpu", "CPU Info (lscpu)", diag.Lscpu},
	{"cpu", "CPU Affinities", diag.CPUAffinities},
	{"irq", "IRQ Affinities", diag.IRQAffinities},
	{"lspci", "PCI Devices (lspci)", diag.Lspci},
	{"dmesg", "Kernel Messages (dmesg)", diag.Dmesg},
	{"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
	{"pods", "Pods Info", diag.PodsInfo},
	{"podresources", "Pod Resources", diag.PodResources},
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/nodes.go` around lines 125 - 137, The sections slice
defines two entries with the same key "pods", making the second entry (title
"Pod Resources", value diag.PodResources) unreachable when filtering by
shouldInclude; update the key for the Pod Resources entry to a unique identifier
such as "podresources" (or another distinct string) and update any related
schema/description to reflect the new key so that shouldInclude("podresources")
will correctly include diag.PodResources.

Comment on lines +230 to +242
// Apply tail from the end of matches
if tail > 0 && len(matchingLines) > tail {
	matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
if caseInsensitive {
	header += " (case-insensitive)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))

⚠️ Potential issue | 🟡 Minor

Match count header shows tailed count, not total matches.

After applying tail truncation on Lines 231-232, Line 242 reports len(matchingLines) which is the tailed count. If there were 500 matches and tail=10, users see "Found 10 matching line(s)" instead of the actual 500.

Suggested fix
+	totalMatches := len(matchingLines)
 	// Apply tail from the end of matches
 	if tail > 0 && len(matchingLines) > tail {
 		matchingLines = matchingLines[len(matchingLines)-tail:]
 	}
 
 	header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
 	if caseInsensitive {
 		header += " (case-insensitive)"
 	}
 	if tail > 0 {
 		header += fmt.Sprintf(" (last %d matches)", tail)
 	}
-	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", totalMatches)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Apply tail from the end of matches
if tail > 0 && len(matchingLines) > tail {
	matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
if caseInsensitive {
	header += " (case-insensitive)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))

totalMatches := len(matchingLines)
// Apply tail from the end of matches
if tail > 0 && len(matchingLines) > tail {
	matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
if caseInsensitive {
	header += " (case-insensitive)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", totalMatches)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/mustgather/nodes.go` around lines 230 - 242, The header
currently uses len(matchingLines) after the tail truncation, so it shows the
tailed count instead of the total matches; capture the total match count before
you apply the tail (e.g. totalMatches := len(matchingLines) immediately before
the tail logic) and then use totalMatches in the header formatting instead of
len(matchingLines); adjust the existing block around matchingLines, tail,
header, node, filter, and caseInsensitive in nodes.go so the "Found X matching
line(s)" reflects the original total while the displayed matchingLines remain
tailed.

@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from 2d6467c to ac8be42 Compare April 22, 2026 19:50

@bentito bentito left a comment


This is a big PR, but it introduces highly valuable and much-needed capabilities for offline RCA. I fully support merging this soon and mostly as-is to get the baseline functionality in place. We can address the architectural and performance suggestions below as follow-up work.

Only my comments below about concurrency on the package-level variables may need addressing pre-merge.

In particular, as a follow-up, we need to ensure that our recent injection ideas for the netedge toolset integrate smoothly with this offline mode after it merges.

Here is a technical review and suggestions for future iterations, specifically looking at this from an AI agent's perspective:

1. Agent Workflow: The use Initialization Step

From an agent's perspective, forcing the LLM to explicitly call mustgather_use before accessing any other tools creates a brittle, linear workflow. Agents frequently forget prerequisite setup steps, requiring extra turns to recover from errors.

  • Suggestion for follow-up: Initialize the must-gather provider lazily, as an internal check performed by every other must-gather tool call; if initialization hasn't happened yet, do it silently when needed (see the sketch below).
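
A minimal sketch of that lazy-initialization idea, assuming the package-level provider from helpers.go; Provider and loadArchive here are stand-ins for the PR's actual type and constructor:

package mustgather

import (
	"fmt"
	"sync"
)

// Provider and loadArchive stand in for the real mg.Provider and its loader.
type Provider struct{ basePath string }

func loadArchive(path string) (*Provider, error) { return &Provider{basePath: path}, nil }

var (
	providerMu sync.Mutex
	provider   *Provider
)

// ensureProvider lets every tool handler run without a prior mustgather_use:
// it initializes the shared provider on first use, under a lock.
func ensureProvider(path string) (*Provider, error) {
	providerMu.Lock()
	defer providerMu.Unlock()
	if provider != nil {
		return provider, nil
	}
	p, err := loadArchive(path)
	if err != nil {
		return nil, fmt.Errorf("initializing must-gather provider: %w", err)
	}
	provider = p
	return provider, nil
}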

2. Concurrency and State Management

Currently, the loaded archive state is stored in a package-level global variable:

// pkg/toolsets/mustgather/helpers.go
var provider *mg.Provider
  • Suggestion for change: In a multi-client HTTP/SSE deployment, if User A calls mustgather_use(path="/archive1") and User B calls mustgather_use(path="/archive2"), User B will overwrite the global state, breaking User A's session. Moving to the startup-configuration approach mentioned above (tying the server instance to a specific archive) or implementing session-scoped state would resolve this race condition safely (see the sketch below).
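
A rough sketch of the session-scoped alternative, reusing the Provider stand-in from the sketch above and keying state by MCP session ID (the session-ID plumbing is assumed to come from the server):

// providerRegistry maps each MCP session to its own loaded archive, so
// concurrent clients of an HTTP/SSE server cannot clobber one another.
type providerRegistry struct {
	mu        sync.RWMutex
	bySession map[string]*Provider
}

func (r *providerRegistry) Use(sessionID string, p *Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.bySession == nil {
		r.bySession = make(map[string]*Provider)
	}
	r.bySession[sessionID] = p
}

func (r *providerRegistry) Get(sessionID string) (*Provider, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.bySession[sessionID]
	return p, ok
}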

3. Memory Efficiency in Log Processing

The log filtering tools (mustgather_pod_logs_grep and mustgather_pod_logs_by_time) currently read the entire log file into memory using readTextFile before splitting and iterating.

  • Suggestion for follow-up: Because must-gather pod logs can easily reach hundreds of megabytes, this approach risks significant memory spikes or OOM kills on the MCP server. Refactoring these tools to use a bufio.Scanner that streams the file line by line from disk, applying the regex/time filters during the scan and retaining only the matched lines, would make them significantly more robust (see the sketch below).
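
A minimal sketch of that streaming approach, assuming a plain-text log file on disk; the function and parameter names are illustrative, and it also tracks the pre-tail total, echoing the header fix discussed earlier:

package mustgather

import (
	"bufio"
	"os"
	"regexp"
)

// grepLogStream filters a log file line by line so the whole file never
// sits in memory; only the last `tail` matches are retained.
func grepLogStream(path string, pattern *regexp.Regexp, tail int) (matches []string, total int, err error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long log lines
	for sc.Scan() {
		line := sc.Text()
		if !pattern.MatchString(line) {
			continue
		}
		total++ // total match count, independent of tailing
		matches = append(matches, line)
		if tail > 0 && len(matches) > tail {
			matches = matches[1:] // keep only the most recent matches
		}
	}
	return matches, total, sc.Err()
}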

Great work on the core parsing and indexing logic! Let's get this merged and iterate on the optimizations.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bentito, swghosh
Once this PR has been reviewed and has the lgtm label, please assign manusa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from ac8be42 to 1ef296e Compare May 5, 2026 08:24
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 5, 2026

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2026
swghosh and others added 2 commits May 5, 2026 22:38
* Add MCP resources to Toolset interface

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

* Add MCP resource capabilities to Server

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

* React to changes in the Toolset interface

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

* Update suggestions encountered during review

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

* Update suggestions from review

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

* Add more test coverage for MCP resources

(assertion pairing in resource_test.go, Blob/MIMEType-override coverage, filter/mutator tests, AddResource URI-validation panic)

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

---------

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

Add mustgather_events_by_time tool

Add mustgather_pod_logs_grep tool

Add mustgather_pod_logs_by_time tool

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>

Update to use MCP resources
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
@swghosh swghosh force-pushed the wg-must-gather-mcp-tools branch from 1ef296e to 95aa015 Compare May 5, 2026 17:11
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented May 5, 2026

@swghosh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lint 95aa015 link true /test lint
ci/prow/test 95aa015 link true /test test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
