MG-216: add openshift/mustgather toolset #159
Conversation
Skipping CI for Draft Pull Request.
```go
// FindContainerDir locates the must-gather container directory
// (e.g., quay-io-okd-scos-content-sha256-...)
func FindContainerDir(basePath string) (string, error) {
	entries, err := os.ReadDir(basePath)
```
there are a few considerations regarding where the must-gather can be stored:
- if the server is running locally in a dev setup, it can access the must-gather from the filesystem
- if it is running on the target cluster, we should probably make sure it can be read from a PV
- the last case is the variable one, where the must-gather needs to be accessed remotely. This will involve further discussion on where must-gathers from customer cases are stored and how they can be accessed.
/cc @bentito IIRC you were also interested in the offline bits
@swghosh: This pull request references MG-216, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed f10d877 to c3cd32c.
📝 Walkthrough

This pull request introduces the openshift/mustgather toolset.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Client
    participant MCP Server
    participant Toolset
    participant Provider
    participant Archive
    User->>Client: Call mustgather_use(path)
    Client->>MCP Server: ToolCall mustgather_use
    MCP Server->>Toolset: Execute handler with ResourceRegistrar
    Toolset->>Archive: Load archive from path
    Archive-->>Provider: Create Provider with loaded resources
    Toolset->>Provider: GetMetadata()
    Provider-->>Toolset: metadata (version, timestamp, counts)
    Toolset->>MCP Server: ResourceRegistrar.AddResource(must-gather://...)
    MCP Server->>MCP Server: Store resource with handler
    Toolset-->>Client: Success response with archive info
    User->>Client: Call mustgather_pod_logs_get(namespace, pod)
    Client->>MCP Server: ToolCall mustgather_pod_logs_get
    MCP Server->>Toolset: Execute handler
    Toolset->>Provider: GetPodLog(namespace, pod)
    Provider->>Archive: Read log file from pod directory
    Archive-->>Provider: Log content
    Provider-->>Toolset: Log content string
    Toolset-->>Client: Formatted log output
    User->>Client: ReadResource(must-gather://...)
    Client->>MCP Server: ReadResource request
    MCP Server->>MCP Server: Invoke resource handler
    MCP Server-->>Client: Archive metadata content
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 warning
Actionable comments posted: 11
🧹 Nitpick comments (7)
pkg/toolsets/mustgather/etcd.go (1)

173-175: Add deterministic tie-break for equal counts. Equal counts currently follow map iteration order. Adding a secondary sort key (resource) makes output stable across runs.

♻️ Proposed refinement

```diff
 sort.Slice(entries, func(i, j int) bool {
-	return entries[i].count > entries[j].count
+	if entries[i].count == entries[j].count {
+		return entries[i].resource < entries[j].resource
+	}
+	return entries[i].count > entries[j].count
 })
```

pkg/toolsets/mustgather/use.go (1)
56-73: Consider removing a previously registered must-gather resource before adding a new one. When mustgather_use is called multiple times with different paths, the old must-gather://<old-path> resource remains registered alongside the new one. This could lead to stale resources accumulating in the MCP server.

♻️ Proposed fix to clean up old resource

```diff
 // Register MCP resource
 if params.ResourceRegistrar != nil {
 	resourceURI := "must-gather://" + path
+	// Remove any previously registered must-gather resource
+	// (simple approach: remove the exact URI being registered)
+	params.ResourceRegistrar.RemoveResources(resourceURI)
 	content := fmt.Sprintf("Must-Gather Archive\n"+
```

Alternatively, track the previously registered URI in package state and remove it before registering the new one.

pkg/toolsets/mustgather/events.go (1)
110-115: String comparison for timestamps works but is fragile. The sorting relies on string comparison of lastTimestamp (e.g., "2024-01-15T10:30:00Z" > "2024-01-14T10:30:00Z"). This works for ISO 8601 / RFC 3339 timestamps, which Kubernetes uses, but adding a comment would clarify this assumption.

📝 Suggested documentation

```diff
 // Sort by lastTimestamp descending
+// String comparison works correctly for RFC 3339 timestamps (lexicographic order = chronological order)
 sort.Slice(filtered, func(i, j int) bool {
```

pkg/toolsets/mustgather/helpers.go (1)
125-135: Mapping "firing" to [OK] may be semantically misleading. In healthSymbol, the status "firing" (typically indicating an active alert) is mapped to [OK], which could be confusing when displaying alert states. Consider whether "firing" should map to [WARN] or [FAIL] instead, or document this intentional mapping.

pkg/toolsets/mustgather/monitoring.go (2)
395-398: Fragile string manipulation for inserting summary. Using strings.Replace to insert the summary after the first "\n\n" is brittle and could break if the header format changes. Consider building the output with the summary in the correct position from the start.

♻️ Suggested approach

```diff
-	output := "## Prometheus Rules\n\n"
-	if typeFilter != "" {
-		output += fmt.Sprintf("Filter: %s rules only\n\n", typeFilter)
-	}
 	// ... counting logic ...
-	summary := fmt.Sprintf("Total: %d groups, %d rules\n\n", totalGroups, totalRules)
-	// Insert summary after header
-	output = strings.Replace(output, "\n\n", "\n\n"+summary, 1)
+	output := "## Prometheus Rules\n\n"
+	output += fmt.Sprintf("Total: %d groups, %d rules\n\n", totalGroups, totalRules)
+	if typeFilter != "" {
+		output += fmt.Sprintf("Filter: %s rules only\n\n", typeFilter)
+	}
+	// ... rest of output building
```

This requires restructuring to count first, then build output.
494-526: Consider consolidating duplicate formatting functions. formatNameValueSection and formatNameValueBytesSection have nearly identical logic, differing only in the value formatter (formatNumber vs formatBytes). Consider a single function with a formatter parameter.

♻️ Proposed consolidation

```diff
+// formatNameValueSectionWith formats a list of name-value pairs using the provided value formatter
+func formatNameValueSectionWith(title string, items []mg.NameValue, limit int, formatValue func(int64) string) string {
+	output := fmt.Sprintf("#### %s\n\n", title)
+	if len(items) == 0 {
+		return output + "No data available.\n\n"
+	}
+	count := len(items)
+	if limit > 0 && count > limit {
+		count = limit
+	}
+	for i := 0; i < count; i++ {
+		output += fmt.Sprintf("%-60s %s\n", truncate(items[i].Name, 60), formatValue(items[i].Value))
+	}
+	output += "\n"
+	return output
+}
+
 // formatNameValueSection formats a list of name-value pairs as a section
 func formatNameValueSection(title string, items []mg.NameValue, limit int) string {
-	// ... current implementation ...
+	return formatNameValueSectionWith(title, items, limit, formatNumber)
 }

 // formatNameValueBytesSection formats a list of name-value pairs with byte formatting
 func formatNameValueBytesSection(title string, items []mg.NameValue, limit int) string {
-	// ... current implementation ...
+	return formatNameValueSectionWith(title, items, limit, formatBytes)
 }
```

pkg/ocp/mustgather/loader.go (1)
132-144: Stop traversing pod log trees during resource discovery. The pods/... guard currently runs only for files, because directories return on line 136 before that check executes. On large must-gathers this still walks every pod log path under namespaces/*/pods, which can dominate load time even though none of those files will ever be parsed as resources. The suggested fix is to have the walker callback return filepath.SkipDir for pods/... directories so they are never descended into, while keeping the existing .yaml/.yml suffix checks for regular files.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mcp/mcp.go`:
- Around line 87-90: The Initialize response doesn't advertise the Resources
capability so clients can't discover resources added later via AddResource();
update the initialization capability block (where Prompts, Tools, Logging are
set) to include Resources by adding a Resources field using
mcp.ResourceCapabilities (e.g., set ListChanged based on configuration.Stateless
similar to Prompts/Tools) so the server pre-advertises resource support;
reference the Initialize handling logic that relies on s.resources.len() and the
AddResource() method to ensure clients will know Resources can be registered
dynamically.
In `@pkg/ocp/mustgather/etcd.go`:
- Around line 78-83: ReadETCDFile currently joins filename into
p.metadata.ContainerDir + "etcd_info" directly which allows path traversal;
validate and sanitize filename before reading: reject absolute paths
(filepath.IsAbs), reject any filename that after filepath.Clean contains ".."
path segments or path separators, or alternatively construct the absolute path
and verify it is inside the etcd_info base using filepath.Rel to ensure the
resulting filePath is within the base (p.metadata.ContainerDir + "etcd_info")
before calling os.ReadFile; update ReadETCDFile to perform these checks and
return an error for invalid traversal attempts.
In `@pkg/ocp/mustgather/index.go`:
- Around line 171-217: matchesLabelSelector and matchesFieldSelector currently
ignore malformed clauses and unsupported fields which widens matches; change
them to reject invalid selectors by returning false on any malformed part (e.g.,
kv len != 2 or empty key/value) and in matchesFieldSelector treat unknown keys
(not metadata.name/metadata.namespace) as invalid and return false instead of
continuing; update both functions (matchesLabelSelector and
matchesFieldSelector) to validate each clause strictly and fail-fast when a
clause is malformed or unsupported so callers receive no-match for bad
selectors.
- Around line 105-139: The List method in ResourceIndex builds result.Items and
truncates by opts.Limit without a deterministic ordering, causing
non-reproducible subsets; after populating result.Items (but before applying the
Limit) sort it deterministically (e.g. stable sort by GetNamespace(), then
GetName(), then GetUID() or creation timestamp) so repeated calls return the
same first N items; update ResourceIndex.List to perform this stable sort on
result.Items prior to the existing limit logic referenced by symbols
ResourceIndex.List, result.Items, matchesListOptions, and opts.Limit.
In `@pkg/ocp/mustgather/loader.go`:
- Around line 21-47: Load currently masks IO errors: in Load, do not treat all
FindContainerDir errors as ignorable—only tolerate the specific "container
directory not found" case; for any other error returned by FindContainerDir
return it immediately. Also stop swallowing errors from
loadClusterScopedResources and loadNamespacedResources: if either returns a
non-nil error, propagate that error from Load (do not append nil result or
continue). Update the logic in Load (referencing FindContainerDir, Load,
loadClusterScopedResources, loadNamespacedResources, and loadMetadata) so only
the explicit "container dir not found" condition is ignored and all other errors
are returned.
In `@pkg/ocp/mustgather/monitoring.go`:
- Around line 119-127: GetReplicaNumbers currently treats any non-exact match as
"all" which hides bad input; change the switch on replicaParam in
GetReplicaNumbers to only return []int{0} for "prometheus-k8s-0" or "0" and
[]int{1} for "prometheus-k8s-1" or "1", and make the default branch return nil
(or an empty slice) to signal invalid input instead of returning both replicas;
also update any callers of GetReplicaNumbers to detect len==0 (or nil) and
surface/return a clear validation error indicating the replicaParam was unknown.
In `@pkg/ocp/mustgather/provider.go`:
- Around line 176-201: The readGzipFile function currently swallows read errors
and always returns nil error; update read loop in readGzipFile to treat io.EOF
as normal (break) but return any other non-nil readErr to the caller (after
writing any bytes read), so decompression/truncation failures propagate; locate
the function readGzipFile and modify the loop to check readErr (if readErr ==
io.EOF break else return "", readErr).
- Around line 205-210: TailLines currently returns an empty string when content
ends with a newline because Split produces a trailing empty element; update
TailLines to trim trailing newline characters from content (e.g., use
strings.TrimRight(content, "\n")) before splitting or otherwise ignore the final
empty element so that TailLines("a\nb\n", 1) returns "b"; apply the change
inside the TailLines function to preserve existing behavior for
non-newline-terminated input.
In `@pkg/toolsets/mustgather/etcd.go`:
- Around line 155-197: The code accepts negative limit values from getInt which
can lead to misleading output (e.g., "top -1"); clamp and normalize limit
immediately after it's read: capture the raw value from getInt into a variable
(limit), then if limit < 0 set limit = 0 (treat negative as "show all") or
alternatively set limit = len(counts) to avoid negative slicing; use the
normalized limit when truncating entries and when composing the final summary
message (the "... showing top %d of %d" line) so it never prints a negative
number. Ensure you reference and update uses of limit in the slicing block, the
limit > 0 checks, and the final output message.
In `@pkg/toolsets/mustgather/monitoring.go`:
- Around line 192-197: The code currently calls mg.GetReplicaNumbers(replica)
then always uses only replicas[0], so passing "all" is ignored; update the logic
that uses replicas (the replicas variable and r usage) to handle multiple
entries: loop over the slice returned by mg.GetReplicaNumbers(replica) and
execute the monitoring/target logic for each replica, aggregating or merging
results appropriately (or, if intended, validate and reject "all" earlier and
adjust the tool description), ensuring references to params.GetArguments and
getString remain unchanged and any downstream variables that expected a single
replica are updated to accept per-replica processing.
In `@pkg/toolsets/mustgather/resources.go`:
- Around line 66-70: The current mustgatherResourcesGet uses the package-global
provider via getProvider(), which allows mustgather_use to change a global
`provider` and leak archives across sessions; instead, make the active
must-gather scoped to the session by retrieving the provider from the
request/session context in api.ToolHandlerParams (or adding a session-scoped
provider field) and update mustgatherResourcesGet to use that session-scoped
provider; also update mustgather_use to write the selected archive into the
session-scoped storage rather than the package-global `provider`, and propagate
this change to all mustgather_resources_* handlers so they read the provider
from the session (not getProvider/global `provider`).
📒 Files selected for processing (24)

- go.mod
- internal/tools/update-readme/main.go
- pkg/api/toolsets.go
- pkg/mcp/gosdk.go
- pkg/mcp/mcp.go
- pkg/mcp/modules.go
- pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
- pkg/mcp/toolsets_test.go
- pkg/ocp/mustgather/etcd.go
- pkg/ocp/mustgather/index.go
- pkg/ocp/mustgather/loader.go
- pkg/ocp/mustgather/monitoring.go
- pkg/ocp/mustgather/provider.go
- pkg/ocp/mustgather/types.go
- pkg/toolsets/mustgather/etcd.go
- pkg/toolsets/mustgather/events.go
- pkg/toolsets/mustgather/helpers.go
- pkg/toolsets/mustgather/monitoring.go
- pkg/toolsets/mustgather/namespaces.go
- pkg/toolsets/mustgather/nodes.go
- pkg/toolsets/mustgather/pod_logs.go
- pkg/toolsets/mustgather/resources.go
- pkg/toolsets/mustgather/toolset.go
- pkg/toolsets/mustgather/use.go
```go
func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
	filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)
	data, err := os.ReadFile(filePath)
	if err != nil {
		return nil, fmt.Errorf("ETCD file %s not found: %w", filename, err)
	}
```
Harden ReadETCDFile against path traversal.
At Line 79, filename is joined directly into the archive path. If a non-constant/untrusted filename reaches this method later, ../ traversal could read files outside etcd_info.
🔒 Proposed fix
```diff
 import (
 	"encoding/json"
 	"fmt"
 	"os"
 	"path/filepath"
+	"strings"
 )
@@
 func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
-	filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)
+	clean := filepath.Clean(filename)
+	if filepath.IsAbs(clean) || clean == ".." || strings.HasPrefix(clean, ".."+string(filepath.Separator)) {
+		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
+	}
+
+	baseDir := filepath.Join(p.metadata.ContainerDir, "etcd_info")
+	filePath := filepath.Join(baseDir, clean)
+	rel, err := filepath.Rel(baseDir, filePath)
+	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
+		return nil, fmt.Errorf("invalid ETCD filename: %q", filename)
+	}
+
 	data, err := os.ReadFile(filePath)
 	if err != nil {
 		return nil, fmt.Errorf("ETCD file %s not found: %w", filename, err)
 	}
 	return data, nil
 }
```
```go
func (idx *ResourceIndex) List(_ context.Context, gvk schema.GroupVersionKind, namespace string, opts ListOptions) *unstructured.UnstructuredList {
	result := &unstructured.UnstructuredList{}

	if namespace != "" {
		// Namespace-scoped query
		nsMap, ok := idx.byNamespace[namespace]
		if !ok {
			return result
		}
		gvkMap, ok := nsMap[gvk]
		if !ok {
			return result
		}
		for _, obj := range gvkMap {
			if matchesListOptions(obj, opts) {
				result.Items = append(result.Items, *obj.DeepCopy())
			}
		}
	} else {
		// Cluster-wide query
		gvkMap, ok := idx.byGVK[gvk]
		if !ok {
			return result
		}
		for _, obj := range gvkMap {
			if matchesListOptions(obj, opts) {
				result.Items = append(result.Items, *obj.DeepCopy())
			}
		}
	}

	// Apply limit
	if opts.Limit > 0 && len(result.Items) > opts.Limit {
		result.Items = result.Items[:opts.Limit]
	}
```
Sort before applying limit.
mustgather_resources_list uses this path for user-visible results, but there is no stable ordering before opts.Limit is applied. Two identical calls can therefore return different subsets, which makes archive analysis hard to reproduce when users work with a small limit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ocp/mustgather/index.go` around lines 105 - 139, The List method in
ResourceIndex builds result.Items and truncates by opts.Limit without a
deterministic ordering, causing non-reproducible subsets; after populating
result.Items (but before applying the Limit) sort it deterministically (e.g.
stable sort by GetNamespace(), then GetName(), then GetUID() or creation
timestamp) so repeated calls return the same first N items; update
ResourceIndex.List to perform this stable sort on result.Items prior to the
existing limit logic referenced by symbols ResourceIndex.List, result.Items,
matchesListOptions, and opts.Limit.
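The deterministic-ordering fix the prompt asks for can be sketched as follows. The `item` type stands in for `unstructured.Unstructured` (the real code would sort on `GetNamespace()`/`GetName()`); it is illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// item stands in for an object's identifying fields.
// (Illustrative; the real code sorts unstructured.Unstructured items.)
type item struct{ Namespace, Name string }

// sortThenLimit orders items deterministically before truncation, so
// repeated calls with the same limit always return the same prefix.
func sortThenLimit(items []item, limit int) []item {
	sort.SliceStable(items, func(i, j int) bool {
		if items[i].Namespace != items[j].Namespace {
			return items[i].Namespace < items[j].Namespace
		}
		return items[i].Name < items[j].Name
	})
	if limit > 0 && len(items) > limit {
		items = items[:limit]
	}
	return items
}

func main() {
	got := sortThenLimit([]item{{"kube-system", "b"}, {"default", "a"}, {"default", "c"}}, 2)
	fmt.Println(got) // stable: default/a, default/c
}
```

Sorting before the limit is what makes the truncation reproducible; a stable sort also keeps equal keys in a consistent relative order.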
```go
func matchesLabelSelector(labels map[string]string, selector string) bool {
	parts := strings.Split(selector, ",")
	for _, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		value := strings.TrimSpace(kv[1])
		if labels[key] != value {
			return false
		}
	}
	return true
}

func matchesFieldSelector(obj *unstructured.Unstructured, selector string) bool {
	parts := strings.Split(selector, ",")
	for _, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			continue
		}
		key := strings.TrimSpace(kv[0])
		value := strings.TrimSpace(kv[1])

		switch key {
		case "metadata.name":
			if obj.GetName() != value {
				return false
			}
		case "metadata.namespace":
			if obj.GetNamespace() != value {
				return false
			}
		}
	}
	return true
}
```
Reject malformed selectors instead of matching everything.
pkg/toolsets/mustgather/resources.go (Lines 106-118) forwards raw selector strings here. Right now malformed label clauses and unsupported field names are skipped with continue, so typos like app or spec.nodeName=... can silently broaden the result set instead of failing fast.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ocp/mustgather/index.go` around lines 171 - 217, matchesLabelSelector and
matchesFieldSelector currently ignore malformed clauses and unsupported fields
which widens matches; change them to reject invalid selectors by returning false
on any malformed part (e.g., kv len != 2 or empty key/value) and in
matchesFieldSelector treat unknown keys (not metadata.name/metadata.namespace)
as invalid and return false instead of continuing; update both functions
(matchesLabelSelector and matchesFieldSelector) to validate each clause strictly
and fail-fast when a clause is malformed or unsupported so callers receive
no-match for bad selectors.
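A strict variant of the label matcher could look like the sketch below: malformed clauses return an error instead of silently matching everything. This is a simplification for equality selectors only; Kubernetes' own `k8s.io/apimachinery/pkg/labels` parser supports the full selector grammar and would be the more robust choice.

```go
package main

import (
	"fmt"
	"strings"
)

// matchLabelSelectorStrict returns (matched, error); a clause without
// '=' or with an empty key is rejected rather than ignored.
// (Sketch only; not the repository's actual implementation.)
func matchLabelSelectorStrict(labels map[string]string, selector string) (bool, error) {
	for _, part := range strings.Split(selector, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 || strings.TrimSpace(kv[0]) == "" {
			return false, fmt.Errorf("malformed selector clause: %q", part)
		}
		if labels[strings.TrimSpace(kv[0])] != strings.TrimSpace(kv[1]) {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := matchLabelSelectorStrict(map[string]string{"app": "etcd"}, "app=etcd")
	fmt.Println(ok, err)
	_, err = matchLabelSelectorStrict(map[string]string{"app": "etcd"}, "app") // typo: no '='
	fmt.Println(err != nil)
}
```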
```go
func Load(path string) (*LoadResult, error) {
	containerDir, err := FindContainerDir(path)
	if err != nil {
		// If no container dir found, use the path directly
		containerDir = path
	}

	metadata := MustGatherMetadata{
		Path:         path,
		ContainerDir: containerDir,
	}

	// Load metadata files
	loadMetadata(containerDir, &metadata)

	// Load all resources
	var resources []*unstructured.Unstructured

	clusterResources, err := loadClusterScopedResources(containerDir)
	if err == nil {
		resources = append(resources, clusterResources...)
	}

	namespacedResources, err := loadNamespacedResources(containerDir)
	if err == nil {
		resources = append(resources, namespacedResources...)
	}
```
Don't treat unreadable archives as a successful load.
Load falls back to containerDir = path for any FindContainerDir error, and then drops the resource-loader errors on Lines 39-47. A nonexistent path, file path, or permission failure can therefore "load" with zero resources and only fail later as false not found results. Please only ignore the "container directory not found" case and return other I/O errors immediately.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ocp/mustgather/loader.go` around lines 21 - 47, Load currently masks IO
errors: in Load, do not treat all FindContainerDir errors as ignorable—only
tolerate the specific "container directory not found" case; for any other error
returned by FindContainerDir return it immediately. Also stop swallowing errors
from loadClusterScopedResources and loadNamespacedResources: if either returns a
non-nil error, propagate that error from Load (do not append nil result or
continue). Update the logic in Load (referencing FindContainerDir, Load,
loadClusterScopedResources, loadNamespacedResources, and loadMetadata) so only
the explicit "container dir not found" condition is ignored and all other errors
are returned.
```go
func readGzipFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer func() { _ = f.Close() }()

	reader, err := gzip.NewReader(f)
	if err != nil {
		return "", err
	}
	defer func() { _ = reader.Close() }()

	var sb strings.Builder
	buf := make([]byte, 32*1024)
	for {
		n, readErr := reader.Read(buf)
		if n > 0 {
			sb.Write(buf[:n])
		}
		if readErr != nil {
			break
		}
	}

	return sb.String(), nil
```
Return decompression failures to the caller.
Any non-nil readErr just exits the loop, and the function then returns sb.String(), nil. A truncated or corrupt kubelet log will therefore look like a successful read with missing tail data, which is very hard to spot during troubleshooting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ocp/mustgather/provider.go` around lines 176 - 201, The readGzipFile
function currently swallows read errors and always returns nil error; update
read loop in readGzipFile to treat io.EOF as normal (break) but return any other
non-nil readErr to the caller (after writing any bytes read), so
decompression/truncation failures propagate; locate the function readGzipFile
and modify the loop to check readErr (if readErr == io.EOF break else return "",
readErr).
```go
func TailLines(content string, n int) string {
	lines := strings.Split(content, "\n")
	if len(lines) <= n {
		return content
	}
	return strings.Join(lines[len(lines)-n:], "\n")
```
Trim trailing newlines before tailing.
Newline-terminated logs currently grow an empty final element, so tail=1 on a\nb\n returns the blank trailer instead of b. That affects both pod log and kubelet log tailing.
Suggested fix
```diff
 func TailLines(content string, n int) string {
-	lines := strings.Split(content, "\n")
+	hadTrailingNewline := strings.HasSuffix(content, "\n")
+	trimmed := strings.TrimRight(content, "\n")
+	lines := strings.Split(trimmed, "\n")
 	if len(lines) <= n {
 		return content
 	}
-	return strings.Join(lines[len(lines)-n:], "\n")
+	tail := strings.Join(lines[len(lines)-n:], "\n")
+	if hadTrailingNewline {
+		tail += "\n"
+	}
+	return tail
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/ocp/mustgather/provider.go` around lines 205 - 210, TailLines currently
returns an empty string when content ends with a newline because Split produces
a trailing empty element; update TailLines to trim trailing newline characters
from content (e.g., use strings.TrimRight(content, "\n")) before splitting or
otherwise ignore the final empty element so that TailLines("a\nb\n", 1) returns
"b"; apply the change inside the TailLines function to preserve existing
behavior for non-newline-terminated input.
```go
limit := getInt(params.GetArguments(), "limit", 50)

counts, err := p.GetETCDObjectCount()
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get ETCD object counts: %w", err)), nil
}

// Sort by count descending
type entry struct {
	resource string
	count    int64
}
entries := make([]entry, 0, len(counts))
var total int64
for k, v := range counts {
	entries = append(entries, entry{resource: k, count: v})
	total += v
}
sort.Slice(entries, func(i, j int) bool {
	return entries[i].count > entries[j].count
})

if limit > 0 && len(entries) > limit {
	entries = entries[:limit]
}

output := fmt.Sprintf("## ETCD Object Counts\n\nTotal objects: %s across %d resource types\n\n", formatNumber(total), len(counts))

// Find max resource name length for alignment
maxLen := 0
for _, e := range entries {
	if len(e.resource) > maxLen {
		maxLen = len(e.resource)
	}
}

for _, e := range entries {
	output += fmt.Sprintf("%-*s %s\n", maxLen, e.resource, formatNumber(e.count))
}

if limit > 0 && len(counts) > limit {
	output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
}
```
Validate limit to avoid invalid output states.
At Line 155, non-positive values are accepted; e.g., negative input can produce misleading text at Line 196 (top -1).
🛠 Proposed fix
```diff
 limit := getInt(params.GetArguments(), "limit", 50)
+if limit <= 0 {
+	return api.NewToolCallResult("", fmt.Errorf("limit must be greater than 0")), nil
+}
@@
-if limit > 0 && len(entries) > limit {
+if len(entries) > limit {
 	entries = entries[:limit]
 }
@@
-if limit > 0 && len(counts) > limit {
+if len(counts) > limit {
 	output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
 }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
limit := getInt(params.GetArguments(), "limit", 50)
if limit <= 0 {
	return api.NewToolCallResult("", fmt.Errorf("limit must be greater than 0")), nil
}
counts, err := p.GetETCDObjectCount()
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get ETCD object counts: %w", err)), nil
}
// Sort by count descending
type entry struct {
	resource string
	count    int64
}
entries := make([]entry, 0, len(counts))
var total int64
for k, v := range counts {
	entries = append(entries, entry{resource: k, count: v})
	total += v
}
sort.Slice(entries, func(i, j int) bool {
	return entries[i].count > entries[j].count
})
if len(entries) > limit {
	entries = entries[:limit]
}
output := fmt.Sprintf("## ETCD Object Counts\n\nTotal objects: %s across %d resource types\n\n", formatNumber(total), len(counts))
// Find max resource name length for alignment
maxLen := 0
for _, e := range entries {
	if len(e.resource) > maxLen {
		maxLen = len(e.resource)
	}
}
for _, e := range entries {
	output += fmt.Sprintf("%-*s %s\n", maxLen, e.resource, formatNumber(e.count))
}
if len(counts) > limit {
	output += fmt.Sprintf("\n... showing top %d of %d resource types\n", limit, len(counts))
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/etcd.go` around lines 155 - 197, The code accepts
negative limit values from getInt which can lead to misleading output (e.g.,
"top -1"); clamp and normalize limit immediately after it's read: capture the
raw value from getInt into a variable (limit), then if limit < 0 set limit = 0
(treat negative as "show all") or alternatively set limit = len(counts) to avoid
negative slicing; use the normalized limit when truncating entries and when
composing the final summary message (the "... showing top %d of %d" line) so it
never prints a negative number. Ensure you reference and update uses of limit in
the slicing block, the limit > 0 checks, and the final output message.
```go
args := params.GetArguments()
replica := getString(args, "replica", "0")
healthFilter := getString(args, "health", "")

replicas := mg.GetReplicaNumbers(replica)
r := replicas[0]
```
Replica handling ignores "all" option for targets tool.
The tool description says replica can be "0, 1, or all" with default "0", but the implementation always uses only replicas[0] regardless of input. If "all" is passed, it will return [0, 1] but only the first element is used.
🐛 Proposed fix to honor "all" or clarify description
Option 1: Update description to clarify only one replica is used:
```diff
-"replica": {Type: "string", Description: "Prometheus replica (0, 1, or all). Default: 0"},
+"replica": {Type: "string", Description: "Prometheus replica (0 or 1). Default: 0"},
```

Option 2: Support multiple replicas (more complex):

```diff
 replicas := mg.GetReplicaNumbers(replica)
-r := replicas[0]
-targets, err := p.GetPrometheusActiveTargets(r)
+output := ""
+for _, r := range replicas {
+	targets, err := p.GetPrometheusActiveTargets(r)
+	// ... handle each replica
+}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/monitoring.go` around lines 192 - 197, The code
currently calls mg.GetReplicaNumbers(replica) then always uses only replicas[0],
so passing "all" is ignored; update the logic that uses replicas (the replicas
variable and r usage) to handle multiple entries: loop over the slice returned
by mg.GetReplicaNumbers(replica) and execute the monitoring/target logic for
each replica, aggregating or merging results appropriately (or, if intended,
validate and reject "all" earlier and adjust the tool description), ensuring
references to params.GetArguments and getString remain unchanged and any
downstream variables that expected a single replica are updated to accept
per-replica processing.
```go
func mustgatherResourcesGet(params api.ToolHandlerParams) (*api.ToolCallResult, error) {
	p, err := getProvider()
	if err != nil {
		return api.NewToolCallResult("", err), nil
	}
```
Scope the active must-gather per session, not process-wide.
The lookup here is backed by the package-global provider in pkg/toolsets/mustgather/helpers.go (Lines 11-31). Once another client calls mustgather_use, subsequent mustgather_resources_* calls can start reading that other archive, which breaks isolation and can leak customer data between concurrent sessions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/resources.go` around lines 66 - 70, The current
mustgatherResourcesGet uses the package-global provider via getProvider(), which
allows mustgather_use to change a global `provider` and leak archives across
sessions; instead, make the active must-gather scoped to the session by
retrieving the provider from the request/session context in
api.ToolHandlerParams (or adding a session-scoped provider field) and update
mustgatherResourcesGet to use that session-scoped provider; also update
mustgather_use to write the selected archive into the session-scoped storage
rather than the package-global `provider`, and propagate this change to all
mustgather_resources_* handlers so they read the provider from the session (not
getProvider/global `provider`).
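The session-scoping the comment asks for amounts to replacing the package-global with a map keyed by session ID. The sketch below is illustrative only — the `Provider` stub and the way a session ID reaches the handler (via `api.ToolHandlerParams`) are assumptions about the repository's API:

```go
package main

import (
	"fmt"
	"sync"
)

// Provider stands in for the must-gather archive provider.
// (Illustrative; the real type lives in pkg/ocp/mustgather.)
type Provider struct{ Path string }

// sessionProviders replaces the package-global provider with per-session
// storage so concurrent clients cannot read each other's archives.
type sessionProviders struct {
	mu sync.RWMutex
	m  map[string]*Provider // keyed by session ID
}

func (s *sessionProviders) Use(sessionID, path string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[string]*Provider)
	}
	s.m[sessionID] = &Provider{Path: path}
}

func (s *sessionProviders) Get(sessionID string) (*Provider, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	p, ok := s.m[sessionID]
	if !ok {
		return nil, fmt.Errorf("no must-gather loaded for this session; call mustgather_use first")
	}
	return p, nil
}

func main() {
	var store sessionProviders
	store.Use("session-a", "/data/mg-a")
	store.Use("session-b", "/data/mg-b")
	pa, _ := store.Get("session-a")
	fmt.Println(pa.Path) // session-a still sees its own archive
}
```

The RWMutex allows concurrent reads from many sessions while `mustgather_use` calls serialize writes.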
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/toolsets/mustgather/events.go`:
- Around line 36-37: The limit parameter across mustgather event endpoints is
currently uncapped/treated as unlimited; update the validation and defaulting so
every "limit" schema and parser enforces a sane default and an upper cap (e.g.,
default=100, max=1000) and treats non-positive values as the default;
specifically modify the limit schema entries and the parsing logic used by
mustgather_events_by_resource and other event handlers in
pkg/toolsets/mustgather/events.go so they clamp incoming limit to the range [1,
MAX_LIMIT] (use a named constant MAX_LIMIT) and consistently apply the default
when absent or <=0.
- Around line 134-139: The sort/filter is currently comparing raw timestamp
strings from unstructured.NestedString for "lastTimestamp", which misorders
events and drops ones missing or with unparsable strings; update the sort and
any filter logic (the sort.Slice call operating on variable filtered and other
similar blocks) to extract "lastTimestamp", "eventTime", and "firstTimestamp"
via unstructured.NestedString, parse them with time.Parse(time.RFC3339) (or try
multiple Kubernetes timestamp layouts), fall back to zero time when parsing
fails, and sort by the resulting time.Time values (newest first); apply the same
parsing-and-fallback approach to the other occurrences noted (the blocks around
the other sort/filter usages at the referenced locations) so all comparisons use
parsed times rather than raw strings.
- Around line 233-240: The code parses untilStr into untilTime but doesn't
reject inverted ranges; add a validation after parsing that when hasUntil is
true and sinceStr is non-empty (or sinceTime was parsed) you compare untilTime
and sinceTime and return an api.NewToolCallResult error if untilTime is before
sinceTime (e.g., "invalid time range: 'until' must not be before 'since'"),
updating the same error-return path used in this file; reference the untilTime,
hasUntil, untilStr and sinceStr/sinceTime variables to locate where to add this
check.
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 141-151: The current code calls p.GetPodLog(...) and then does
strings.Split(logs, "\n"), which materializes the entire container log; change
the implementation to a streaming/reader-based path (e.g., add or use a
GetPodLogStream/ReadCloser variant on the provider that accepts mg.PodLogOptions
and returns an io.ReadCloser or stream) and replace the strings.Split usage with
a bufio.Scanner (or equivalent) to scan lines incrementally; while scanning,
apply the grep/filter logic and keep only the needed tail matches using a
fixed-size ring buffer or deque to avoid buffering the whole log, ensure you
close the stream, and propagate errors the same way as before (references:
GetPodLog, mg.PodLogOptions, and the function performing the current
split/grep).
- Around line 77-94: The code currently accepts any integer for tail (assigned
via tail := getInt(args, "tail", 0)) and forwards negative values to the
provider; add explicit validation after retrieving tail to reject negative
values: if tail < 0 return api.NewToolCallResult("", fmt.Errorf("tail must be
non-negative")), nil. Apply the same check in the other handler that reads tail
(the similar block around where tail is assigned for the grep/mustgather grep
path) so both mustgatherPodLogsGet/code paths consistently return a validation
error for negative tail values rather than forwarding or treating them as
unlimited.
- Around line 168-185: Compute and store the total number of matches before you
truncate matchingLines for tailing, e.g. capture totalMatches :=
len(matchingLines) immediately before the block that slices matchingLines when
tail > 0, then use totalMatches in the header’s "Found %d matching line(s)"
message while still slicing matchingLines for display; update the code around
the matchingLines/tail logic and the header construction (symbols:
matchingLines, tail, totalMatches, header, namespace, pod, filter, container,
previous, caseInsensitive) so the summary shows the full match count but the
output only contains the tailed subset.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cabd9a7d-7a1f-4b77-bd60-5d401b3e8d88
📒 Files selected for processing (3)
- pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
- pkg/toolsets/mustgather/events.go
- pkg/toolsets/mustgather/pod_logs.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
```go
"limit": {Type: "integer", Description: "Maximum number of events to return (default: 100)"},
},
```
Normalize and cap result limits across all event tools.
limit is currently uncapped (and effectively unlimited for non-positive values), and mustgather_events_by_resource has no limit at all. Large archives can produce oversized responses.
Proposed fix
```diff
+const (
+	defaultEventsListLimit   = 100
+	defaultEventsByTimeLimit = 200
+	defaultEventsByResLimit  = 200
+	maxEventsLimit           = 1000
+)
+
+func normalizeLimit(v, def int) int {
+	if v <= 0 {
+		return def
+	}
+	if v > maxEventsLimit {
+		return maxEventsLimit
+	}
+	return v
+}
+
 // schema: mustgather_events_by_resource
 Properties: map[string]*jsonschema.Schema{
 	"name":      {Type: "string", Description: "Resource name"},
 	"namespace": {Type: "string", Description: "Resource namespace"},
 	"kind":      {Type: "string", Description: "Resource kind (optional, narrows search)"},
+	"limit":     {Type: "integer", Description: "Maximum number of events to return (default: 200, max: 1000)"},
 },
 // mustgatherEventsList
-limit := getInt(args, "limit", 100)
+limit := normalizeLimit(getInt(args, "limit", defaultEventsListLimit), defaultEventsListLimit)
 // mustgatherEventsByResource
+limit := normalizeLimit(getInt(args, "limit", defaultEventsByResLimit), defaultEventsByResLimit)
+if len(matched) > limit {
+	matched = matched[:limit]
+}
 // mustgatherEventsByTime
-limit := getInt(args, "limit", 200)
+limit := normalizeLimit(getInt(args, "limit", defaultEventsByTimeLimit), defaultEventsByTimeLimit)
```

Also applies to: 53-57, 79-80, 101-104, 142-144, 203-206, 222-223, 282-284
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/events.go` around lines 36 - 37, The limit parameter
across mustgather event endpoints is currently uncapped/treated as unlimited;
update the validation and defaulting so every "limit" schema and parser enforces
a sane default and an upper cap (e.g., default=100, max=1000) and treats
non-positive values as the default; specifically modify the limit schema entries
and the parsing logic used by mustgather_events_by_resource and other event
handlers in pkg/toolsets/mustgather/events.go so they clamp incoming limit to
the range [1, MAX_LIMIT] (use a named constant MAX_LIMIT) and consistently apply
the default when absent or <=0.
```go
// Sort by lastTimestamp descending
sort.Slice(filtered, func(i, j int) bool {
	ti, _, _ := unstructured.NestedString(filtered[i].Object, "lastTimestamp")
	tj, _, _ := unstructured.NestedString(filtered[j].Object, "lastTimestamp")
	return ti > tj
})
```
Sort/filter events using parsed timestamps, not raw strings.
Current logic compares timestamp strings and only uses lastTimestamp. That can misorder results and silently exclude valid events when lastTimestamp is absent or unparseable.
Proposed fix
```diff
+func parseEventTime(event *unstructured.Unstructured) (time.Time, bool) {
+	for _, path := range [][]string{
+		{"lastTimestamp"},
+		{"eventTime"},
+		{"metadata", "creationTimestamp"},
+	} {
+		ts, found, _ := unstructured.NestedString(event.Object, path...)
+		if !found || ts == "" {
+			continue
+		}
+		if t, err := time.Parse(time.RFC3339, ts); err == nil {
+			return t, true
+		}
+	}
+	return time.Time{}, false
+}
+
 // Example in mustgatherEventsByTime loop
-lastTimestamp, _, _ := unstructured.NestedString(event.Object, "lastTimestamp")
-if lastTimestamp == "" {
-	continue
-}
-eventTime, err := time.Parse(time.RFC3339, lastTimestamp)
-if err != nil {
+eventTime, ok := parseEventTime(event)
+if !ok {
 	continue
 }
 // Example sort (descending)
-return ti > tj
+ti, oki := parseEventTime(&filtered[i])
+tj, okj := parseEventTime(&filtered[j])
+if !oki {
+	return false
+}
+if !okj {
+	return true
+}
+return ti.After(tj)
```

Also applies to: 193-197, 248-256, 275-280
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/events.go` around lines 134 - 139, The sort/filter is
currently comparing raw timestamp strings from unstructured.NestedString for
"lastTimestamp", which misorders events and drops ones missing or with
unparsable strings; update the sort and any filter logic (the sort.Slice call
operating on variable filtered and other similar blocks) to extract
"lastTimestamp", "eventTime", and "firstTimestamp" via
unstructured.NestedString, parse them with time.Parse(time.RFC3339) (or try
multiple Kubernetes timestamp layouts), fall back to zero time when parsing
fails, and sort by the resulting time.Time values (newest first); apply the same
parsing-and-fallback approach to the other occurrences noted (the blocks around
the other sort/filter usages at the referenced locations) so all comparisons use
parsed times rather than raw strings.
```go
var untilTime time.Time
hasUntil := untilStr != ""
if hasUntil {
	untilTime, err = time.Parse(time.RFC3339, untilStr)
	if err != nil {
		return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
	}
}
```
Reject inverted time ranges (until before since).
Without this check, invalid windows return misleading “No events found”.
Proposed fix
```diff
 if hasUntil {
 	untilTime, err = time.Parse(time.RFC3339, untilStr)
 	if err != nil {
 		return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
 	}
+	if untilTime.Before(sinceTime) {
+		return api.NewToolCallResult("", fmt.Errorf("until must be greater than or equal to since")), nil
+	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/events.go` around lines 233 - 240, The code parses
untilStr into untilTime but doesn't reject inverted ranges; add a validation
after parsing that when hasUntil is true and sinceStr is non-empty (or sinceTime
was parsed) you compare untilTime and sinceTime and return an
api.NewToolCallResult error if untilTime is before sinceTime (e.g., "invalid
time range: 'until' must not be before 'since'"), updating the same error-return
path used in this file; reference the untilTime, hasUntil, untilStr and
sinceStr/sinceTime variables to locate where to add this check.
```go
tail := getInt(args, "tail", 0)

if namespace == "" || pod == "" {
	return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
}

logType := mg.LogTypeCurrent
if previous {
	logType = mg.LogTypePrevious
}

logs, err := p.GetPodLog(mg.PodLogOptions{
	Namespace: namespace,
	Pod:       pod,
	Container: container,
	LogType:   logType,
	TailLines: tail,
})
```
Reject negative tail values explicitly.
Line 77 and Line 126 accept any integer, but the same invalid input currently behaves differently: mustgatherPodLogsGet forwards it to the provider, while grep silently treats it as unlimited. Returning a validation error here will make both tools deterministic.
🛠️ Suggested fix
```diff
 tail := getInt(args, "tail", 0)
+if tail < 0 {
+	return api.NewToolCallResult("", fmt.Errorf("tail must be >= 0")), nil
+}
 if namespace == "" || pod == "" {
 	return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
 }
```

```diff
 tail := getInt(args, "tail", 0)
 caseInsensitive := getBool(args, "caseInsensitive", false)
+if tail < 0 {
+	return api.NewToolCallResult("", fmt.Errorf("tail must be >= 0")), nil
+}
 if namespace == "" || pod == "" {
 	return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
 }
```

Also applies to: 126-133
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 77 - 94, The code currently
accepts any integer for tail (assigned via tail := getInt(args, "tail", 0)) and
forwards negative values to the provider; add explicit validation after
retrieving tail to reject negative values: if tail < 0 return
api.NewToolCallResult("", fmt.Errorf("tail must be non-negative")), nil. Apply
the same check in the other handler that reads tail (the similar block around
where tail is assigned for the grep/mustgather grep path) so both
mustgatherPodLogsGet/code paths consistently return a validation error for
negative tail values rather than forwarding or treating them as unlimited.
```go
logs, err := p.GetPodLog(mg.PodLogOptions{
	Namespace: namespace,
	Pod:       pod,
	Container: container,
	LogType:   logType,
})
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get pod logs: %w", err)), nil
}

lines := strings.Split(logs, "\n")
```
Avoid full-log materialization in the grep path.
Even when the caller only wants the last few matches, Line 141 loads the entire container log and Line 151 splits the whole string before filtering. On large must-gather archives this makes mustgather_pod_logs_grep scale with total log size in both memory and latency, which is risky for the offline-debugging use case this toolset is targeting. A streaming/provider-side grep path would be much safer here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 141 - 151, The current code
calls p.GetPodLog(...) and then does strings.Split(logs, "\n"), which
materializes the entire container log; change the implementation to a
streaming/reader-based path (e.g., add or use a GetPodLogStream/ReadCloser
variant on the provider that accepts mg.PodLogOptions and returns an
io.ReadCloser or stream) and replace the strings.Split usage with a
bufio.Scanner (or equivalent) to scan lines incrementally; while scanning, apply
the grep/filter logic and keep only the needed tail matches using a fixed-size
ring buffer or deque to avoid buffering the whole log, ensure you close the
stream, and propagate errors the same way as before (references: GetPodLog,
mg.PodLogOptions, and the function performing the current split/grep).
```go
if tail > 0 && len(matchingLines) > tail {
	matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Logs for pod %s/%s filtered by '%s'", namespace, pod, filter)
if container != "" {
	header += fmt.Sprintf(", container %s", container)
}
if previous {
	header += " (previous)"
}
if caseInsensitive {
	header += " (case-insensitive)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
```
Keep the total match count separate from the tailed result.
Line 185 reports `len(matchingLines)` even though Lines 168-170 may already have dropped earlier hits. With tail=10, 100 matches will be reported as `Found 10 matching line(s)`, which makes the summary inaccurate.
🛠️ Suggested fix

```diff
-	var matchingLines []string
+	var matchingLines []string
+	totalMatches := 0
 	for _, line := range lines {
 		compareLine := line
 		if caseInsensitive {
 			compareLine = strings.ToLower(line)
 		}
 		if strings.Contains(compareLine, searchFilter) {
+			totalMatches++
 			matchingLines = append(matchingLines, line)
 		}
 	}
@@
-	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
+	if tail > 0 && totalMatches > len(matchingLines) {
+		header += fmt.Sprintf(", returning last %d", len(matchingLines))
+	}
+	header += "\n\n"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 168 - 185, Compute and
store the total number of matches before you truncate matchingLines for tailing,
e.g. capture totalMatches := len(matchingLines) immediately before the block
that slices matchingLines when tail > 0, then use totalMatches in the header’s
"Found %d matching line(s)" message while still slicing matchingLines for
display; update the code around the matchingLines/tail logic and the header
construction (symbols: matchingLines, tail, totalMatches, header, namespace,
pod, filter, container, previous, caseInsensitive) so the summary shows the full
match count but the output only contains the tailed subset.
@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/toolsets/mustgather/pod_logs.go (1)
81-81: Clarify that `limit` returns the first N matching lines.

The current description "Maximum number of lines to return" doesn't indicate whether it's the first or last N. This differs from `tail` in the grep tool, which returns the last N matches. Consider updating for clarity.

📝 Suggested description update

```diff
-"limit": {Type: "integer", Description: "Maximum number of lines to return (default: 500)"},
+"limit": {Type: "integer", Description: "Maximum number of lines to return from start of range (default: 500)"},
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/toolsets/mustgather/pod_logs.go` at line 81, Update the JSON schema description for the "limit" parameter to explicitly state it returns the first N matching lines (not the last), so callers understand it behaves differently from grep's tail behavior; locate the "limit" schema entry in pkg/toolsets/mustgather/pod_logs.go (the `"limit": {Type: "integer", Description: ...}` entry) and change the Description to something like "Maximum number of matching lines to return (first N matches, default: 500)" to clearly indicate it returns the first matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 273-291: The loop in PodLogs that currently silently drops lines
without parseable RFC3339Nano timestamps should count them and surface that
metric: add a skippedLines int counter and increment it whenever you continue
due to spaceIdx == -1 or time.Parse error (the variables in this block are
lines, spaceIdx, lineTime, sinceTime, hasUntil, untilTime). After processing,
include skippedLines in the log header (only when >0) so callers know how many
lines were omitted; update the header-building code that emits the pod log
metadata to append a "(skipped X non-timestamped lines)" note when skippedLines
> 0. Ensure behavior otherwise unchanged.
---
Nitpick comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Line 81: Update the JSON schema description for the "limit" parameter to
explicitly state it returns the first N matching lines (not the last), so
callers understand it behaves differently from grep's tail behavior; locate the
"limit" schema entry in pkg/toolsets/mustgather/pod_logs.go (the `"limit":
{Type: "integer", Description: ...}` entry) and change the Description to
something like "Maximum number of matching lines to return (first N matches,
default: 500)" to clearly indicate it returns the first matches.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0e9fc2f6-fb76-4956-a795-f523a3d7dacb
📒 Files selected for processing (1)
pkg/toolsets/mustgather/pod_logs.go
```go
for _, line := range lines {
	if line == "" {
		continue
	}
	// Each line starts with an RFC3339Nano timestamp followed by a space
	spaceIdx := strings.IndexByte(line, ' ')
	if spaceIdx == -1 {
		continue
	}
	lineTime, err := time.Parse(time.RFC3339Nano, line[:spaceIdx])
	if err != nil {
		continue
	}
	if lineTime.Before(sinceTime) {
		continue
	}
	if hasUntil && lineTime.After(untilTime) {
		continue
	}
```
Silently skipping lines without parseable timestamps may hide relevant data.
Lines without a valid RFC3339Nano prefix (e.g., multi-line stack traces, malformed entries, or logs collected without --timestamps) are silently dropped. Users won't know if data was lost. Consider either including non-timestamped lines when they immediately follow a matching timestamped line, or at minimum reporting how many lines were skipped.
💡 Suggested improvement to track skipped lines
```diff
 	lines := strings.Split(logs, "\n")
 	var matchingLines []string
+	skippedLines := 0
 	for _, line := range lines {
 		if line == "" {
 			continue
 		}
 		// Each line starts with an RFC3339Nano timestamp followed by a space
 		spaceIdx := strings.IndexByte(line, ' ')
 		if spaceIdx == -1 {
+			skippedLines++
 			continue
 		}
 		lineTime, err := time.Parse(time.RFC3339Nano, line[:spaceIdx])
 		if err != nil {
+			skippedLines++
 			continue
 		}
```

Then include skippedLines in the header if non-zero:

```diff
 	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	if skippedLines > 0 {
+		header = strings.TrimSuffix(header, "\n\n")
+		header += fmt.Sprintf(" (%d lines skipped due to missing/invalid timestamps)\n\n", skippedLines)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 273 - 291, The loop in
PodLogs that currently silently drops lines without parseable RFC3339Nano
timestamps should count them and surface that metric: add a skippedLines int
counter and increment it whenever you continue due to spaceIdx == -1 or
time.Parse error (the variables in this block are lines, spaceIdx, lineTime,
sinceTime, hasUntil, untilTime). After processing, include skippedLines in the
log header (only when >0) so callers know how many lines were omitted; update
the header-building code that emits the pod log metadata to append a "(skipped X
non-timestamped lines)" note when skippedLines > 0. Ensure behavior otherwise
unchanged.
Force-pushed from 6a66cc7 to 2ab4bf2
@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 3
♻️ Duplicate comments (1)
pkg/toolsets/mustgather/pod_logs.go (1)
167-177: ⚠️ Potential issue | 🟠 Major — Stream the filtered log paths instead of buffering the whole file.

Both filtering handlers call `GetPodLog` and then `strings.Split` the full string before applying `tail`/`limit`, so a request for a handful of matches still scales with the entire log size. A reader/scanner-based provider path would be much safer for large must-gather archives.

Also applies to: 261-271
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/toolsets/mustgather/pod_logs.go` around lines 167 - 177, The code currently calls GetPodLog and then does strings.Split(logs, "\n") (see GetPodLog and the lines variable) which buffers the entire log into memory before applying tail/limit; change to a streaming approach: add or use a reader-returning variant (e.g., GetPodLogStream or modify GetPodLog to return an io.ReadCloser/reader from PodLogOptions) and replace the strings.Split-based logic with a bufio.Scanner/reader that applies the filter, tail and limit while scanning; update both occurrences (the block around the GetPodLog call and the duplicate at lines 261-271) so they consume and filter lines incrementally and close the reader when done.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 233-253: Validate the input early: after reading limit with getInt
(and before parsing times) return an error via api.NewToolCallResult when limit
is negative (require limit >= 0) and when both untilStr and sinceStr are
provided ensure untilTime >= sinceTime (i.e., after parsing sinceTime and
untilTime, if hasUntil and untilTime.Before(sinceTime) return
api.NewToolCallResult with a clear error like "until must be equal or after
since"); update the same validation later where limit and until are used (same
checks around the code paths that reference limit, sinceStr, untilStr,
sinceTime, untilTime, and hasUntil) so callers get a fast, descriptive error
instead of empty results.
- Around line 114-137: The header can be ambiguous when the caller passed an
empty container and the provider resolved it to the first container; update the
code so the actual container name used is surfaced in the header. Modify
GetPodLog (or its caller contract) to return the resolved container name
alongside logs (or expose the resolved container via mg.PodLogOptions/result),
then when building the header in pod_logs.go use that resolved container value
(not the original empty input) so the header prints "container <resolved-name>"
on multi-container pods; apply the same change to the other similar handlers
referenced (the blocks around 167-217 and 261-317) so all logs include the
actual container chosen.
- Around line 295-311: Record the total number of matches before truncation and
use that count in the header instead of the post-truncate slice length: store
totalMatches := len(matchingLines) before the limit logic in the pod_logs.go
function (where matchingLines and limit are handled), then apply the truncation
(matchingLines = matchingLines[:limit] if needed) and build header using
totalMatches for the "Found %d matching line(s)" message while keeping the
returned matchingLines slice limited.
---
Duplicate comments:
In `@pkg/toolsets/mustgather/pod_logs.go`:
- Around line 167-177: The code currently calls GetPodLog and then does
strings.Split(logs, "\n") (see GetPodLog and the lines variable) which buffers
the entire log into memory before applying tail/limit; change to a streaming
approach: add or use a reader-returning variant (e.g., GetPodLogStream or modify
GetPodLog to return an io.ReadCloser/reader from PodLogOptions) and replace the
strings.Split-based logic with a bufio.Scanner/reader that applies the filter,
tail and limit while scanning; update both occurrences (the block around the
GetPodLog call and the duplicate at lines 261-271) so they consume and filter
lines incrementally and close the reader when done.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 80d44601-e66d-434d-bea0-19b88272f14e
📒 Files selected for processing (2)
pkg/mcp/testdata/toolsets-openshift-mustgather-tools.jsonpkg/toolsets/mustgather/pod_logs.go
```go
logs, err := p.GetPodLog(mg.PodLogOptions{
	Namespace: namespace,
	Pod:       pod,
	Container: container,
	LogType:   logType,
	TailLines: tail,
})
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("failed to get pod logs: %w", err)), nil
}

header := fmt.Sprintf("Logs for pod %s/%s", namespace, pod)
if container != "" {
	header += fmt.Sprintf(", container %s", container)
}
if previous {
	header += " (previous)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d lines)", tail)
}
header += ":\n\n"

return api.NewToolCallResult(header+logs, nil), nil
```
Report the actual container when defaulting to one.
pkg/ocp/mustgather/provider.go:57-70 resolves an omitted container to the first available container, but these handlers do not surface which container was actually used. On multi-container pods that makes the response ambiguous and easy to misread.
💡 Suggested fix

```diff
+	if container == "" {
+		containers, err := p.ListPodContainers(namespace, pod)
+		if err != nil {
+			return api.NewToolCallResult("", fmt.Errorf("failed to resolve pod containers: %w", err)), nil
+		}
+		if len(containers) == 0 {
+			return api.NewToolCallResult("", fmt.Errorf("no containers found for pod %s/%s", namespace, pod)), nil
+		}
+		container = containers[0]
+	}
+
 	logs, err := p.GetPodLog(mg.PodLogOptions{
 		Namespace: namespace,
 		Pod:       pod,
 		Container: container,
@@
-	if container != "" {
-		header += fmt.Sprintf(", container %s", container)
-	}
+	header += fmt.Sprintf(", container %s", container)
```

Also applies to: 167-217, 261-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 114 - 137, The header can
be ambiguous when the caller passed an empty container and the provider resolved
it to the first container; update the code so the actual container name used is
surfaced in the header. Modify GetPodLog (or its caller contract) to return the
resolved container name alongside logs (or expose the resolved container via
mg.PodLogOptions/result), then when building the header in pod_logs.go use that
resolved container value (not the original empty input) so the header prints
"container <resolved-name>" on multi-container pods; apply the same change to
the other similar handlers referenced (the blocks around 167-217 and 261-317) so
all logs include the actual container chosen.
```go
limit := getInt(args, "limit", 500)

if namespace == "" || pod == "" {
	return api.NewToolCallResult("", fmt.Errorf("namespace and pod are required")), nil
}
if sinceStr == "" {
	return api.NewToolCallResult("", fmt.Errorf("since is required (RFC3339 format)")), nil
}

sinceTime, err := time.Parse(time.RFC3339, sinceStr)
if err != nil {
	return api.NewToolCallResult("", fmt.Errorf("invalid since time format, expected RFC3339 (e.g. 2026-01-15T10:00:00Z): %w", err)), nil
}

var untilTime time.Time
hasUntil := untilStr != ""
if hasUntil {
	untilTime, err = time.Parse(time.RFC3339, untilStr)
	if err != nil {
		return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
	}
```
Reject invalid by-time inputs up front.
`limit < 0` currently disables truncation, and `until` before `since` is reported as "no matching lines" instead of a caller error. Failing fast here keeps empty results meaningful.
🛠️ Suggested fix

```diff
 	limit := getInt(args, "limit", 500)
+	if limit < 0 {
+		return api.NewToolCallResult("", fmt.Errorf("limit must be >= 0")), nil
+	}
@@
 	if hasUntil {
 		untilTime, err = time.Parse(time.RFC3339, untilStr)
 		if err != nil {
 			return api.NewToolCallResult("", fmt.Errorf("invalid until time format, expected RFC3339 (e.g. 2026-01-15T12:00:00Z): %w", err)), nil
 		}
+		if untilTime.Before(sinceTime) {
+			return api.NewToolCallResult("", fmt.Errorf("until must be greater than or equal to since")), nil
+		}
 	}
```

Also applies to: 295-296
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 233 - 253, Validate the
input early: after reading limit with getInt (and before parsing times) return
an error via api.NewToolCallResult when limit is negative (require limit >= 0)
and when both untilStr and sinceStr are provided ensure untilTime >= sinceTime
(i.e., after parsing sinceTime and untilTime, if hasUntil and
untilTime.Before(sinceTime) return api.NewToolCallResult with a clear error like
"until must be equal or after since"); update the same validation later where
limit and until are used (same checks around the code paths that reference
limit, sinceStr, untilStr, sinceTime, untilTime, and hasUntil) so callers get a
fast, descriptive error instead of empty results.
```go
if limit > 0 && len(matchingLines) > limit {
	matchingLines = matchingLines[:limit]
}

header := fmt.Sprintf("Logs for pod %s/%s between %s and ", namespace, pod, sinceStr)
if hasUntil {
	header += untilStr
} else {
	header += "latest"
}
if container != "" {
	header += fmt.Sprintf(", container %s", container)
}
if previous {
	header += " (previous)"
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
```
Keep the total match count separate from the limited output.
After limit truncates matchingLines, the header reports only the returned subset. With 5,000 matches and limit=500, callers are told there were 500 matches.
🛠️ Suggested fix

```diff
-	if limit > 0 && len(matchingLines) > limit {
+	totalMatches := len(matchingLines)
+	if limit > 0 && totalMatches > limit {
 		matchingLines = matchingLines[:limit]
 	}
@@
-	header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+	header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
+	if limit > 0 && totalMatches > len(matchingLines) {
+		header += fmt.Sprintf(", returning first %d", len(matchingLines))
+	}
+	header += "\n\n"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
totalMatches := len(matchingLines)
if limit > 0 && totalMatches > limit {
	matchingLines = matchingLines[:limit]
}

header := fmt.Sprintf("Logs for pod %s/%s between %s and ", namespace, pod, sinceStr)
if hasUntil {
	header += untilStr
} else {
	header += "latest"
}
if container != "" {
	header += fmt.Sprintf(", container %s", container)
}
if previous {
	header += " (previous)"
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)", totalMatches)
if limit > 0 && totalMatches > len(matchingLines) {
	header += fmt.Sprintf(", returning first %d", len(matchingLines))
}
header += "\n\n"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/mustgather/pod_logs.go` around lines 295 - 311, Record the total
number of matches before truncation and use that count in the header instead of
the post-truncate slice length: store totalMatches := len(matchingLines) before
the limit logic in the pod_logs.go function (where matchingLines and limit are
handled), then apply the truncation (matchingLines = matchingLines[:limit] if
needed) and build header using totalMatches for the "Found %d matching line(s)"
message while keeping the returned matchingLines slice limited.
```go
// ReadETCDFile reads a raw ETCD info JSON file and returns its content
func (p *Provider) ReadETCDFile(filename string) ([]byte, error) {
	filePath := filepath.Join(p.metadata.ContainerDir, "etcd_info", filename)
```
@swghosh can we add a little defensive check here on the filePath that is constructed to ensure that nothing is escaping the etcd_info path on the container?
This is causing the security scan action to fail even though all the call sites currently use hard coded paths that are safe, as this is an exported method
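One possible shape for that guard (a sketch; `safeJoin` is a hypothetical helper, not code from this PR): clean the joined path, then verify with `filepath.Rel` that it still sits under the `etcd_info` base before opening it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins name under base and rejects any result that escapes base,
// e.g. a file name containing ".." elements.
func safeJoin(base, name string) (string, error) {
	p := filepath.Join(base, name) // Join also runs filepath.Clean
	rel, err := filepath.Rel(base, p)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("invalid file name %q: escapes %s", name, base)
	}
	return p, nil
}

func main() {
	fmt.Println(safeJoin("/data/etcd_info", "members.json"))
	fmt.Println(safeJoin("/data/etcd_info", "../kubeconfig"))
}
```

`ReadETCDFile` could call such a helper before `os.ReadFile`, which should also satisfy the path-traversal rule tripping the security scan.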
```go
providerMu sync.RWMutex
provider   *mg.Provider
```
@swghosh how will this work when more than one user connects to the MCP server and tries to check different must gathers?
Is that a valid use case? If not, can we guard against it somehow?
I'm worried about two users calling mustgather_use with different paths
Maybe we can cache these per session (should be on the ToolHandlerParams context iirc)
Yup, that's a very valid use case.
We'd need to cache the must-gather path per session.
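A session-keyed cache along the lines discussed might look like this (illustrative sketch; the `Provider` stub and keying by MCP session ID are assumptions, since the real wiring would go through the `ToolHandlerParams` context):

```go
package main

import (
	"fmt"
	"sync"
)

// Provider stands in for mg.Provider; in the real server it would hold the
// loaded must-gather state.
type Provider struct{ Path string }

// sessionProviders maps an MCP session ID to that session's provider, so two
// users pointing mustgather_use at different paths don't clobber each other.
type sessionProviders struct {
	mu sync.RWMutex
	m  map[string]*Provider
}

func (s *sessionProviders) Set(sessionID string, p *Provider) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = make(map[string]*Provider)
	}
	s.m[sessionID] = p
}

func (s *sessionProviders) Get(sessionID string) (*Provider, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	p, ok := s.m[sessionID]
	return p, ok
}

func main() {
	var cache sessionProviders
	cache.Set("sess-a", &Provider{Path: "/tmp/mg-a"})
	cache.Set("sess-b", &Provider{Path: "/tmp/mg-b"})
	a, _ := cache.Get("sess-a")
	fmt.Println(a.Path)
}
```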
```go
// Load metadata files
loadMetadata(containerDir, &metadata)

// Load all resources
var resources []*unstructured.Unstructured

clusterResources, err := loadClusterScopedResources(containerDir)
if err == nil {
	resources = append(resources, clusterResources...)
}

namespacedResources, err := loadNamespacedResources(containerDir)
if err == nil {
```
do we maybe want to estimate how much memory the loading of resources will consume before doing so/compare against how much memory the pod has? Some must gathers can be quite large - this may crash the pod with OOM
Also (or instead) is it possible to lazy load this, or does that not make sense?
lazy loading makes sense - also i'm wondering why we need to pre-load @swghosh . have you noticed any issues getting data on demand?
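Lazy loading as discussed could be as small as a `sync.Once` around the existing loaders (sketch only; `loadAll` stands in for the real `loadClusterScopedResources`/`loadNamespacedResources` calls):

```go
package main

import (
	"fmt"
	"sync"
)

// lazyIndex defers the expensive resource walk until the first query, so
// mustgather_use returns quickly and memory is only paid when (and if) a
// tool actually reads resources.
type lazyIndex struct {
	once    sync.Once
	loadAll func() ([]string, error) // stand-in for the real loaders
	items   []string
	err     error
}

func (l *lazyIndex) Resources() ([]string, error) {
	l.once.Do(func() {
		l.items, l.err = l.loadAll()
	})
	return l.items, l.err
}

func main() {
	calls := 0
	idx := &lazyIndex{loadAll: func() ([]string, error) {
		calls++ // the expensive walk happens at most once
		return []string{"pods/a", "nodes/b"}, nil
	}}
	idx.Resources()
	idx.Resources()
	fmt.Println(calls)
}
```

This keeps the on-demand option open while still memoizing, though it does not by itself bound memory for very large archives.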
```go
if n > 0 {
	sb.Write(buf[:n])
}
if readErr != nil {
```
maybe we want to verify that this is actually an io.EOF error instead of some other kind of error so that we don't silently skip other errors
```go
nsSet := make(map[string]struct{})
for ns := range idx.byNamespace {
	nsSet[ns] = struct{}{}
}
idx.namespaces = make([]string, 0, len(nsSet))
for ns := range nsSet {
	idx.namespaces = append(idx.namespaces, ns)
}
```
can't we simplify this to just iterate over the keys in idx.byNamespace? As far as I can tell the nsSet is a map that has the same keys as idx.byNamespace so not sure why we are creating this here
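The simplification amounts to ranging over the map once, since map keys are already a set (sketch; `byNamespace` here is a stand-in for `idx.byNamespace`):

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNamespaces collects the map's keys directly — no intermediate
// nsSet map is needed, because map keys cannot repeat.
func sortedNamespaces(byNamespace map[string][]string) []string {
	namespaces := make([]string, 0, len(byNamespace))
	for ns := range byNamespace {
		namespaces = append(namespaces, ns)
	}
	sort.Strings(namespaces) // map iteration order is random; sort for stable output
	return namespaces
}

func main() {
	byNamespace := map[string][]string{
		"openshift-etcd":    {"pod-a"},
		"openshift-ingress": {"pod-b"},
	}
	fmt.Println(sortedNamespaces(byNamespace))
}
```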
```go
// healthSymbol returns a symbol for health status
func healthSymbol(health string) string {
	switch strings.ToLower(health) {
	case "up", "healthy", "ok", "true", "firing":
```
@swghosh you are the expert but maybe we want to move "firing" to "[WARN]" ?
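If "firing" moves to the warning bucket, the switch would look roughly like this (sketch; the actual symbols and remaining cases in the repo may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// healthSymbol sketch with "firing" moved out of the healthy bucket, as the
// reviewer suggests — a firing alert isn't evidence of health.
func healthSymbol(health string) string {
	switch strings.ToLower(health) {
	case "up", "healthy", "ok", "true":
		return "[OK]"
	case "firing", "degraded", "warning":
		return "[WARN]"
	default:
		return "[??]"
	}
}

func main() {
	fmt.Println(healthSymbol("Firing"), healthSymbol("up"))
}
```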
```go
Prompts: &mcp.PromptCapabilities{ListChanged: !configuration.Stateless},
Tools:   &mcp.ToolCapabilities{ListChanged: !configuration.Stateless},
Logging: &mcp.LoggingCapabilities{},
Prompts: &mcp.PromptCapabilities{ListChanged: !configuration.Stateless},
```
Let's not change this line if not needed
```go
// ResourceRegistrar allows tools to register MCP resources at runtime.
// For example, the mustgather_use tool registers a must-gather:// resource
// after loading an archive.
type ResourceRegistrar interface {
	AddResource(uri, name, description, mimeType, content string)
	RemoveResources(uris ...string)
}
```
This type of change will need to be considered upstream first, we don't want to change this downstream
Added a PR upstream for the MCP resource and templates registration boilerplate,
containers#1091
```go
"path/filepath"
"strings"

"gopkg.in/yaml.v3"
```
maybe sigs.k8s.io/yaml like we use in other places? this seems to be promoting the other yaml dep from indirect to direct
Updated to use sigs.k8s.io/yaml instead.
@Cali0707 whenever you're around with this, PTAL at another proof-of-concept MCP resources server that I've been playing around with: https://github.com/swghosh/mcp-x-mg/tree/main/mustgather-resource-mcp-server#mcp-resources. |
Force-pushed from 2ab4bf2 to 7604f45
@swghosh: This pull request references MG-216 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed from 7604f45 to 2d6467c
Actionable comments posted: 3
🧹 Nitpick comments (1)
pkg/mcp/resource_test.go (1)
62-65: Prefer fail-fast assertions for captured registrars.

After Line 63/Line 64, consider asserting `s.resourceReg` and `s.templateReg` are non-nil to make wiring failures explicit before subtests.
Verify each finding against the current code and only fix it if needed. In `@pkg/mcp/resource_test.go` around lines 62 - 65, The test calls s.CallTool("test_init", ...) to populate registrars but doesn't assert they were actually set; add fail-fast assertions immediately after that call to ensure s.resourceReg and s.templateReg are non-nil so wiring failures fail the test early. Locate the initialization call (s.CallTool("test_init", map[string]any{})) and add assertions checking s.resourceReg != nil and s.templateReg != nil (use the test suite's Require/Assert helpers) before proceeding to subtests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/mcp/resource.go`:
- Around line 10-15: There are two conflicting declarations of type
resourceRegistrar (used to implement api.ResourceRegistrar) in this package;
remove one to avoid duplicate-type compile errors and keep the implementation
that preserves the safer closure capture pattern (resourceContent := content)
used in the gosdk.go variant. Delete either the resourceRegistrar block in
pkg/mcp/resource.go or in pkg/mcp/gosdk.go, then ensure the remaining
resourceRegistrar struct references server *mcp.Server and implements the same
methods, and that any handler/closure code uses the resourceContent := content
pattern to avoid closure capture bugs.
In `@pkg/toolsets/mustgather/nodes.go`:
- Around line 230-242: The header currently uses len(matchingLines) after the
tail truncation, so it shows the tailed count instead of the total matches;
capture the total match count before you apply the tail (e.g. totalMatches :=
len(matchingLines) immediately before the tail logic) and then use totalMatches
in the header formatting instead of len(matchingLines); adjust the existing
block around matchingLines, tail, header, node, filter, and caseInsensitive in
nodes.go so the "Found X matching line(s)" reflects the original total while the
displayed matchingLines remain tailed.
- Around line 125-137: The sections slice defines two entries with the same key
"pods", making the second entry (title "Pod Resources", value diag.PodResources)
unreachable when filtering by shouldInclude; update the key for the Pod
Resources entry to a unique identifier such as "podresources" (or another
distinct string) and update any related schema/description to reflect the new
key so that shouldInclude("podresources") will correctly include
diag.PodResources.
---
Nitpick comments:
In `@pkg/mcp/resource_test.go`:
- Around line 62-65: The test calls s.CallTool("test_init", ...) to populate
registrars but doesn't assert they were actually set; add fail-fast assertions
immediately after that call to ensure s.resourceReg and s.templateReg are
non-nil so wiring failures fail the test early. Locate the initialization call
(s.CallTool("test_init", map[string]any{})) and add assertions checking
s.resourceReg != nil and s.templateReg != nil (use the test suite's
Require/Assert helpers) before proceeding to subtests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b0c259ea-bea9-4a6a-acd1-a13998bcfbd9
📒 Files selected for processing (26)
- internal/test/mcp.go
- internal/tools/update-readme/main.go
- pkg/api/toolsets.go
- pkg/mcp/gosdk.go
- pkg/mcp/mcp.go
- pkg/mcp/modules.go
- pkg/mcp/resource.go
- pkg/mcp/resource_test.go
- pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
- pkg/mcp/toolsets_test.go
- pkg/ocp/mustgather/etcd.go
- pkg/ocp/mustgather/index.go
- pkg/ocp/mustgather/loader.go
- pkg/ocp/mustgather/monitoring.go
- pkg/ocp/mustgather/provider.go
- pkg/ocp/mustgather/types.go
- pkg/toolsets/mustgather/etcd.go
- pkg/toolsets/mustgather/events.go
- pkg/toolsets/mustgather/helpers.go
- pkg/toolsets/mustgather/monitoring.go
- pkg/toolsets/mustgather/namespaces.go
- pkg/toolsets/mustgather/nodes.go
- pkg/toolsets/mustgather/pod_logs.go
- pkg/toolsets/mustgather/resources.go
- pkg/toolsets/mustgather/toolset.go
- pkg/toolsets/mustgather/use.go
✅ Files skipped from review due to trivial changes (5)
- pkg/mcp/toolsets_test.go
- pkg/toolsets/mustgather/toolset.go
- pkg/mcp/modules.go
- pkg/ocp/mustgather/types.go
- pkg/mcp/testdata/toolsets-openshift-mustgather-tools.json
🚧 Files skipped from review as they are similar to previous changes (10)
- pkg/toolsets/mustgather/namespaces.go
- pkg/toolsets/mustgather/use.go
- internal/tools/update-readme/main.go
- pkg/mcp/mcp.go
- pkg/ocp/mustgather/monitoring.go
- pkg/toolsets/mustgather/helpers.go
- pkg/toolsets/mustgather/monitoring.go
- pkg/toolsets/mustgather/events.go
- pkg/ocp/mustgather/etcd.go
- pkg/toolsets/mustgather/resources.go
```go
sections := []struct {
	key, title, content string
}{
	{"sysinfo", "System Info", diag.SysInfo},
	{"lscpu", "CPU Info (lscpu)", diag.Lscpu},
	{"cpu", "CPU Affinities", diag.CPUAffinities},
	{"irq", "IRQ Affinities", diag.IRQAffinities},
	{"lspci", "PCI Devices (lspci)", diag.Lspci},
	{"dmesg", "Kernel Messages (dmesg)", diag.Dmesg},
	{"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
	{"pods", "Pods Info", diag.PodsInfo},
	{"pods", "Pod Resources", diag.PodResources},
}
```
Duplicate key "pods" causes one section to be unreachable.
Lines 135 and 136 both use "pods" as the key. When a user passes include=pods, only the first matching section (Pods Info) is included because shouldInclude("pods") returns the same result for both. Either Pod Resources should use a different key (e.g., "podresources"), or the description should clarify this behavior.
```diff
 {"cmdline", "Kernel Boot Parameters", diag.ProcCmdline},
 {"pods", "Pods Info", diag.PodsInfo},
-{"pods", "Pod Resources", diag.PodResources},
+{"podresources", "Pod Resources", diag.PodResources},
 }
```

Update the schema description accordingly:

```diff
-"include": {Type: "string", Description: "Comma-separated diagnostics to include: kubelet,sysinfo,cpu,irq,pods,lscpu,lspci,dmesg,cmdline (default: all)"},
+"include": {Type: "string", Description: "Comma-separated diagnostics to include: kubelet,sysinfo,cpu,irq,pods,podresources,lscpu,lspci,dmesg,cmdline (default: all)"},
```
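The filtering pitfall can be sketched in isolation. This is a minimal reconstruction under assumptions: `shouldInclude` below is a plausible comma-split filter, and `section`/`render` are illustrative names, not the actual nodes.go implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// section mirrors the anonymous struct used in the sections slice.
type section struct {
	key, title, content string
}

// shouldInclude reports whether a key appears in the comma-separated
// include filter; an empty filter selects everything.
func shouldInclude(include, key string) bool {
	if include == "" {
		return true // default: include all sections
	}
	for _, tok := range strings.Split(include, ",") {
		if strings.TrimSpace(tok) == key {
			return true
		}
	}
	return false
}

// render returns the titles of the sections selected by the include filter.
func render(sections []section, include string) []string {
	var titles []string
	for _, s := range sections {
		if shouldInclude(include, s.key) {
			titles = append(titles, s.title)
		}
	}
	return titles
}

func main() {
	dup := []section{
		{"pods", "Pods Info", "..."},
		{"pods", "Pod Resources", "..."},
	}
	// Both sections share the "pods" token, so neither can be selected alone.
	fmt.Println(render(dup, "pods"))

	fixed := []section{
		{"pods", "Pods Info", "..."},
		{"podresources", "Pod Resources", "..."},
	}
	// With distinct keys, "Pod Resources" becomes individually addressable.
	fmt.Println(render(fixed, "podresources"))
}
```

With duplicate keys the two sections always travel together; giving each entry a unique key is what makes the filter meaningful.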
```go
// Apply tail from the end of matches
if tail > 0 && len(matchingLines) > tail {
	matchingLines = matchingLines[len(matchingLines)-tail:]
}

header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
if caseInsensitive {
	header += " (case-insensitive)"
}
if tail > 0 {
	header += fmt.Sprintf(" (last %d matches)", tail)
}
header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
```
Match count header shows tailed count, not total matches.
After applying tail truncation on Lines 231-232, Line 242 reports len(matchingLines) which is the tailed count. If there were 500 matches and tail=10, users see "Found 10 matching line(s)" instead of the actual 500.
Suggested fix:

```diff
+totalMatches := len(matchingLines)
 // Apply tail from the end of matches
 if tail > 0 && len(matchingLines) > tail {
 	matchingLines = matchingLines[len(matchingLines)-tail:]
 }
 header := fmt.Sprintf("Kubelet logs for node %s filtered by '%s'", node, filter)
 if caseInsensitive {
 	header += " (case-insensitive)"
 }
 if tail > 0 {
 	header += fmt.Sprintf(" (last %d matches)", tail)
 }
-header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", len(matchingLines))
+header += fmt.Sprintf(":\n\nFound %d matching line(s)\n\n", totalMatches)
```
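The capture-before-truncate pattern behind this fix can be isolated in a small helper. `tailWithCount` is a hypothetical name for illustration; the real fix inlines the same two steps in nodes.go.

```go
package main

import "fmt"

// tailWithCount trims lines to the last n entries while reporting the
// pre-truncation total, so callers can display an honest match count.
func tailWithCount(lines []string, tail int) ([]string, int) {
	total := len(lines) // capture BEFORE truncating
	if tail > 0 && len(lines) > tail {
		lines = lines[len(lines)-tail:]
	}
	return lines, total
}

func main() {
	matches := make([]string, 500)
	for i := range matches {
		matches[i] = fmt.Sprintf("line %d", i)
	}
	kept, total := tailWithCount(matches, 10)
	// Reports the true total while only the tailed lines are shown.
	fmt.Printf("Found %d matching line(s), showing last %d\n", total, len(kept))
}
```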
Force-pushed: 2d6467c to ac8be42
bentito left a comment
This is a big PR, but it introduces highly valuable and much-needed capabilities for offline RCA. I fully support merging this soon and mostly as-is to get the baseline functionality in place. We can address the architectural and performance suggestions below as follow-up work.
Only my comments below about concurrency on the package variables may need addressing pre-merge:
Particularly, as a follow-up, we need to ensure that our recent injection ideas for the netedge toolset integrate smoothly with this offline mode after it merges.
Here is a technical review and suggestions for future iterations, specifically looking at this from an AI agent's perspective:
1. Agent Workflow: The use Initialization Step
From an agent's perspective, forcing the LLM to explicitly call mustgather_use before accessing any other tools creates a brittle, linear workflow. Agents frequently forget prerequisite setup steps, requiring extra turns to recover from errors.
- Suggestion for follow-up: Allow initializing the must-gather provider as a check-step internally for all the other must-gather tool calls. If it's not done, just do it silently when needed.
2. Concurrency and State Management
Currently, the loaded archive state is stored in a package-level global variable:

```go
// pkg/toolsets/mustgather/helpers.go
var provider *mg.Provider
```

- Suggestion for change: In a multi-client HTTP/SSE deployment, if User A calls `mustgather_use(path="/archive1")` and User B calls `mustgather_use(path="/archive2")`, User B will overwrite the global state, breaking User A's session. Moving to the startup-configuration approach mentioned above (tying the server instance to a specific archive) or implementing session-scoped state would resolve this race condition safely.
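A session-scoped registry is one shape the fix could take. This is a sketch, not the PR's code: `Provider` is a stand-in for `mg.Provider`, and the session-ID plumbing (how the MCP transport exposes it) is assumed.

```go
package main

import (
	"fmt"
	"sync"
)

// Provider stands in for mg.Provider.
type Provider struct{ archivePath string }

// providerRegistry keys loaded archives by session, replacing the
// package-level `var provider *mg.Provider`.
type providerRegistry struct {
	mu        sync.RWMutex
	bySession map[string]*Provider
}

func newRegistry() *providerRegistry {
	return &providerRegistry{bySession: map[string]*Provider{}}
}

// Use loads (here: records) an archive for one session only.
func (r *providerRegistry) Use(sessionID, path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bySession[sessionID] = &Provider{archivePath: path}
}

// Get returns the provider bound to a session, if any.
func (r *providerRegistry) Get(sessionID string) (*Provider, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.bySession[sessionID]
	return p, ok
}

func main() {
	reg := newRegistry()
	reg.Use("userA", "/archive1")
	reg.Use("userB", "/archive2") // no longer clobbers userA's state
	if p, ok := reg.Get("userA"); ok {
		fmt.Println(p.archivePath)
	}
}
```

The RWMutex also covers the simpler single-archive race: two concurrent `mustgather_use` calls can no longer interleave a half-written pointer.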
3. Memory Efficiency in Log Processing
The log filtering tools (mustgather_pod_logs_grep and mustgather_pod_logs_by_time) currently read the entire log file into memory using readTextFile before splitting and iterating.
- Suggestion for follow-up: Because must-gather pod logs can easily reach hundreds of megabytes, this approach risks significant memory spikes or OOM kills on the MCP server. Refactoring these to use a `bufio.Scanner` to stream the file line-by-line from disk, applying the regex/time filters during the scan and retaining only the matched lines, would make this significantly more robust.
Great work on the core parsing and indexing logic! Let's get this merged and iterate on the optimizations.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bentito, swghosh

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed: ac8be42 to 1ef296e
New changes are detected. LGTM label has been removed.
* Add MCP resources to Toolset interface
* Add MCP resource capabilities to Server
* React to changes in the Toolset interface
* Update suggestions encountered during review
* Update suggestions from review
* Add more test coverage for MCP resources (assertion pairing in resource_test.go, Blob/MIMEType-override coverage, filter/mutator tests, AddResource URI-validation panic)

Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Add mustgather_events_by_time tool
Add mustgather_pod_logs_grep tool
Add mustgather_pod_logs_by_time tool
Update to use MCP resources

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Swarup Ghosh <swghosh@redhat.com>
Force-pushed: 1ef296e to 95aa015
@swghosh: The following tests failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests.

Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Offline debugging and analysis use cases require data parsed from a must-gather archive rather than from an online cluster's kube-apiserver endpoint. Some tools are similar to the core toolset, but there are more tools beyond resources_get, especially journal logs, kubelet logs, etc.
Summary by CodeRabbit
New Features
- `openshift/mustgather` toolset enabling offline analysis of Kubernetes/OpenShift must-gather archives.

Tests