Skip to content

[DO NOT MERGE] feat: linearization tour over the code graph (draft)#127

Closed
jonathanpopham wants to merge 2 commits intosupermodeltools:mainfrom
jonathanpopham:feat/linearization-tour
Closed

[DO NOT MERGE] feat: linearization tour over the code graph (draft)#127
jonathanpopham wants to merge 2 commits intosupermodeltools:mainfrom
jonathanpopham:feat/linearization-tour

Conversation

@jonathanpopham
Copy link
Copy Markdown
Contributor

@jonathanpopham jonathanpopham commented Apr 13, 2026

Objective

LLMs are one-dimensional — they consume a token stream left-to-right. Graphs are not. Today's .graph.* shards linearize at the file level (each sidecar is a flat [deps] / [calls] / [impact] view), but agents get N independent shards with no reading order and have to guess where to start.

This PR adds a tour: a single .supermodel/TOUR.md that serializes the graph into a strategy-chosen linear walk, grouped by domain/subdomain, linking each file to its shard. It also adds a prose narrative preamble on each shard so call/import adjacency reads as sentences rather than arrows. Both are the missing linearization layer over the existing sharding feature.

Feature-complete surface

Standalone tour:

supermodel tour [--strategy topo|bfs-seed|dfs-seed|centrality]
                [--seed <file>]
                [--narrate]
                [--budget-tokens <N>]
                [--dry-run] [path]

Integrated with analyze (one command = shards + spine):

supermodel analyze [--tour]
                   [--tour-strategy ...] [--tour-seed <file>]
                   [--tour-budget <N>]
                   [--narrate]
                   [path]

Strategies (all implemented)

Strategy Ordering Best for
topo reverse-topological over imports (leaves first) whole-codebase onboarding
bfs-seed BFS from --seed outward (undirected import graph) focused tasks, blast-radius walks
dfs-seed DFS from --seed outward (emit-on-pop) tracing a request through layers
centrality transitive-dependent count desc (largest blast radius) "what's the core of this codebase"

--narrate (implemented)

Each shard gets a prose preamble:

// Narrative:
// This file (src/auth.ts) belongs to domain Auth.
// It imports 2 file(s): src/db.ts, src/util.ts.
// It is imported by 1 file(s): src/api.ts.
//   login is called by handleRequest and calls query.
// Risk: MEDIUM (2 transitive dependent(s)).

Same data as [deps]/[calls]/[impact] — second rendering targeted at LLM reading style.

--budget-tokens N (implemented)

Long tours chunk into TOUR.01.md, TOUR.02.md, ... with prev/next cross-links. TOUR.md becomes an index. Split points are file-entry boundaries — no partial entries. Heuristic: 4 chars ≈ 1 token.

Files changed

File Purpose
docs/linearization.md Design doc (updated to reflect implemented surface)
internal/shards/tour.go TourStrategy interface + all 4 strategies + ChunkTour + WriteTour + ResolveStrategy
internal/shards/narrative.go RenderNarrative — prose preamble generator
internal/shards/render.go RenderAll / RenderAllThreeFile take narrate bool
internal/shards/handler.go GenerateOptions extended; maybeWriteTour runs after shards
internal/shards/tour_test.go 13 tests covering strategies, chunking, narrative, narrate-in-render
internal/shards/daemon.go + render_*_test.go updated call sites for new signature
cmd/tour.go subcommand with all flags
cmd/analyze.go --tour* and --narrate flags; wires through GenerateOptions

Test plan

  • go test ./internal/shards/all 18 tour/narrative/strategy tests pass (5 from initial PR + 13 new); full shards suite green
  • go test ./... — full suite green, no regressions
  • go vet ./... — clean
  • Smoke test all strategies on a fabricated graph: topo / centrality / bfs-seed / dfs-seed produce correct orderings
  • Smoke test --budget-tokens 60: 5-file graph chunked into 5 chapters + index with prev/next links
  • Smoke test --narrate: 5 shards rewritten with prose preamble; narrative + structured sections coexist
  • Empirical benchmark — wire supermodeltools/supermodel-benchmarks/shard-ab-test/ and mcpbr-eval to measure agent task completion with/without TOUR+narrate. Not in this PR — landing the surface so we can benchmark it.

Design principles

  • Same vertical slice. Everything lives inside internal/shards/. No cross-slice dependency.
  • Additive. Default analyze behavior unchanged.
  • Deterministic. Lex tiebreaks, stable sorts, cycle-safe, tests pin invariants.
  • Strategy-pluggable. Adding a new ordering is one TourStrategy implementation + one ResolveStrategy case.

Open questions for reviewers

  • Inline vs. linked shard content in TOUR.md? Currently linked (DRY, requires pointer-following). Could add --inline.
  • --focus <glob> filter to scope tours to a subtree?
  • Should tour --budget-tokens 0 clean up stale TOUR.NN.md files from a prior chunked run? (Currently cosmetic leftover.)
  • Does arch-docs want TOUR.md as its entry point instead of its own traversal?

References

  • arXiv:2410.19494 — Graph Linearization Methods for Reasoning on Graphs with LLMs
  • supermodeltools/codegraph-graphrag — BFS narrative walks, thesis doc
  • supermodeltools/graph2md — per-node markdown emission
  • supermodeltools/mcp/src/tools/explore-function.tsdescribeNode, cross-subsystem markers

Summary by CodeRabbit

  • New Features

    • Added a new CLI command to generate a repository "tour" (TOUR.md) that linearizes files in a chosen order (strategies: topo, bfs-seed, dfs-seed, centrality), supports seed selection, token-budget chaptering, dry-run previews, and an optional narrate mode that prepends prose summaries to shard sidecars.
  • CLI

    • Integrated tour and narrate flags into analysis workflow with non-fatal warnings on tour failures.
  • Documentation

    • Added guide describing linearization, strategies, CLI usage, and output shape.
  • Tests

    • Added comprehensive tests for ordering, rendering, chunking, narration, and strategy resolution.

LLMs are 1D and graphs aren't. Per-file shards already linearize at the
file level, but agents get N independent sidecars with no reading order
and must guess where to start. This adds a tour: a single TOUR.md that
serializes the whole graph into a strategy-chosen walk (reverse-topo by
default), grouped by domain/subdomain, linking each file to its shard.

- docs/linearization.md — design doc, thesis, strategy taxonomy, prior art
- internal/shards/tour.go — TourStrategy interface + TopoStrategy
- internal/shards/tour_test.go — determinism, leaves-first, cycle handling
- cmd/tour.go — new 'supermodel tour' subcommand

Draft PR for design feedback before scoping follow-ons (bfs-seed,
centrality, --narrate prose preamble on shards, --budget-tokens chunking).
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Walkthrough

This PR adds a new CLI command supermodel tour plus tour-generation internals: it reads .supermodel/shards.json, builds a shard cache, chooses a linearization strategy, renders a repository TOUR (optionally chunked by token budget), and can optionally prepend prose narratives to existing per-file shard sidecars.

Changes

Cohort / File(s) Summary
CLI: tour command
cmd/tour.go
New Cobra subcommand tour [path] that resolves repo path, reads/unmarshals .supermodel/shards.json, builds cache, resolves strategy, calls shards.WriteTour, supports flags --strategy, --seed, --narrate, --budget-tokens, --dry-run, and re-renders shards when --narrate is set.
Analyze CLI wiring
cmd/analyze.go
Adds --narrate, --tour, --tour-strategy, --tour-seed, --tour-budget flags and threads them into GenerateOptions; validates incompatible flag combos (--no-shards with --tour/--narrate).
Core tour engine
internal/shards/tour.go
New tour generator: TourStrategy interface and implementations (TopoStrategy, BFSSeedStrategy, DFSSeedStrategy, CentralityStrategy), RenderTour, ChunkTour, and WriteTour (atomic writes, chunking into TOUR.01.md…, index TOUR.md).
Narrative generator
internal/shards/narrative.go
New RenderNarrative (exported) plus helpers to synthesize per-file prose describing imports/importers, functions, adjacency, risk, and transitive dependents; returns comment-prefixed narrative block.
Rendering plumbing
internal/shards/render.go, internal/shards/render_stale_test.go, internal/shards/render_test.go
Extended RenderAll/RenderAllThreeFile signatures to accept narrate bool; when true, prepends RenderNarrative output to shard contents. Updated tests to pass new arg.
Generate & handler changes
internal/shards/handler.go, internal/shards/daemon.go
Extended GenerateOptions and RenderOptions with tour/narrate fields; updated renderShards signature to accept narrate; added ResolveStrategy and maybeWriteTour to optionally write tours and handle strategy resolution (non-fatal warnings on failure). Updated Daemon call sites to new RenderAll signature.
Tests for tour & narrative
internal/shards/tour_test.go, internal/shards/render_stale_test.go, internal/shards/render_test.go
New tests validating ordering strategies, deterministic/cycle-safe topo ordering, BFS/DFS seed behaviors, centrality ordering, tour rendering and chunking, and narrative output & re-rendering. Updated render tests for new signatures.
Docs
docs/linearization.md
New documentation describing TOUR.md shape, linearization strategies, chunking behavior, --narrate opt-in, CLI surfaces, and examples.

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as "supermodel tour"
    participant FS as "File System"
    participant ShardIR as "shards.json -> IR"
    participant Cache as "Shard Cache"
    participant Strategy
    participant Renderer as "RenderTour/ChunkTour"
    participant Writer as "WriteTour"

    User->>CLI: run `supermodel tour [path] --strategy=... --narrate`
    CLI->>FS: read `<repo>/.supermodel/shards.json`
    FS-->>CLI: JSON bytes
    CLI->>ShardIR: unmarshal -> IR
    CLI->>Cache: NewCache().Build(&ir)
    Cache-->>CLI: populated cache
    CLI->>Strategy: ResolveStrategy(name, seed) -> Order(cache)
    Strategy->>Cache: traverse graph / compute metrics
    Strategy-->>CLI: ordered file list
    CLI->>Renderer: RenderTour(cache, order, prefix)
    Renderer-->>Cache: file metadata, domains, imports
    Renderer-->>CLI: markdown body (possibly chunked)
    CLI->>Writer: WriteTour(repoDir, cache, strategy, budget, dryRun)
    Writer->>FS: atomic write (tmp -> rename) of TOUR.md / chapters
    FS-->>Writer: write result
    alt --narrate set
        CLI->>Renderer: RenderNarrative per-file
        Renderer->>Writer: overwrite `.graph.*` shards with narrative preamble
    end
    Writer-->>CLI: success / path info
    CLI-->>User: success message (unless --dry-run)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • greynewell

Poem

🗺️ Files line up, a tour unfurled,
Paths and imports step through the world,
Narratives whisper, shards sing along,
Chapters stitched tidy, ordered, and strong,
A little map so readers find home.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 61.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: linearization tour over the code graph (draft)' accurately describes the main feature: adding a linearization tour system for the code graph.
Description check ✅ Passed The PR description is comprehensive, covering objectives, feature surface, test results, and design principles, though it marks some items as not yet completed (e.g., empirical benchmarking).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@jonathanpopham jonathanpopham changed the title feat: linearization tour over the code graph (draft) [DO NOT MERGE] feat: linearization tour over the code graph (draft) Apr 13, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
internal/shards/tour_test.go (1)

90-92: Cycle test should assert actual members, not only count.

On Line 90, checking only len(order) == 2 can miss wrong contents. Quick hardening: assert both a.go and b.go are present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour_test.go` around lines 90 - 92, The test currently only
checks the length of the cycle participant slice `order`; modify the assertion
to verify the actual members instead of just the count by asserting that both
expected filenames (e.g., "a.go" and "b.go") appear in `order` (use explicit
equality or a membership check on `order`), so replace the loose `len(order) !=
2` check with assertions that `order` contains "a.go" and "b.go" (and fail the
test with a clear message if either is missing).
docs/linearization.md (1)

43-47: Add languages to fenced code blocks to satisfy markdown lint.

Line 43 and Line 115 use bare fences; adding explicit languages (for example text or bash) will clear MD040 warnings.

Also applies to: 115-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/linearization.md` around lines 43 - 47, Replace the bare fenced code
blocks with language-tagged fences to satisfy markdown lint: locate the
triple-backtick block that lists TOUR.md, src/auth/session.go, and
src/auth/session.graph.go and change the opening fence to include a language
(e.g., ```text or ```bash), and do the same for the later multi-line fenced
block (the one currently using plain ``` around the example lines); ensure both
opening fences include an explicit language token.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/linearization.md`:
- Around line 115-129: The docs list several flags/strategies for the
"supermodel tour" CLI (--strategy bfs-seed|dfs-seed|centrality, --seed,
--narrate, --budget-tokens) that are not implemented (current CLI only supports
--strategy topo and --dry-run); either remove or mark those options as TODO in
docs or implement the missing runtime support in the command handler for
"supermodel tour" to accept and act on --strategy(bfs-seed|dfs-seed|centrality),
--seed, --narrate, and --budget-tokens; update the CLI parsing and executor code
that reads flags for "supermodel tour" to validate and handle these options (or
sync the docs to only mention --strategy topo and --dry-run) so the
documentation and implementation match.

In `@internal/shards/tour.go`:
- Around line 178-199: riskFor currently only aggregates domains from
cache.TransitiveDependents(file) and the file itself via cache.FileDomain, which
can diverge from renderImpactSection because that function also includes domains
of direct caller files (function-call edges); update riskFor to also iterate
direct callers of the file (i.e., the files that call this file via
function-call edges) and add their cache.FileDomain entries into the domains set
before computing thresholds so the domain count matches renderImpactSection
(reference symbols: riskFor, cache.TransitiveDependents, cache.FileDomain and
the codepath that obtains direct callers such as the function that returns
direct callers/callers for a file in the cache).
- Around line 21-23: Add one-line doc comments for the exported methods
TopoStrategy.Name and TopoStrategy.Order describing that Name returns the
strategy identifier and Order returns the ordered files for the given cache;
also change the signature of splitDomain to name its two return values (e.g.,
func splitDomain(...) (dom string, sub string)) so callers/readers immediately
know what each returned string represents and to satisfy the linter.

---

Nitpick comments:
In `@docs/linearization.md`:
- Around line 43-47: Replace the bare fenced code blocks with language-tagged
fences to satisfy markdown lint: locate the triple-backtick block that lists
TOUR.md, src/auth/session.go, and src/auth/session.graph.go and change the
opening fence to include a language (e.g., ```text or ```bash), and do the same
for the later multi-line fenced block (the one currently using plain ``` around
the example lines); ensure both opening fences include an explicit language
token.

In `@internal/shards/tour_test.go`:
- Around line 90-92: The test currently only checks the length of the cycle
participant slice `order`; modify the assertion to verify the actual members
instead of just the count by asserting that both expected filenames (e.g.,
"a.go" and "b.go") appear in `order` (use explicit equality or a membership
check on `order`), so replace the loose `len(order) != 2` check with assertions
that `order` contains "a.go" and "b.go" (and fail the test with a clear message
if either is missing).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a4fb4858-cb54-436e-81ce-aa1db99803cb

📥 Commits

Reviewing files that changed from the base of the PR and between 1dbd1d9 and 2ef0846.

📒 Files selected for processing (4)
  • cmd/tour.go
  • docs/linearization.md
  • internal/shards/tour.go
  • internal/shards/tour_test.go

Comment thread docs/linearization.md
Comment thread internal/shards/tour.go
Comment on lines +21 to +23
func (TopoStrategy) Name() string { return "topo" }

func (TopoStrategy) Order(cache *Cache) []string {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current lint findings are resolved for this file.
golangci-lint run ./... --disable-all -E revive -E gocritic | rg -n 'internal/shards/tour.go:(21|23|161)'

Repository: supermodeltools/cli

Length of output: 168


🏁 Script executed:

# First, check if the file exists and read the relevant lines
wc -l internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 90


🏁 Script executed:

# Read lines around 21, 23, and 161 to understand current state
sed -n '19,25p' internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 271


🏁 Script executed:

# Read lines around 161 to see the splitDomain function
sed -n '159,169p' internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 228


🏁 Script executed:

# Check the correct golangci-lint flags (try without --disable-all)
golangci-lint run ./internal/shards/tour.go -E revive -E gocritic --out-format=json 2>&1 | head -100

Repository: supermodeltools/cli

Length of output: 166


🏁 Script executed:

# Try running golangci-lint with simpler syntax
golangci-lint run ./internal/shards/tour.go -E revive -E gocritic 2>&1

Repository: supermodeltools/cli

Length of output: 1627


🏁 Script executed:

# Also try with specific go files flag
golangci-lint run ./internal/shards/ -E revive -E gocritic 2>&1 | grep -A 2 "tour.go"

Repository: supermodeltools/cli

Length of output: 45


🏁 Script executed:

# Check if there's a golangci-lint config file
find . -maxdepth 2 -name ".golangci*" -o -name "golangci*"

Repository: supermodeltools/cli

Length of output: 79


🏁 Script executed:

# Try running golangci-lint on just the repo without typecheck
golangci-lint run ./internal/shards/tour.go --no-config -E revive -E gocritic -D typecheck 2>&1 | head -50

Repository: supermodeltools/cli

Length of output: 270


🏁 Script executed:

# Let's just check the Go syntax by looking at exported funcs/methods
# Exported methods/functions that lack doc comments in tour.go
grep -n "^func" internal/shards/tour.go | head -20

Repository: supermodeltools/cli

Length of output: 752


🏁 Script executed:

# Check the golangci config to understand what linters are enabled
cat ./.golangci.yaml

Repository: supermodeltools/cli

Length of output: 2570


🏁 Script executed:

# Now let's try to run revive specifically on just the exported methods/functions
# Focus on lines 21, 23, and check if doc comments are missing
sed -n '20,24p' internal/shards/tour.go
echo "---"
sed -n '160,162p' internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 262


🏁 Script executed:

# Let's check which exported functions/methods lack doc comments
# Lines 21, 23, 92, 132, 147, 154, 161, 171, 181, 202, 212
# Some are unexported (lowercase), so let's filter for actually exported (uppercase start)

grep -n "^func ([A-Z]\|^func [A-Z]" internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 361


🏁 Script executed:

# Check if these functions have doc comments by looking at the line before each
sed -n '20,21p' internal/shards/tour.go
echo "---"
sed -n '22,23p' internal/shards/tour.go
echo "---"
sed -n '160,161p' internal/shards/tour.go

Repository: supermodeltools/cli

Length of output: 222


🏁 Script executed:

# Try running revive specifically on the file to see exported comment warnings
revive -config ./.golangci.yaml ./internal/shards/tour.go 2>&1 | grep -E "(exported|Name|Order|splitDomain)"

Repository: supermodeltools/cli

Length of output: 45


🏁 Script executed:

# Try running just revive with simpler config
revive -severity=warning ./internal/shards/tour.go 2>&1

Repository: supermodeltools/cli

Length of output: 109


🏁 Script executed:

# Check for gocritic unnamedResult specifically
# gocritic is harder to run standalone, so let's search for functions with unnamed returns
rg 'func.*\(.*\)\s*\(' internal/shards/tour.go -A 1 | head -30

Repository: supermodeltools/cli

Length of output: 273


🏁 Script executed:

# Let's see what the issue actually is by checking the exact code again
cat -n internal/shards/tour.go | sed -n '19,25p'
cat -n internal/shards/tour.go | sed -n '159,170p'

Repository: supermodeltools/cli

Length of output: 571


Add doc comments and name the return values to fix lint issues.

Go's linter expects exported methods (ones that start with a capital letter) to have doc comments explaining what they do. Right now, Name() and Order() on TopoStrategy are missing these. Think of it like: if you're sharing code with others, tell them what each public method does.

Also, splitDomain() has two return values but they're unnamed — just (string, string). The linter prefers them to be named (like (dom string, sub string)) so anyone reading the code immediately knows what each returned value represents.

The suggested fixes above will resolve all three issues:

  • Add a one-line doc comment before Name() explaining it returns the strategy identifier
  • Add a one-line doc comment before Order() explaining it returns the ordered files
  • Name the return values in splitDomain() as dom and sub
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 23-23:
exported: exported method TopoStrategy.Order should have comment or be unexported (revive)


[failure] 21-21:
exported: exported method TopoStrategy.Name should have comment or be unexported (revive)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 21 - 23, Add one-line doc comments for
the exported methods TopoStrategy.Name and TopoStrategy.Order describing that
Name returns the strategy identifier and Order returns the ordered files for the
given cache; also change the signature of splitDomain to name its two return
values (e.g., func splitDomain(...) (dom string, sub string)) so callers/readers
immediately know what each returned string represents and to satisfy the linter.

Comment thread internal/shards/tour.go
Comment on lines +178 to +199
// riskFor is a narrow re-derivation of the impact section's risk tier for the
// tour line. It matches renderImpactSection's thresholds so tour and shard stay
// in agreement.
func riskFor(file string, cache *Cache) string {
transitive := cache.TransitiveDependents(file)
domains := map[string]bool{}
if d := cache.FileDomain[file]; d != "" {
domains[d] = true
}
for f := range transitive {
if d := cache.FileDomain[f]; d != "" {
domains[d] = true
}
}
switch {
case len(transitive) > 20 || len(domains) > 2:
return "HIGH"
case len(transitive) > 5 || len(domains) > 1:
return "MEDIUM"
default:
return "LOW"
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Risk tier derivation can drift from shard impact risk.

On Line 179, the comment says this matches shard impact thresholds, but Line 187 only folds in transitive importer domains. renderImpactSection also counts domains from direct caller files (function-call edges), so TOUR risk can disagree with shard risk for the same file.

Suggested fix
 func riskFor(file string, cache *Cache) string {
 	transitive := cache.TransitiveDependents(file)
+	directCallerFiles := make(map[string]bool)
+	for id, fn := range cache.FnByID {
+		if fn.File != file {
+			continue
+		}
+		for _, caller := range cache.Callers[id] {
+			if caller.File != "" && caller.File != file {
+				directCallerFiles[caller.File] = true
+			}
+		}
+	}
+
 	domains := map[string]bool{}
 	if d := cache.FileDomain[file]; d != "" {
 		domains[d] = true
 	}
 	for f := range transitive {
 		if d := cache.FileDomain[f]; d != "" {
 			domains[d] = true
 		}
 	}
+	for f := range directCallerFiles {
+		if d := cache.FileDomain[f]; d != "" {
+			domains[d] = true
+		}
+	}
 	switch {
 	case len(transitive) > 20 || len(domains) > 2:
 		return "HIGH"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// riskFor is a narrow re-derivation of the impact section's risk tier for the
// tour line. It matches renderImpactSection's thresholds so tour and shard stay
// in agreement.
func riskFor(file string, cache *Cache) string {
transitive := cache.TransitiveDependents(file)
domains := map[string]bool{}
if d := cache.FileDomain[file]; d != "" {
domains[d] = true
}
for f := range transitive {
if d := cache.FileDomain[f]; d != "" {
domains[d] = true
}
}
switch {
case len(transitive) > 20 || len(domains) > 2:
return "HIGH"
case len(transitive) > 5 || len(domains) > 1:
return "MEDIUM"
default:
return "LOW"
}
// riskFor is a narrow re-derivation of the impact section's risk tier for the
// tour line. It matches renderImpactSection's thresholds so tour and shard stay
// in agreement.
func riskFor(file string, cache *Cache) string {
transitive := cache.TransitiveDependents(file)
directCallerFiles := make(map[string]bool)
for id, fn := range cache.FnByID {
if fn.File != file {
continue
}
for _, caller := range cache.Callers[id] {
if caller.File != "" && caller.File != file {
directCallerFiles[caller.File] = true
}
}
}
domains := map[string]bool{}
if d := cache.FileDomain[file]; d != "" {
domains[d] = true
}
for f := range transitive {
if d := cache.FileDomain[f]; d != "" {
domains[d] = true
}
}
for f := range directCallerFiles {
if d := cache.FileDomain[f]; d != "" {
domains[d] = true
}
}
switch {
case len(transitive) > 20 || len(domains) > 2:
return "HIGH"
case len(transitive) > 5 || len(domains) > 1:
return "MEDIUM"
default:
return "LOW"
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 178 - 199, riskFor currently only
aggregates domains from cache.TransitiveDependents(file) and the file itself via
cache.FileDomain, which can diverge from renderImpactSection because that
function also includes domains of direct caller files (function-call edges);
update riskFor to also iterate direct callers of the file (i.e., the files that
call this file via function-call edges) and add their cache.FileDomain entries
into the domains set before computing thresholds so the domain count matches
renderImpactSection (reference symbols: riskFor, cache.TransitiveDependents,
cache.FileDomain and the codepath that obtains direct callers such as the
function that returns direct callers/callers for a file in the cache).

… analyze wiring

Completes the linearization layer over sharding. All four strategies,
prose narrative preamble, token-budget chapter chunking, and analyze
integration so one command produces shards + spine.

- internal/shards/tour.go
  - BFSSeedStrategy / DFSSeedStrategy: undirected walk from --seed,
    emit-on-pop so DFS descends one branch fully before the next.
  - CentralityStrategy: transitive-dependent count desc, lex tiebreak.
  - ChunkTour: splits at file-entry boundaries to fit a token budget;
    TOUR.md becomes an index linking to TOUR.01.md..TOUR.NN.md with
    prev/next chapter links.
  - WriteTour now accepts budgetTokens; chapter files written atomically.
  - ResolveStrategy: shared strategy lookup (used by tour + analyze).
- internal/shards/narrative.go
  - RenderNarrative: prose preamble — domain/subdomain, import and
    importer counts with named examples, defined functions + call
    adjacency, risk tier with transitive-dependent count.
- internal/shards/render.go
  - RenderAll / RenderAllThreeFile take a narrate bool; when set, each
    shard is prefixed with the narrative block.
- cmd/tour.go
  - Flags: --strategy, --seed, --narrate, --budget-tokens, --dry-run.
  - --narrate re-renders existing shards using the cached graph.
- cmd/analyze.go
  - --tour, --tour-strategy, --tour-seed, --tour-budget, --narrate —
    so a single 'supermodel analyze --tour --narrate' emits shards
    + tour in one pass.
- internal/shards/handler.go: wires opts through the cached, fresh, and
  stale-fallback code paths; maybeWriteTour runs after shards succeed.
- internal/shards/daemon.go + tests: updated call sites for new signature.

Tests: 13 new cases covering each strategy (leaves-first, determinism,
cycles, reachable-only, BFS-vs-DFS ordering, centrality, chunk
boundaries, budget fit, strategy resolution, narrative content,
narrate in RenderAll). Full suite green, go vet clean.

Smoke-tested end-to-end on a fabricated graph: all four strategies
produce correct orderings; --budget-tokens 60 chunks 5 files into
5 chapters with index; --narrate prefixes each shard with prose.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/analyze.go (1)

62-76: ⚠️ Potential issue | 🟡 Minor

Shard-dependent flags can silently no-op when shards are disabled in config.

Right now --tour / --narrate can be accepted but ignored if cfg.ShardsEnabled() is false, which is confusing behavior.

Suggested guard
 			if noShards && (narrate || tour) {
 				return fmt.Errorf("--narrate and --tour require shards (cannot combine with --no-shards)")
 			}
+			if !cfg.ShardsEnabled() && (threeFile || narrate || tour) {
+				return fmt.Errorf("--three-file, --narrate, and --tour require shards to be enabled")
+			}
 			dir := "."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/analyze.go` around lines 62 - 76, The CLI accepts shard-dependent flags
(e.g., tour, narrate and the tour* options) but currently they silently no-op
when cfg.ShardsEnabled() is false; add a guard before choosing the generation
path that checks if cfg.ShardsEnabled() is false and any shard-dependent flags
(tour, narrate, tourStrategy, tourSeed, tourBudget) are set, and if so return a
clear error (or non-zero exit) telling the user those flags require shards to be
enabled; locate this check in cmd/analyze.go around the existing
cfg.ShardsEnabled() branch that calls shards.Generate(...) and analyze.Run(...),
and perform the validation there so analyze.Run is not invoked while ignoring
the shard-specific flags.
♻️ Duplicate comments (2)
docs/linearization.md (1)

43-47: ⚠️ Potential issue | 🟡 Minor

Add language tags to fenced code blocks to satisfy markdown lint.

The unlabeled fences trigger MD040; adding explicit languages keeps docs lint-clean.

Suggested patch
-```
+```text
 TOUR.md                         ← linear walk (this feature)
 src/auth/session.go             ← source file
 src/auth/session.graph.go       ← existing shard (per-file linearization)

@@
- +bash
supermodel tour [--strategy topo|bfs-seed|dfs-seed|centrality]
[--seed ]
[--narrate]
[--budget-tokens ]
[--dry-run]
[path]

@@
-```
+```bash
supermodel analyze [--tour]
                   [--tour-strategy topo|bfs-seed|dfs-seed|centrality]
                   [--tour-seed <file>]
                   [--tour-budget <N>]
                   [--narrate]
                   [path]
</details>


Also applies to: 117-124, 128-135

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/linearization.md around lines 43 - 47, The markdown in
docs/linearization.md uses unlabeled fenced code blocks which trigger MD040;
update each fence to include a language tag (e.g., change the file-list block
fence to text and the command examples to bash) so the three shown blocks
(the TOUR.md/source file shard listing and the two supermodel command examples)
and the additional blocks referenced (around the other examples at the same
sections) all have explicit language tags; edit the fences in those code blocks
to match the suggested patch (use text for plain lists and bash for CLI
examples) throughout the file.


</details>

</blockquote></details>
<details>
<summary>internal/shards/tour.go (1)</summary><blockquote>

`21-50`: _⚠️ Potential issue_ | _🟡 Minor_

**Lint is still red in this file.**

The exported `Name`/`Order` methods still need doc comments, and `splitDomain` still returns two anonymous strings. That matches the current `revive`/`gocritic` failures, so CI will keep failing until this is cleaned up.  
  


Also applies to: 137-137, 275-275

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 21 - 50, Exported methods like Name on
TopoStrategy, BFSSeedStrategy, DFSSeedStrategy, CentralityStrategy and all Order
methods must have proper doc comments (start with the method name) and the
splitDomain function must return two named string results instead of anonymous
returns; add comments above each exported Name/Order function using the exact
function names (e.g., "TopoStrategy.Name ...", "BFSSeedStrategy.Order ...") and
change splitDomain's signature to return (domain, suffix string) and update its
return statements and callers to use those named results so revive/gocritic
warnings are resolved.
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @cmd/tour.go:

  • Around line 81-84: When --narrate is set we must not rewrite other shard
    formats; instead, before calling shards.RenderAll from the narrate branch (where
    cache.SourceFiles() is used), filter the files list to only those that currently
    use the .graph shard format (i.e. detect existing .graph.* shards via the
    cache/metadata API), or replace the call with a new helper like
    shards.RenderExistingGraphShards(repoDir, cache, graphFiles, true, dryRun) that
    only rewrites .graph shards; update the narrate branch to use the filtered list
    (or new helper) so RenderAll does not convert repos from .calls/.deps/.impact to
    .graph and avoid stale-file cleanup deleting those other formats.

In @internal/shards/handler.go:

  • Around line 51-67: The function maybeWriteTour currently proceeds to write a
    tour even when a seed (for bfs-seed/dfs-seed) resolves to no files; update
    maybeWriteTour to validate the chosen seed before calling WriteTour by checking
    that the resolved seed files appear in cache.SourceFiles() (or that
    seededTraversal returned a non-empty set) and, if the seed is missing or yields
    no files, log a warning via ui.Warn and return early; keep the existing
    ResolveStrategy and WriteTour calls but add this seed existence check (using
    cache.SourceFiles() and strategy/seed identifiers) to prevent reporting success
    for an absent seed.
  • Around line 72-89: The error messages in ResolveStrategy incorrectly reference
    a non-existent CLI flag (--seed); update the validation here to be flag-agnostic
    by changing the fmt.Errorf messages in ResolveStrategy for the "bfs-seed" and
    "dfs-seed" branches to something like "bfs-seed requires a seed file" and
    "dfs-seed requires a seed file" (or move the flag-specific check into the CLI
    layer that parses --tour-seed), ensuring references to BFSSeedStrategy,
    DFSSeedStrategy and ResolveStrategy remain consistent so maybeWriteTour and the
    analyze flow no longer point users to the wrong flag.

In @internal/shards/narrative.go:

  • Around line 114-124: uniqueCallerNames currently dedupes by function name
    which collapses distinct functions that share the same name; change the dedupe
    key to the unique function identifier instead. In uniqueCallerNames, use
    r.FuncID (or another unique ID from CallerRef) as the seen-map key (e.g.,
    map[FuncID]bool) while still using cache.FuncName(r.FuncID) to produce the human
    name to append; skip when the name is empty or the FuncID was already seen, mark
    the FuncID as seen, and append the name. This preserves distinct functions with
    identical names while keeping return type as []string.

In @internal/shards/tour.go:

  • Around line 462-485: The current generation logic can leave stale TOUR.NN.md
    files when the new chapters slice is smaller; before writing new multi-chapter
    files (i.e. in the branch that iterates over chapters and calls
    writeFile(fmt.Sprintf("TOUR.%02d.md", ...), chapter)), delete any existing
    TOUR..md files that won't be overwritten (or delete all TOUR.NN.md files up
    front) so only the newly written set remains; use a glob for the TOUR.
    .md
    pattern, remove matches that are not in the current expected filenames, and then
    proceed to write the new TOUR.%02d.md files and the index (update both the
    single-chapter and multi-chapter branches to clean up stale TOUR.NN.md files
    appropriately).
  • Around line 215-241: The current loop over entries (built from order using
    splitDomain and written with writeTourEntry) only emits headings when domain
    changes but doesn't regroup files, so domains reappear; fix by collecting files
    into a two-level ordered map: track a slice of seen domains and for each domain
    a slice of seen subdomains, and append each file into domainMap[dom][sub] in the
    same order they appear in order; then iterate the seen domains and their seen
    subdomains (preserving per-group file order) to emit the "## Domain" and
    optional "### Subdomain" headings and call writeTourEntry for each file (use the
    existing variables entries, splitDomain, cache.FileDomain, repoRelPrefix, and b)
    so output is grouped by domain/subdomain but retains intra-group order.
  • Around line 357-417: The parser currently resets currentHeading after adding
    the first domainedEntry, so subsequent "- **" entries lose their domain heading;
    remove the currentHeading.Reset() call in the block parser (the loop that builds
    entries from rest) so that currentHeading persists across multiple entries until
    a new "## " or "### " heading block updates it, ensuring later chapters retain
    the proper heading context for entries (domainedEntry.heading).

Outside diff comments:
In @cmd/analyze.go:

  • Around line 62-76: The CLI accepts shard-dependent flags (e.g., tour, narrate
    and the tour* options) but currently they silently no-op when
    cfg.ShardsEnabled() is false; add a guard before choosing the generation path
    that checks if cfg.ShardsEnabled() is false and any shard-dependent flags (tour,
    narrate, tourStrategy, tourSeed, tourBudget) are set, and if so return a clear
    error (or non-zero exit) telling the user those flags require shards to be
    enabled; locate this check in cmd/analyze.go around the existing
    cfg.ShardsEnabled() branch that calls shards.Generate(...) and analyze.Run(...),
    and perform the validation there so analyze.Run is not invoked while ignoring
    the shard-specific flags.

Duplicate comments:
In @docs/linearization.md:

  • Around line 43-47: The markdown in docs/linearization.md uses unlabeled fenced
    code blocks which trigger MD040; update each fence to include a language tag
    (e.g., change the file-list block fence to ```text and the command examples to
the two supermodel command examples) and the additional blocks referenced
(around the other examples at the same sections) all have explicit language
tags; edit the fences in those code blocks to match the suggested patch (use
text for plain lists and bash for CLI examples) throughout the file.

In `@internal/shards/tour.go`:
- Around line 21-50: Exported methods like Name on TopoStrategy,
BFSSeedStrategy, DFSSeedStrategy, CentralityStrategy and all Order methods must
have proper doc comments (start with the method name) and the splitDomain
function must return two named string results instead of anonymous returns; add
comments above each exported Name/Order function using the exact function names
(e.g., "TopoStrategy.Name ...", "BFSSeedStrategy.Order ...") and change
splitDomain's signature to return (domain, suffix string) and update its return
statements and callers to use those named results so revive/gocritic warnings
are resolved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c2fc073c-bd19-4e5d-ae23-5bc25994bee5

📥 Commits

Reviewing files that changed from the base of the PR and between 2ef0846 and 57ef344.

📒 Files selected for processing (11)
  • cmd/analyze.go
  • cmd/tour.go
  • docs/linearization.md
  • internal/shards/daemon.go
  • internal/shards/handler.go
  • internal/shards/narrative.go
  • internal/shards/render.go
  • internal/shards/render_stale_test.go
  • internal/shards/render_test.go
  • internal/shards/tour.go
  • internal/shards/tour_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/shards/tour_test.go

Comment thread cmd/tour.go
Comment on lines +81 to +84
if narrate {
files := cache.SourceFiles()
written, rerr := shards.RenderAll(repoDir, cache, files, true, dryRun)
if rerr != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--narrate can destructively switch shard format.

Calling RenderAll here rewrites as .graph.* and triggers stale three-file cleanup, so repos using .calls/.deps/.impact can lose those files during supermodel tour --narrate.

Safer approach (rewrite only existing .graph shards)
 			if narrate {
 				files := cache.SourceFiles()
-				written, rerr := shards.RenderAll(repoDir, cache, files, true, dryRun)
+				var graphFiles []string
+				for _, f := range files {
+					if _, statErr := os.Stat(filepath.Join(repoDir, shards.ShardFilename(f))); statErr == nil {
+						graphFiles = append(graphFiles, f)
+					}
+				}
+				written, rerr := shards.RenderAll(repoDir, cache, graphFiles, true, dryRun)
 				if rerr != nil {
 					return fmt.Errorf("re-rendering shards with narrative: %w", rerr)
 				}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tour.go` around lines 81 - 84, When --narrate is set we must not rewrite
other shard formats; instead, before calling shards.RenderAll from the narrate
branch (where cache.SourceFiles() is used), filter the files list to only those
that currently use the .graph shard format (i.e. detect existing .graph.* shards
via the cache/metadata API), or replace the call with a new helper like
shards.RenderExistingGraphShards(repoDir, cache, graphFiles, true, dryRun) that
only rewrites .graph shards; update the narrate branch to use the filtered list
(or new helper) so RenderAll does not convert repos from .calls/.deps/.impact to
.graph and avoid stale-file cleanup deleting those other formats.

Comment on lines +51 to +67
func maybeWriteTour(repoDir string, cache *Cache, opts GenerateOptions) {
if !opts.Tour {
return
}
strategy, err := ResolveStrategy(opts.TourStrategy, opts.TourSeed)
if err != nil {
ui.Warn("Tour skipped: %v", err)
return
}
out, err := WriteTour(repoDir, cache, strategy, opts.TourBudget, opts.DryRun)
if err != nil {
ui.Warn("Tour write failed: %v", err)
return
}
if !opts.DryRun {
ui.Success("Wrote tour to %s (strategy: %s)", out, strategy.Name())
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't report success for a seed file that isn't in the cache.

For bfs-seed / dfs-seed, a typo in opts.TourSeed currently falls through to seededTraversal, which returns no files, and then maybeWriteTour still logs success. That gives the user an almost-empty TOUR.md instead of a clear warning. Please validate the seed against cache.SourceFiles() before writing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/handler.go` around lines 51 - 67, The function maybeWriteTour
currently proceeds to write a tour even when a seed (for bfs-seed/dfs-seed)
resolves to no files; update maybeWriteTour to validate the chosen seed before
calling WriteTour by checking that the resolved seed files appear in
cache.SourceFiles() (or that seededTraversal returned a non-empty set) and, if
the seed is missing or yields no files, log a warning via ui.Warn and return
early; keep the existing ResolveStrategy and WriteTour calls but add this seed
existence check (using cache.SourceFiles() and strategy/seed identifiers) to
prevent reporting success for an absent seed.

Comment on lines +72 to +89
func ResolveStrategy(name, seed string) (TourStrategy, error) {
switch name {
case "", "topo":
return TopoStrategy{}, nil
case "bfs-seed":
if seed == "" {
return nil, fmt.Errorf("bfs-seed requires --seed <file>")
}
return BFSSeedStrategy{Seed: seed}, nil
case "dfs-seed":
if seed == "" {
return nil, fmt.Errorf("dfs-seed requires --seed <file>")
}
return DFSSeedStrategy{Seed: seed}, nil
case "centrality":
return CentralityStrategy{}, nil
default:
return nil, fmt.Errorf("unknown strategy %q (supported: topo, bfs-seed, dfs-seed, centrality)", name)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The seed error message points at the wrong flag for analyze.

This helper says --seed, but the analyze flow in this PR uses --tour-seed. So the warning from maybeWriteTour sends users to a flag that doesn't exist in that command. I'd keep this message flag-agnostic here, or move the flag-specific validation up into the CLI layer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/handler.go` around lines 72 - 89, The error messages in
ResolveStrategy incorrectly reference a non-existent CLI flag (--seed); update
the validation here to be flag-agnostic by changing the fmt.Errorf messages in
ResolveStrategy for the "bfs-seed" and "dfs-seed" branches to something like
"bfs-seed requires a seed file" and "dfs-seed requires a seed file" (or move the
flag-specific check into the CLI layer that parses --tour-seed), ensuring
references to BFSSeedStrategy, DFSSeedStrategy and ResolveStrategy remain
consistent so maybeWriteTour and the analyze flow no longer point users to the
wrong flag.

Comment on lines +114 to +124
func uniqueCallerNames(refs []CallerRef, cache *Cache) []string {
seen := make(map[string]bool, len(refs))
var out []string
for _, r := range refs {
n := cache.FuncName(r.FuncID)
if n == "" || seen[n] {
continue
}
seen[n] = true
out = append(out, n)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Caller/callee dedupe is collapsing distinct functions with the same name.

uniqueCallerNames dedupes on function name, so different functions named the same get merged into one mention. That makes the narrative adjacency lossy.

Suggested fix
 func uniqueCallerNames(refs []CallerRef, cache *Cache) []string {
-	seen := make(map[string]bool, len(refs))
+	seen := make(map[string]bool, len(refs))
 	var out []string
 	for _, r := range refs {
-		n := cache.FuncName(r.FuncID)
-		if n == "" || seen[n] {
+		if r.FuncID == "" || seen[r.FuncID] {
 			continue
 		}
-		seen[n] = true
-		out = append(out, n)
+		n := cache.FuncName(r.FuncID)
+		if n == "" {
+			continue
+		}
+		seen[r.FuncID] = true
+		if r.File != "" {
+			out = append(out, fmt.Sprintf("%s (%s)", n, r.File))
+		} else {
+			out = append(out, n)
+		}
 	}
 	sort.Strings(out)
 	return out
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/narrative.go` around lines 114 - 124, uniqueCallerNames
currently dedupes by function name which collapses distinct functions that share
the same name; change the dedupe key to the unique function identifier instead.
In uniqueCallerNames, use r.FuncID (or another unique ID from CallerRef) as the
seen-map key (e.g., map[FuncID]bool) while still using cache.FuncName(r.FuncID)
to produce the human name to append; skip when the name is empty or the FuncID
was already seen, mark the FuncID as seen, and append the name. This preserves
distinct functions with identical names while keeping return type as []string.

Comment thread internal/shards/tour.go
Comment on lines +215 to +241
// Group by domain then subdomain while preserving order within each group.
type entry struct {
file string
dom string
sub string
}
entries := make([]entry, 0, len(order))
for _, f := range order {
dom, sub := splitDomain(cache.FileDomain[f])
entries = append(entries, entry{file: f, dom: dom, sub: sub})
}

lastDom, lastSub := "", ""
for _, e := range entries {
if e.dom != lastDom {
fmt.Fprintf(&b, "## Domain: %s\n\n", displayOrUnassigned(e.dom))
lastDom = e.dom
lastSub = ""
}
if e.sub != lastSub {
if e.sub != "" {
fmt.Fprintf(&b, "### Subdomain: %s\n\n", e.sub)
}
lastSub = e.sub
}
writeTourEntry(&b, e.file, cache, repoRelPrefix)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This labels domains, but it doesn't really group them.

Right now you keep the raw strategy order and only emit a heading when the domain changes. So an order like api/a.go, cmd/b.go, api/c.go will render Domain: api, then Domain: cmd, then Domain: api again. That misses the PR goal of grouping files by domain/subdomain into a stable reading spine.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 215 - 241, The current loop over
entries (built from order using splitDomain and written with writeTourEntry)
only emits headings when domain changes but doesn't regroup files, so domains
reappear; fix by collecting files into a two-level ordered map: track a slice of
seen domains and for each domain a slice of seen subdomains, and append each
file into domainMap[dom][sub] in the same order they appear in order; then
iterate the seen domains and their seen subdomains (preserving per-group file
order) to emit the "## Domain" and optional "### Subdomain" headings and call
writeTourEntry for each file (use the existing variables entries, splitDomain,
cache.FileDomain, repoRelPrefix, and b) so output is grouped by domain/subdomain
but retains intra-group order.

Comment thread internal/shards/tour.go
Comment on lines +357 to +417
// Split entries on blank line (entries are separated by "\n\n").
type domainedEntry struct {
heading string // most recent "## Domain" or "### Subdomain" heading block
text string // the "- **..." entry
}
var entries []domainedEntry
var currentHeading strings.Builder
for _, block := range strings.Split(rest, "\n\n") {
block = strings.TrimRight(block, "\n")
if block == "" {
continue
}
if strings.HasPrefix(block, "## ") || strings.HasPrefix(block, "### ") {
currentHeading.WriteString(block)
currentHeading.WriteString("\n\n")
continue
}
if strings.HasPrefix(block, "- **") {
entries = append(entries, domainedEntry{heading: currentHeading.String(), text: block})
currentHeading.Reset()
}
}

var chapters [][]domainedEntry
var currentChapter []domainedEntry
currentSize := approxTokens(preamble)
for _, e := range entries {
entrySize := approxTokens(e.heading) + approxTokens(e.text) + 2
if len(currentChapter) > 0 && currentSize+entrySize > budgetTokens {
chapters = append(chapters, currentChapter)
currentChapter = nil
currentSize = approxTokens(preamble)
}
currentChapter = append(currentChapter, e)
currentSize += entrySize
}
if len(currentChapter) > 0 {
chapters = append(chapters, currentChapter)
}

out := make([]string, len(chapters))
total := len(chapters)
for i, chapter := range chapters {
var b strings.Builder
b.WriteString(preamble)
fmt.Fprintf(&b, "> Chapter %d of %d", i+1, total)
if i > 0 {
fmt.Fprintf(&b, " · [prev](TOUR.%02d.md)", i)
}
if i < total-1 {
fmt.Fprintf(&b, " · [next](TOUR.%02d.md)", i+2)
}
b.WriteString("\n\n")
lastHeading := ""
for _, e := range chapter {
if e.heading != "" && e.heading != lastHeading {
b.WriteString(e.heading)
lastHeading = e.heading
}
b.WriteString(e.text)
b.WriteString("\n\n")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Chunking can drop the heading at the start of later chapters.

Only the first entry after a ## Domain / ### Subdomain block keeps that heading. If a chapter split happens in the middle of that section, the next chapter can start with a bare - **file** entry and no domain context.

Suggested fix
 	var entries []domainedEntry
 	var currentHeading strings.Builder
+	lastHeading := ""
 	for _, block := range strings.Split(rest, "\n\n") {
 		block = strings.TrimRight(block, "\n")
 		if block == "" {
 			continue
 		}
@@
 		}
 		if strings.HasPrefix(block, "- **") {
-			entries = append(entries, domainedEntry{heading: currentHeading.String(), text: block})
-			currentHeading.Reset()
+			heading := currentHeading.String()
+			if heading == "" {
+				heading = lastHeading
+			} else {
+				lastHeading = heading
+				currentHeading.Reset()
+			}
+			entries = append(entries, domainedEntry{heading: heading, text: block})
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 357 - 417, The parser currently resets
currentHeading after adding the first domainedEntry, so subsequent "- **"
entries lose their domain heading; remove the currentHeading.Reset() call in the
block parser (the loop that builds entries from rest) so that currentHeading
persists across multiple entries until a new "## " or "### " heading block
updates it, ensuring later chapters retain the proper heading context for
entries (domainedEntry.heading).

Comment thread internal/shards/tour.go
Comment on lines +462 to +485
if len(chapters) == 1 {
if err := writeFile("TOUR.md", chapters[0]); err != nil {
return "", fmt.Errorf("write tour: %w", err)
}
return outPath, nil
}

// Multi-chapter: write TOUR.NN.md files and an index TOUR.md.
for i, chapter := range chapters {
if err := writeFile(fmt.Sprintf("TOUR.%02d.md", i+1), chapter); err != nil {
return "", fmt.Errorf("write chapter: %w", err)
}
}
var idx strings.Builder
fmt.Fprintf(&idx, "# Repository Tour — Index\n\n")
fmt.Fprintf(&idx, "**Strategy:** `%s` · %d chapters\n\n", strategy.Name(), len(chapters))
fmt.Fprintf(&idx, "Read chapters in order; each fits within the token budget.\n\n")
for i := range chapters {
fmt.Fprintf(&idx, "- [Chapter %d](TOUR.%02d.md)\n", i+1, i+1)
}
if err := writeFile("TOUR.md", idx.String()); err != nil {
return "", fmt.Errorf("write tour index: %w", err)
}
return outPath, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Re-runs can leave stale TOUR.NN.md files behind.

If a repo used to produce, say, 5 chapters and now produces 1 or 2, this code only writes the new set. The old chapter files stay in .supermodel/, which is easy for tools or humans to pick up by mistake when they glob TOUR*.md.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/shards/tour.go` around lines 462 - 485, The current generation logic
can leave stale TOUR.NN.md files when the new chapters slice is smaller; before
writing new multi-chapter files (i.e. in the branch that iterates over chapters
and calls writeFile(fmt.Sprintf("TOUR.%02d.md", ...), chapter)), delete any
existing TOUR.*.md files that won't be overwritten (or delete all TOUR.NN.md
files up front) so only the newly written set remains; use a glob for the
TOUR.*.md pattern, remove matches that are not in the current expected
filenames, and then proceed to write the new TOUR.%02d.md files and the index
(update both the single-chapter and multi-chapter branches to clean up stale
TOUR.NN.md files appropriately).

@jonathanpopham
Copy link
Copy Markdown
Contributor Author

Closing as not planned — benchmark showed tour+narrate was 2.96× more expensive and 1.35× slower than main shards on the Django change_tracking task. Reboot in a new PR with a sharper interpretation of linearization: render the shard relationships themselves as human-language sentences (no tour, no preamble headers). See upcoming follow-up PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant