Repo intelligence engine for AI agent workflows. Single binary, one queryable artifact, many consumers.
AI agents (exploration, planning, review, drift-detect, sync-docs, perf) make decisions about code without understanding the repository as a whole. They grep, they read files, they guess. They don't know:
- Which files always change together (coupling)
- Who actually owns which areas and whether that knowledge is still fresh
- Where bugs cluster and why
- What the project's norms are (commit style, test patterns, naming)
- Which docs reference symbols that no longer exist
- Where a newcomer should start contributing
- What the codebase's symbol graph looks like (exports, imports, dependencies)
Each agent reinvents partial answers to these questions using ad-hoc git commands and file scanning. The results are slow, inconsistent, and incomplete.
A single Rust binary (agent-analyzer) that produces a unified repo intelligence artifact (repo-intel.json). This artifact is:
- Generated once, updated incrementally
- Queryable via CLI subcommands (agents call queries, not raw git)
- Consumed by many plugins - each plugin asks the questions it needs
- Cross-platform and deterministic - same binary, same output everywhere
The binary replaces ad-hoc git commands, ast-grep subprocesses, and manual file scanning across the ecosystem.
agent-analyzer (single binary)
├── repo-intel init <path> → full scan, produces repo-intel.json
├── repo-intel update <path> → incremental update (git delta + AST rescan of changed files)
├── repo-intel status <path> → staleness check
└── repo-intel query <type> → query the cached artifact
├── hotspots → most-changed files, weighted by recency
├── bugspots → files with highest bug-fix density
├── painspots → hotspot × complexity × bug-density intersection
├── coupling <file> → files that co-change with target
├── ownership <path> → who owns this area, recency of their activity
├── bus-factor → minimum people covering 80% of knowledge
├── norms → project conventions (commits, naming, test patterns)
├── symbols <file> → exports, imports, definitions for a file
├── dependents <symbol> → who imports/uses this symbol
├── stale-docs → docs referencing changed/deleted/renamed symbols
├── areas → codebase areas with ownership, health, complexity
├── onboard → newcomer-oriented summary of the repo
└── can-i-help → where an outsider can contribute
The artifact is a single JSON file containing all extracted intelligence. Sections are populated by different analysis passes but stored together for query efficiency.
{
"version": "2.0",
"generated": "ISO-8601",
"updated": "ISO-8601",
// ─── Git History ───────────────────────────────────
"git": {
"analyzedUpTo": "SHA",
"totalCommits": N,
"firstCommitDate": "ISO-8601",
"lastCommitDate": "ISO-8601",
"shallow": false
},
"contributors": {
"humans": {
"Alice": {
"commits": N,
"firstSeen": "ISO-8601",
"lastSeen": "ISO-8601",
"areas": ["src/core/", "tests/"],
"recentActivity": N // commits in last 90 days
}
},
"bots": {
"dependabot[bot]": { "commits": N, "tool": "dependabot" }
}
},
"fileActivity": {
"src/core.rs": {
"changes": N,
"authors": ["Alice", "Bob"],
"created": "ISO-8601",
"lastChanged": "ISO-8601",
"additions": N,
"deletions": N,
"bugFixChanges": N, // commits with "fix" prefix or bug-related keywords
"refactorChanges": N // commits with "refactor" prefix
}
},
"coupling": {
"src/core.rs": {
"src/core_test.rs": { "cochanges": N, "strength": 0.76 }
}
},
"conventions": {
"commitStyle": "conventional", // conventional | mixed | freeform
"prefixes": { "feat": N, "fix": N, "refactor": N },
"usesScopes": true,
"namingPatterns": { // detected from AST
"functions": "snake_case",
"types": "PascalCase",
"constants": "SCREAMING_SNAKE"
},
"testPatterns": {
"framework": "cargo-test", // jest | pytest | cargo-test | go-test | ...
"location": "inline", // inline | __tests__ | tests/ | spec/
"naming": "test_*" // test_* | *_test | *Test | *Spec
}
},
"releases": {
"tags": [{ "tag": "v1.0", "date": "ISO-8601", "commitsSince": N }],
"cadence": "weekly"
},
"renames": [{ "from": "old.rs", "to": "new.rs", "date": "ISO-8601" }],
"deletions": [{ "path": "removed.rs", "date": "ISO-8601" }],
// ─── AST Symbols ──────────────────────────────────
"symbols": {
"src/core.rs": {
"exports": [
{ "name": "validate", "kind": "function", "line": 42 },
{ "name": "Config", "kind": "struct", "line": 10 }
],
"imports": [
{ "from": "src/types.rs", "names": ["GitInfo", "FileActivity"] }
],
"definitions": [
{ "name": "validate", "kind": "function", "line": 42, "complexity": 8 },
{ "name": "Config", "kind": "struct", "line": 10, "complexity": 1 }
]
}
},
"importGraph": {
"src/core.rs": ["src/types.rs", "src/utils.rs"],
"src/main.rs": ["src/core.rs", "src/config.rs"]
},
// ─── Doc-Code Cross-References ─────────────────────
"docRefs": {
"README.md": {
"codeRefs": [
{ "text": "validate()", "symbol": "validate", "file": "src/core.rs", "exists": true },
{ "text": "old_function()", "symbol": "old_function", "file": null, "exists": false }
],
"lastUpdated": "ISO-8601",
"referencesHotFiles": true // references files in top 10% hotspots
}
},
// ─── Computed Areas ────────────────────────────────
"areas": {
"src/core/": {
"files": N,
"symbols": N,
"owners": [
{ "name": "Alice", "pct": 72.0, "lastActive": "ISO-8601", "stale": false }
],
"hotspotScore": 0.85, // 0-1, normalized
"bugFixRate": 0.12, // fraction of changes that are fixes
"complexity": {
"total": N,
"median": N,
"max": { "file": "core.rs", "function": "validate", "value": 24 }
},
"testCoverage": {
"hasTests": true,
"testFiles": ["tests/core_test.rs"],
"testToCodeRatio": 0.8 // test lines / code lines
},
"health": "healthy" // healthy | needs-attention | at-risk
}
}
}
- AI attribution counts - current detection is unreliable (<15% accuracy on AI-heavy repos). Not included until code stylometry is viable.
- Commit shape / size distribution - trivially available from
git log, not worth storing. - Commit message samples - no query needs them.
- Directory activity - subsumed by
areaswhich adds ownership, complexity, health.
Each query reads the cached repo-intel.json and returns structured JSON to stdout. Queries are pure functions over the artifact - no git commands, no file I/O beyond reading the cache.
Most-changed files weighted by recency (recent changes count more).
agent-analyzer repo-intel query hotspots [--months=N] [--top=N]
Returns:
[
{ "path": "src/core.rs", "score": 0.92, "changes": 45, "recentChanges": 12,
"bugFixes": 8, "owners": ["Alice"], "coupledWith": ["src/core_test.rs"] }
]Consumers: perf (optimization targets), audit-project (review priority), drift-detect (churn detection)
Files with highest bug-fix density (ratio of fix commits to total commits).
agent-analyzer repo-intel query bugspots [--top=N]
Returns:
[
{ "path": "src/parser.rs", "bugFixRate": 0.35, "totalChanges": 20,
"bugFixes": 7, "lastBugFix": "ISO-8601" }
]Consumers: audit-project (extra scrutiny), planning-agent (risk assessment), onboard (warn newcomers)
Intersection of hotspot + high complexity + high bug density. These are the files that hurt the most.
agent-analyzer repo-intel query painspots [--top=N]
Returns:
[
{ "path": "src/parser.rs", "painScore": 0.94,
"hotspotScore": 0.85, "complexity": 24, "bugFixRate": 0.35,
"owners": ["Alice"], "ownerStale": false }
]Consumers: perf (where to invest effort), planning-agent (risk), onboard (avoid these areas)
Files that co-change with a given file. Reveals hidden dependencies.
agent-analyzer repo-intel query coupling <file> [--min-strength=0.3]
Returns:
[
{ "path": "src/core_test.rs", "cochanges": 16, "strength": 0.76 },
{ "path": "src/types.rs", "cochanges": 8, "strength": 0.38 }
]Consumers: exploration-agent ("you're touching X, also check Y"), planning-agent (scope), audit-project ("you changed A but not its coupled partner B")
Who owns a file or directory, with recency.
agent-analyzer repo-intel query ownership <path>
Returns:
{
"path": "src/core/",
"owners": [
{ "name": "Alice", "pct": 72.0, "commits": 36, "lastActive": "2026-03-10", "stale": false },
{ "name": "Bob", "pct": 28.0, "commits": 14, "lastActive": "2025-11-02", "stale": true }
],
"busFactorRisk": true
}Consumers: next-task (assign work), audit-project (who should review), can-i-help (mentors)
Minimum contributors covering 80% of commits, with staleness awareness.
agent-analyzer repo-intel query bus-factor
Returns:
{
"busFactor": 1,
"criticalOwners": [
{ "name": "Alice", "coverage": 0.85, "lastActive": "2026-03-10", "stale": false }
],
"atRiskAreas": ["src/core/", "src/parser/"]
}Consumers: onboard (project health), drift-detect (risk), can-i-help (where help is needed most)
Project conventions detected from git history and AST analysis.
agent-analyzer repo-intel query norms
Returns:
{
"commits": {
"style": "conventional",
"prefixes": { "feat": 151, "fix": 186, "refactor": 22 },
"usesScopes": true,
"exampleMessages": ["feat(core): add validation pipeline", "fix: handle empty input"]
},
"code": {
"language": "rust",
"functionNaming": "snake_case",
"typeNaming": "PascalCase",
"testFramework": "cargo-test",
"testLocation": "inline",
"testNaming": "test_*"
}
}Consumers: onboard (learn conventions before contributing), can-i-help (match project style), enhance (validate conformance)
Exports, imports, and definitions for a file.
agent-analyzer repo-intel query symbols <file>
Returns:
{
"path": "src/core.rs",
"exports": [
{ "name": "validate", "kind": "function", "line": 42, "complexity": 8 }
],
"imports": [
{ "from": "src/types.rs", "names": ["GitInfo", "FileActivity"] }
],
"importedBy": ["src/main.rs", "src/cli.rs"]
}Consumers: exploration-agent (understand file), planning-agent (trace dependencies), sync-docs (cross-ref with docs)
Who imports or uses a given symbol.
agent-analyzer repo-intel query dependents <symbol> [--file=<file>]
Returns:
{
"symbol": "validate",
"definedIn": "src/core.rs:42",
"usedBy": [
{ "file": "src/main.rs", "line": 15, "context": "use crate::core::validate" },
{ "file": "tests/integration.rs", "line": 8, "context": "core::validate(input)" }
]
}Consumers: planning-agent (impact analysis - "changing this breaks these callers"), audit-project (blast radius)
Documentation files that reference symbols which have been renamed, deleted, or significantly changed.
agent-analyzer repo-intel query stale-docs
Returns:
[
{ "doc": "README.md", "line": 42,
"reference": "old_function()",
"issue": "symbol-not-found",
"suggestion": "Symbol was deleted in commit abc123 on 2026-03-01" },
{ "doc": "docs/api.md", "line": 18,
"reference": "parse_input()",
"issue": "symbol-renamed",
"renamedTo": "parse_data()",
"suggestion": "Renamed in src/parser.rs" },
{ "doc": "README.md", "line": 7,
"reference": "validate()",
"issue": "references-hotspot",
"suggestion": "This doc references a file that changed 12 times since doc was last updated" }
]Consumers: sync-docs (primary), drift-detect (doc decay detection)
High-level view of codebase areas with ownership, health, and complexity.
agent-analyzer repo-intel query areas [--top=N]
Returns:
[
{ "path": "src/core/", "files": 12, "symbols": 87,
"health": "healthy",
"hotspotScore": 0.85, "bugFixRate": 0.12, "complexity": { "median": 5, "max": 24 },
"owners": [{ "name": "Alice", "pct": 72.0, "stale": false }],
"testCoverage": { "hasTests": true, "ratio": 0.8 } },
{ "path": "src/parser/", "files": 4, "symbols": 23,
"health": "at-risk",
"hotspotScore": 0.92, "bugFixRate": 0.35, "complexity": { "median": 12, "max": 31 },
"owners": [{ "name": "Bob", "pct": 90.0, "stale": true }],
"testCoverage": { "hasTests": false, "ratio": 0.0 } }
]Consumers: onboard (codebase overview), can-i-help (where help needed), drift-detect (area health)
Human-readable summary for someone new to the repo.
agent-analyzer repo-intel query onboard
Returns:
{
"language": "rust",
"framework": null,
"structure": "workspace with 6 crates",
"totalFiles": 87,
"totalSymbols": 432,
"busFactor": 1,
"health": "healthy",
"conventions": { "commitStyle": "conventional", "testFramework": "cargo-test" },
"keyAreas": [
{ "path": "crates/core/", "purpose": "shared types and utilities", "entryPoint": "src/lib.rs" },
{ "path": "crates/cli/", "purpose": "command-line interface", "entryPoint": "src/main.rs" }
],
"painPoints": [
{ "path": "src/parser.rs", "reason": "high complexity (31), frequent bugs, single owner" }
],
"gettingStarted": {
"buildCommand": "cargo build",
"testCommand": "cargo test",
"entryPoints": ["crates/cli/src/main.rs"]
}
}Consumers: onboarding flow, README generation, project health dashboards
Guidance for outside contributors.
agent-analyzer repo-intel query can-i-help
Returns:
{
"conventions": { "commitStyle": "conventional", "branchNaming": "feat/*, fix/*" },
"goodFirstAreas": [
{ "path": "src/utils/", "reason": "low complexity, good test coverage, active owner for review",
"owner": "Alice", "complexity": 3 }
],
"needsHelp": [
{ "path": "src/parser/", "reason": "high bug rate, owner inactive for 4 months",
"bugFixRate": 0.35, "ownerLastActive": "2025-11-02" }
],
"recentActivity": {
"activeContributors": 2,
"commitsLast30Days": 45,
"openIssues": null
}
}Consumers: OSS onboarding, contributor guidance, mentorship matching
How each ecosystem plugin uses the artifact:
| Consumer | Queries Used | Integration Point |
|---|---|---|
| exploration-agent | coupling, symbols, ownership, areas | Before planning - understand blast radius |
| planning-agent | coupling, dependents, painspots, ownership | Scope decisions, risk assessment, reviewer selection |
| audit-project | painspots, coupling, ownership, bugspots | Review priority, "you changed A but not coupled B" |
| drift-detect | areas, bus-factor, stale-docs, hotspots | Area health trends, knowledge loss, doc decay |
| sync-docs | stale-docs, symbols, coupling | Find stale references, prioritize doc updates |
| perf | hotspots, painspots, coupling, symbols | Optimization targets, code path mapping |
| enhance | norms, areas | Convention conformance validation |
| onboard (new) | onboard, norms, areas, bus-factor | Newcomer orientation |
| can-i-help (new) | can-i-help, norms, bugspots, ownership | OSS contributor guidance |
The binary resolver in agent-core/lib/binary/ auto-downloads the binary. Plugins call it via:
const binary = require('@agentsys/lib/binary');
binary.ensureBinarySync();
// Generate or update
const json = binary.runAnalyzer(['repo-intel', 'init', cwd]);
// or
const json = binary.runAnalyzer(['repo-intel', 'update', cwd]);
// Query
const hotspots = JSON.parse(binary.runAnalyzer(['repo-intel', 'query', 'hotspots', '--top=10', cwd]));
const coupling = JSON.parse(binary.runAnalyzer(['repo-intel', 'query', 'coupling', 'src/core.rs', cwd]));What exists: git history extraction, file activity, coupling, basic queries.
Fixes needed before release:
- Date ordering bug (created/lastChanged swap)
- Bot classification (copilot-swe-agent, Copilot, agent-core-bot)
- Drop AI attribution from output (unreliable, re-add when stylometry works)
- Add
bugFixRateandrefactorChangesto file activity - Add
recentActivityto contributors (commits in last 90 days) - Add
ownerStaleflag (no commits in 90+ days) - Add recency weighting to hotspot scores
- Rename CLI from
git-maptorepo-intel(unified command namespace)
Output: git history portion of repo-intel.json
Replace ast-grep subprocess with embedded Rust parser.
Implementation:
- Use
tree-sitterwith language grammars for: Rust, TypeScript/JavaScript, Python, Go, Java - Extract: exports, imports, definitions (functions, classes, structs, traits, interfaces)
- Compute: cyclomatic complexity per function, import graph
- Detect: naming conventions (snake_case, camelCase, PascalCase), test framework and patterns
New queries: symbols, dependents, norms (code section)
Output: symbols, importGraph, conventions.namingPatterns, conventions.testPatterns sections
Cross-reference documentation with code symbols.
Implementation:
- Parse markdown files for code references (backtick spans, code blocks, API mentions)
- Match against symbol table from Phase 2
- Cross-reference with renames and deletions from Phase 1
- Detect staleness: doc references symbol that was renamed/deleted/moved
- Detect hotspot references: doc references a frequently-changed file but hasn't been updated
New queries: stale-docs
Output: docRefs section
Aggregate per-directory intelligence from Phases 1-3.
Implementation:
- Group files by directory prefix
- Compute per-area: ownership (with recency), hotspot score, bug-fix rate, complexity stats, test coverage ratio
- Classify health: healthy (active owner, tested, low bug rate), needs-attention (stale owner OR low tests OR high bugs), at-risk (multiple risk factors)
- Generate
onboardandcan-i-helpfrom area data
New queries: areas, painspots, bugspots, onboard, can-i-help, bus-factor (enhanced with staleness)
Output: areas section
Detect AI-generated code from code patterns, not commit metadata.
This is a research problem, not an engineering task. Current metadata-based detection catches <15% of AI-authored commits. Viable stylometry requires:
- Training data: known AI vs human code samples
- Feature extraction: comment density, naming uniformity, error handling patterns, import ordering
- Per-tool fingerprinting: Claude vs Copilot vs Cursor style differences
- Confidence scoring: per-file, not per-commit
Not scheduled. Revisit when academic research matures or when we accumulate enough labeled training data from agent-sh repos (where ground truth is known).
The artifact supports incremental updates to avoid full rescans:
repo-intel.json contains:
git.analyzedUpTo = SHA of last analyzed commit
updated = timestamp of last update
Update algorithm:
- Check
git.analyzedUpToagainst current HEAD - If equal: artifact is current, no work needed
- If
git cat-file -t analyzedUpTofails: force-push detected, full rebuild - Otherwise: extract delta (
analyzedUpTo..HEAD), merge into existing artifact - For AST: rescan only files changed in the delta (diff --name-only)
- For doc-refs: recheck only docs that reference changed files
Performance target: <2s incremental update for repos with <100 new commits.
agent-analyzer repo-intel init [--max-commits=N] <path>
agent-analyzer repo-intel update <path>
agent-analyzer repo-intel status <path>
agent-analyzer repo-intel query <type> [args...] <path>
# Legacy (kept for backwards compat during migration)
agent-analyzer git-map init|update|status|query ...
All output to stdout as JSON. Progress and errors to stderr.
crates/
analyzer-core/ # Shared: types, git2 wrapper, AI detection, noise filtering, output
analyzer-git-intel/ # Phase 1: git history extraction, aggregation, queries
analyzer-ast-intel/ # Phase 2: tree-sitter parsing, symbol extraction, import graph
analyzer-doc-intel/ # Phase 3: markdown parsing, symbol cross-referencing
analyzer-areas/ # Phase 4: area computation, health classification, onboard/can-i-help
analyzer-cli/ # Unified binary, clap dispatch
- AI attribution counts - unreliable until stylometry works, omit from output
- Commit shape / size distribution - no consumer needs it
- Commit message samples - no consumer needs them
- Release cadence analysis -
git tagis sufficient - GitHub API integration - drift-detect already handles this in JS
- Language server protocol - the binary is a batch tool, not a daemon
- Watch mode - agents call update explicitly when needed
The binary is worth releasing when:
- At least one consumer (exploration-agent or sync-docs) is wired up and using queries in production
- The
areasquery produces actionable health assessments on real repos - The
couplingquery is consumed by exploration-agent before every planning session - The
onboardquery produces a useful newcomer summary that agents can present - The
stale-docsquery finds real stale references that sync-docs acts on
Not a success criterion: "the binary runs and produces JSON." That's table stakes. Value comes from agents making better decisions because of this data.