A semantic project indexer for Claude that builds a searchable, on-disk database of your project's files using embeddings and vector search.
semdex creates a local RAG (Retrieval-Augmented Generation) database for your project. When Claude works in your project, it can search this semantic index to find relevant files quickly, which helps it understand your codebase without relying solely on `grep` or `find`.
- Build Index: Run `semdex init` from your project root to scan and index all project files
- Semantic Search: The indexer creates embeddings for each file using a local ONNX model (no API keys needed)
- Local Storage: The vector database is stored in `.claude/semdex/` (git-ignored) within your project
- Claude Integration: An MCP server exposes `search`, `related`, and `summary` tools that Claude can call directly
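Conceptually, a semantic search is a nearest-neighbor lookup: the query is embedded with the same model as the files, and files are ranked by how similar their vectors are to the query vector. The sketch below shows only that ranking step in plain Python. The file paths and three-dimensional vectors are made up for illustration (real embedding models produce vectors with hundreds of dimensions), cosine similarity is an assumed metric, and in semdex the lookup itself is handled by LanceDB rather than by code like this.

```python
import math

# Hypothetical, hand-picked 3-D vectors standing in for real file embeddings.
INDEX = {
    "src/auth/login.py": [0.9, 0.1, 0.0],
    "src/db/models.py": [0.1, 0.9, 0.2],
    "docs/deploy.md": [0.0, 0.2, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k paths whose vectors are most similar to the query vector."""
    ranked = sorted(index, key=lambda path: cosine(query_vec, index[path]), reverse=True)
    return ranked[:k]


# Pretend this is the embedding of the query "how do users sign in?".
query = [0.8, 0.2, 0.1]
print(search(query, INDEX, k=2))  # ['src/auth/login.py', 'src/db/models.py']
```

Because matching happens in vector space, a query can rank `login.py` first even if the word "sign" never appears in that file, which is the practical difference from `grep`.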
- Command-line tool: Simple `semdex` command to build/rebuild your project index
- Semantic search: Finds files based on meaning, not just keyword matching
- Local embeddings: Uses fastembed with ONNX Runtime; no PyTorch, no API keys
- LanceDB vector store: Embedded vector database, zero config
- MCP server: Claude Code calls semdex tools directly via the Model Context Protocol
- Smart chunking: Tree-sitter-based splitting for large files (Python, JS, TS)
- Auto re-index: Git post-commit hook keeps the index fresh
- Git-ignored: Index is stored in `.claude/` and won't clutter your repository
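Structure-aware chunking splits a large file along syntactic boundaries (functions, classes) rather than at fixed character offsets, so each chunk embeds one coherent unit of code. semdex uses Tree-sitter for this; the sketch below swaps in Python's built-in `ast` module so it runs without extra dependencies, and it handles Python source only. The chunk fields (`type`, `name`, `start_line`, `text`) are illustrative and are not semdex's schema.

```python
import ast


def chunk_python(source: str, path: str) -> list[dict]:
    """Split Python source into one chunk per top-level function or class.

    Uses the stdlib `ast` module as a stand-in for Tree-sitter. Module-level
    statements outside any def/class are not chunked here; if the file has
    no defs at all, the whole file becomes a single chunk.
    """
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "type": type(node).__name__,
                "name": node.name,
                "start_line": node.lineno,
                # Exact source text of the node (decorators are not included).
                "text": ast.get_source_segment(source, node),
            })
    if not chunks:
        chunks.append({"path": path, "type": "Module", "name": path,
                       "start_line": 1, "text": source})
    return chunks


SAMPLE = '''import os


def load(path):
    return open(path).read()


class Cache:
    def get(self, key):
        return None
'''

for chunk in chunk_python(SAMPLE, "example.py"):
    print(chunk["type"], chunk["name"], chunk["start_line"])
# FunctionDef load 4
# ClassDef Cache 8
```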
```bash
# Install globally (recommended)
pipx install semdex

# Or with pip
pip install semdex
```

For development:

```bash
git clone <repo-url>
cd semdex
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
```

Quick start:

```bash
# 1. Initialize semdex in your project
cd ~/your-project
semdex init  # Builds index, installs git hook, and installs Claude skill

# 2. Register the MCP server with Claude Code
claude mcp add semdex -- semdex serve

# 3. Verify Claude can see it
claude mcp list

# 4. Start a Claude Code session: Claude now has search, related, and summary tools!
# The semdex skill teaches Claude to use these tools proactively when exploring code
```

Command reference:

```bash
semdex init                   # Initialize: index project, install git hook and skill, print setup instructions
semdex index                  # Smart re-index (skip unchanged files, prune deleted files)
semdex index --force          # Full re-index (delete and rebuild entire index)
semdex index <dir>            # Index an external directory
semdex index <file>           # Re-index a specific file
semdex index --force <file>   # Force re-index a specific file (bypass mtime check)
semdex search <query>         # Search the index from the command line
semdex status                 # Show index stats (file count, last indexed, size)
semdex forget <path>          # Remove a path from the index
semdex hook install           # Install the git post-commit hook
semdex hook uninstall         # Remove the git post-commit hook
semdex serve                  # Start the MCP server (called by Claude Code)
```

All commands accept `--project-root-dir <path>` to target a specific project.
By default, semdex index uses smart incremental indexing:
- Skips unchanged files: A file is skipped if its modification time hasn't changed since it was last indexed
- Automatically resumes: If indexing is interrupted, re-running will continue from where it left off
- Prunes deleted files: Files removed from the project are automatically removed from the index
- Force rebuild: Use `--force` to delete and rebuild the entire index from scratch
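The incremental rules above reduce to comparing two path-to-mtime maps: what is on disk now versus what was recorded at the last index run. A minimal sketch of that comparison, assuming the index stores each file's mtime alongside its vectors; `plan_incremental` and `IndexPlan` are hypothetical names, not semdex's API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexPlan:
    to_index: list[str] = field(default_factory=list)  # new or modified files
    skipped: list[str] = field(default_factory=list)   # mtime unchanged
    to_prune: list[str] = field(default_factory=list)  # entries to delete from the index


def plan_incremental(on_disk: dict[str, float], indexed: dict[str, float],
                     force: bool = False) -> IndexPlan:
    """Decide what to (re)index.

    Both maps are path -> mtime, e.g. Path(p).stat().st_mtime for files on
    disk and the value recorded when each file was last indexed.
    """
    if force:
        # --force: drop every existing entry and rebuild from scratch.
        return IndexPlan(to_index=sorted(on_disk), to_prune=sorted(indexed))
    plan = IndexPlan()
    for path in sorted(on_disk):
        if indexed.get(path) == on_disk[path]:
            plan.skipped.append(path)
        else:
            plan.to_index.append(path)
    # Anything recorded in the index but gone from disk is pruned.
    plan.to_prune = sorted(set(indexed) - set(on_disk))
    return plan


disk = {"a.py": 100.0, "b.py": 250.0, "c.py": 300.0}
index = {"a.py": 100.0, "b.py": 200.0, "old.py": 50.0}
print(plan_incremental(disk, index))
# IndexPlan(to_index=['b.py', 'c.py'], skipped=['a.py'], to_prune=['old.py'])
```

Under this scheme, resuming after an interruption needs no extra bookkeeping: files written before the interruption already carry their recorded mtime, so the next run skips them and continues with the rest.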
Once the MCP server is registered and the skill is installed, Claude has access to three tools:
- `search`: Semantic search across your codebase. Claude can find files by meaning, not just keywords.
- `related`: Find files related to a given file. Useful when Claude is editing code and needs to find tests, models, or connected modules.
- `summary`: Get index metadata for a file (chunk count, types, last indexed).
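Under the hood, each tool use is a JSON-RPC 2.0 `tools/call` request that Claude Code sends to the `semdex serve` process. For a server registered with a command, as in `claude mcp add semdex -- semdex serve`, the transport is the process's stdin/stdout with one JSON message per line, after an initial `initialize` handshake. The sketch below only builds such request messages. The argument names `query`, `path`, and `limit` are assumptions for illustration; semdex's actual tool schemas are not documented here, and a client can discover them with a `tools/list` request.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message)


# Hypothetical argument names; check the server's tools/list response for the real ones.
print(tools_call(1, "search", {"query": "where are auth tokens validated?", "limit": 5}))
print(tools_call(2, "related", {"path": "src/auth/login.py"}))
```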
The semdex skill teaches Claude to use these tools proactively when exploring unfamiliar code, investigating bugs, understanding architecture, and finding related files. The skill is automatically installed by semdex init and guides Claude on when and how to use semantic search effectively.
Semdex uses parallel processing to index large repositories quickly:
- Small repos (< 100 files): completes in seconds; repos below `min_files_for_parallel` (50 files by default) are indexed sequentially
- Medium repos (1,000-5,000 files): 10 workers, 1-2 minutes
- Large repos (20,000+ files): 10 workers, 6-8 minutes
On a 12-core system, expect 8-10x speedup vs sequential processing.
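The general pattern behind these numbers is to fan per-file work (reading, chunking, embedding) out to a pool of worker processes, then have the parent collect results and write them to the database in batches. The sketch below shows that pattern with `concurrent.futures`. `embed_file` and the `write_batch` callback are hypothetical stand-ins for semdex's chunk-and-embed step and its LanceDB writes; the worker count follows the documented auto-detect rule (`cpu_count - 1`).

```python
import os
from concurrent import futures
from itertools import islice
from typing import Callable, Iterable, Iterator


def embed_file(path: str) -> dict:
    # Hypothetical stand-in for reading, chunking, and embedding one file.
    return {"path": path, "vector": [float(len(path))]}


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items (itertools.batched in 3.12+ yields tuples)."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def index_parallel(paths: list[str], workers: int, write_batch_size: int,
                   write_batch: Callable[[list[dict]], None],
                   executor_cls: type = futures.ProcessPoolExecutor) -> None:
    """Embed files across a worker pool and pass results to write_batch in batches."""
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in input order; chunksize cuts inter-process
        # overhead (thread pools ignore it).
        results = pool.map(embed_file, paths, chunksize=16)
        for batch in batched(results, write_batch_size):
            write_batch(batch)  # one database write per buffered batch


if __name__ == "__main__":  # required guard when worker processes are spawned
    written: list[list[dict]] = []
    files = [f"src/module_{i}.py" for i in range(25)]
    workers = max(1, (os.cpu_count() or 2) - 1)  # documented auto-detect rule
    index_parallel(files, workers, write_batch_size=10, write_batch=written.append)
    print([len(batch) for batch in written])  # [10, 10, 5]
```

Batching the writes is what makes `write_batch_size` a speed/memory trade-off: fewer, larger writes are cheaper per file, but every buffered result stays in the parent's memory until the batch is flushed. Passing `futures.ThreadPoolExecutor` as `executor_cls` runs the same code in-process, which is convenient for tests.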
Configure in `.claude/semdex/config.json`:

```json
{
  "parallel_enabled": true,
  "parallel_workers": 8,
  "write_batch_size": 1000,
  "min_files_for_parallel": 50
}
```

Configuration options:
- `parallel_enabled`: Enable/disable parallel processing. Default: `true`
- `parallel_workers`: Number of worker processes; `0` = auto-detect (`cpu_count - 1`). Default: `0`
- `write_batch_size`: Files to buffer before writing to the database. Larger is faster but uses more memory. Default: `500`
- `min_files_for_parallel`: Minimum number of files to trigger parallel mode; below this, indexing runs sequentially. Default: `50`
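One way to read these options: start from the documented defaults, overlay whatever `config.json` sets, resolve `parallel_workers: 0` to `cpu_count - 1`, and fall back to sequential mode when parallelism is disabled or the file count is below `min_files_for_parallel`. The helper names below are hypothetical; this sketches the documented semantics, not semdex's code.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Documented defaults for .claude/semdex/config.json.
DEFAULTS = {
    "parallel_enabled": True,
    "parallel_workers": 0,          # 0 = auto-detect
    "write_batch_size": 500,
    "min_files_for_parallel": 50,
}


def load_config(project_root: Path) -> dict:
    """Return the defaults overlaid with any values set in config.json."""
    cfg = dict(DEFAULTS)
    path = project_root / ".claude" / "semdex" / "config.json"
    if path.is_file():
        cfg.update(json.loads(path.read_text()))
    return cfg


def resolve_workers(cfg: dict, cpu_count: Optional[int] = None) -> int:
    """An explicit positive worker count wins; 0 means cpu_count - 1 (at least 1)."""
    if cfg["parallel_workers"] > 0:
        return cfg["parallel_workers"]
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus - 1)


def use_parallel(cfg: dict, file_count: int) -> bool:
    """Parallel only when enabled and the file count reaches the threshold."""
    return cfg["parallel_enabled"] and file_count >= cfg["min_files_for_parallel"]


cfg = load_config(Path("."))
print(resolve_workers(cfg), use_parallel(cfg, file_count=1200))
```

With the example `config.json` above, `resolve_workers` returns 8; with the defaults on a 12-core machine it returns 11.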
"Indexing is slow":
- Verify `parallel_enabled` is `true` in config
- Check that the system has multiple CPU cores available
- Ensure system has adequate RAM (16+ GB recommended)
"Running out of memory":
- Reduce `write_batch_size` to `250` or `100`
- Reduce `parallel_workers` to `4` or `6`
- Close memory-intensive applications
"System becomes unresponsive":
- Reduce `parallel_workers` to leave more CPU headroom
- Check system cooling (CPU throttling can slow things down)
By default, semdex indexes:
- Source code files (.js, .ts, .py, .java, etc.)
- Documentation (README, docs/*.md)
- Configuration files (package.json, tsconfig.json, etc.)
- Comments and docstrings within code
It excludes:
- `node_modules/`, `.git/`, `dist/`, `build/`, `coverage/`, `__pycache__/`, `.venv/`
- Binary files (images, archives, compiled files)
- Files over 1MB
- Patterns in your `.gitignore`
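The exclusion rules can be expressed as a single predicate over a file's relative path, size, and first bytes. The sketch below is an approximation under stated assumptions: binary detection uses the common NUL-byte heuristic, "1MB" is taken as 1,000,000 bytes, and `.gitignore` patterns are matched with `fnmatch`, which covers simple globs such as `*.log` but not full gitignore semantics (negation, anchoring, `**`); a library such as `pathspec` implements those. None of these names come from semdex.

```python
import fnmatch
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build", "coverage", "__pycache__", ".venv"}
MAX_BYTES = 1_000_000  # assumption: "1MB" read as 10**6 bytes


def looks_binary(head: bytes) -> bool:
    """Heuristic: treat content containing a NUL byte as binary."""
    return b"\x00" in head


def should_index(rel_path: str, size: int, head: bytes,
                 ignore_patterns: tuple[str, ...] = ()) -> bool:
    """Return True if a file passes the directory, size, binary, and ignore checks."""
    parts = PurePosixPath(rel_path).parts
    # Excluded names only count as directories, not as the file name itself.
    if any(part in EXCLUDED_DIRS for part in parts[:-1]):
        return False
    if size > MAX_BYTES or looks_binary(head):
        return False
    # Simplified .gitignore matching: try each glob against the path and the file name.
    name = parts[-1]
    return not any(fnmatch.fnmatch(rel_path, p) or fnmatch.fnmatch(name, p)
                   for p in ignore_patterns)


print(should_index("src/app.py", 2_048, b"import os\n"))                      # True
print(should_index("node_modules/lodash/index.js", 10, b"var x"))             # False
print(should_index("assets/logo.png", 5_000, b"\x89PNG\r\n\x1a\n\x00\x00"))   # False
print(should_index("logs/debug.log", 10, b"ok", ("*.log",)))                  # False
```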
The index is stored in:
```
.claude/
└── semdex/
    ├── lance.db/      # LanceDB vector database
    └── config.json    # semdex configuration
```

`semdex init` automatically adds `.claude/` to your `.gitignore`.
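Adding the ignore entry is worth doing idempotently, so that re-running `semdex init` never duplicates the line. A sketch of that behavior follows; the function names and the set of equivalent spellings it checks are assumptions, not semdex's implementation.

```python
from pathlib import Path


def with_ignore_entry(text: str, entry: str = ".claude/") -> str:
    """Return .gitignore text that contains `entry`, appending it only if absent."""
    bare = entry.strip("/")
    # Treat ".claude", ".claude/", "/.claude", and "/.claude/" as already present.
    if any(line.strip().strip("/") == bare for line in text.splitlines()):
        return text
    if text and not text.endswith("\n"):
        text += "\n"  # don't glue the entry onto an unterminated last line
    return text + entry + "\n"


def ensure_gitignored(project_root: Path, entry: str = ".claude/") -> None:
    """Create or update <project_root>/.gitignore so it ignores `entry`."""
    gitignore = project_root / ".gitignore"
    current = gitignore.read_text() if gitignore.exists() else ""
    updated = with_ignore_entry(current, entry)
    if updated != current:
        gitignore.write_text(updated)
```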