semdex

A semantic project indexer for Claude that builds a searchable, on-disk database of your project's files using embeddings and vector search.

Overview

semdex creates a local RAG (Retrieval-Augmented Generation) database for your project. When Claude works in your project, it can query this semantic index to find relevant files by meaning, rather than relying solely on grep or find commands.

How It Works

Build Index: Run semdex init from your project root to scan and index all project files
Semantic Search: The indexer creates embeddings for each file using a local ONNX model (no API keys needed)
Local Storage: The vector database is stored in .claude/semdex/ (git-ignored) within your project
Claude Integration: An MCP server exposes search, related, and summary tools that Claude can call directly
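
The index-then-query flow above can be sketched in a few lines. This is a toy illustration only, not semdex's implementation: the bag-of-words `embed` function stands in for the ONNX embedding model (it matches shared words, not meaning), and an in-memory dict stands in for the on-disk vector database. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(files: dict[str, str]) -> dict[str, Counter]:
    """Map each path to its vector (stand-in for the vector store)."""
    return {path: embed(content) for path, content in files.items()}


def search(index: dict[str, Counter], query: str, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank indexed paths by similarity to the query vector."""
    q = embed(query)
    scored = [(path, cosine(q, vec)) for path, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


files = {
    "auth/login.py": "def login(user, password): verify password and create session",
    "billing/invoice.py": "def create_invoice(order): compute totals and tax",
    "README.md": "project overview and setup instructions",
}
index = build_index(files)
results = search(index, "verify user password", top_k=1)
print(results[0][0])
```

A real embedding model would place related text close together even without shared words; the ranking step (nearest vectors to the query) has the same shape.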

Features

Command-line tool: Simple semdex command to build/rebuild your project index
Semantic search: Finds files based on meaning, not just keyword matching
Local embeddings: Uses fastembed with ONNX Runtime -- no PyTorch, no API keys
LanceDB vector store: Embedded vector database, zero config
MCP server: Claude Code calls semdex tools directly via the Model Context Protocol
Smart chunking: Tree-sitter-based splitting for large files (Python, JS, TS)
Auto re-index: Git post-commit hook keeps the index fresh
Git-ignored: Index is stored in .claude/ and won't clutter your repository
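
To give a feel for syntax-aware chunking, here is a minimal sketch that splits a Python file into one chunk per top-level function or class. It uses Python's built-in `ast` module as a stand-in for Tree-sitter, so it is not how semdex itself chunks files; `chunk_python_source` is a hypothetical name.

```python
import ast


def chunk_python_source(source: str) -> list[dict]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive.
            text = "\n".join(lines[node.lineno - 1 : node.end_lineno])
            chunks.append(
                {"name": node.name, "start": node.lineno, "end": node.end_lineno, "text": text}
            )
    return chunks


SOURCE = '''import os

def load(path):
    return open(path).read()

class Parser:
    def parse(self, text):
        return text.split()
'''

chunks = chunk_python_source(SOURCE)
for chunk in chunks:
    print(chunk["name"], chunk["start"], chunk["end"])
```

Splitting on definition boundaries keeps each chunk a self-contained unit of code, which tends to embed better than fixed-size windows that cut through the middle of a function.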

Installation

# Install globally (recommended)
pipx install semdex

# Or with pip
pip install semdex

For development:

git clone <repo-url>
cd semdex
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Quick Start

# 1. Initialize semdex in your project
cd ~/your-project
semdex init  # Builds index, installs git hook, and installs Claude skill

# 2. Register the MCP server with Claude Code
claude mcp add semdex -- semdex serve

# 3. Verify Claude can see it
claude mcp list

# 4. Start a Claude Code session -- Claude now has search, related, and summary tools!
# The semdex skill teaches Claude to use these tools proactively when exploring code

CLI Commands

semdex init                  # Initialize: index project, install git hook and skill, print setup instructions
semdex index                 # Smart re-index (skip unchanged files, prune deleted files)
semdex index --force         # Full re-index (delete and rebuild entire index)
semdex index <dir>           # Index an external directory
semdex index <file>          # Re-index a specific file
semdex index --force <file>  # Force re-index a specific file (bypass mtime check)
semdex search <query>        # Search the index from the command line
semdex status                # Show index stats (file count, last indexed, size)
semdex forget <path>         # Remove a path from the index
semdex hook install          # Install the git post-commit hook
semdex hook uninstall        # Remove the git post-commit hook
semdex serve                 # Start the MCP server (called by Claude Code)

All commands accept --project-root-dir <path> to target a specific project.

Smart Indexing

By default, semdex index uses smart incremental indexing:

Skips unchanged files: Files are skipped if their modification time hasn't changed since last index
Automatically resumes: If indexing is interrupted, re-running will continue from where it left off
Prunes deleted files: Files removed from the project are automatically removed from the index
Force rebuild: Use --force to delete and rebuild the entire index from scratch
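
The rules above amount to comparing recorded modification times against the files currently on disk. A minimal sketch, assuming the index keeps a path-to-mtime mapping (the function name and data layout are hypothetical, not semdex's actual internals):

```python
def plan_reindex(
    indexed_mtimes: dict[str, float],
    current_mtimes: dict[str, float],
    force: bool = False,
) -> tuple[list[str], list[str]]:
    """Return (to_index, to_prune) from recorded and current modification times."""
    if force:
        # Full rebuild: drop every recorded entry and index every current file.
        return sorted(current_mtimes), sorted(indexed_mtimes)
    # New files have no recorded mtime; changed files have a different one.
    to_index = sorted(
        path for path, mtime in current_mtimes.items() if indexed_mtimes.get(path) != mtime
    )
    # Entries whose file no longer exists are pruned.
    to_prune = sorted(path for path in indexed_mtimes if path not in current_mtimes)
    return to_index, to_prune


indexed = {"a.py": 100.0, "b.py": 200.0, "old.py": 50.0}
current = {"a.py": 100.0, "b.py": 250.0, "new.py": 300.0}
to_index, to_prune = plan_reindex(indexed, current)
print(to_index, to_prune)
```

If mtimes are recorded as each file is written to the index, an interrupted run resumes naturally: files that were never recorded simply show up in `to_index` on the next run.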

Integration with Claude Code

Once the MCP server is registered and the skill is installed, Claude has access to three tools:

search: Semantic search across your codebase. Claude can find files by meaning, not just keywords.
related: Find files related to a given file. Useful when Claude is editing code and needs to find tests, models, or connected modules.
summary: Get index metadata for a file (chunk count, types, last indexed).

The semdex skill teaches Claude to use these tools proactively when exploring unfamiliar code, investigating bugs, understanding architecture, and finding related files. The skill is automatically installed by semdex init and guides Claude on when and how to use semantic search effectively.

Performance

semdex uses parallel processing to index large repositories quickly:

Small repos (< 100 files): Sequential mode, completes in seconds
Medium repos (1,000-5,000 files): 10 workers, 1-2 minutes
Large repos (20,000+ files): 10 workers, 6-8 minutes

On a 12-core system, expect 8-10x speedup vs sequential processing.

Tuning Performance

Configure in .claude/semdex/config.json:

{
  "parallel_enabled": true,
  "parallel_workers": 8,
  "write_batch_size": 1000,
  "min_files_for_parallel": 50
}

Configuration options:

parallel_enabled: Enable/disable parallel processing (default: true)
parallel_workers: Number of worker processes. 0 = auto-detect (cpu_count - 1). Default: 0
write_batch_size: Number of files to buffer before writing to the database. Larger values are faster but use more memory. Default: 500
min_files_for_parallel: Minimum number of files needed to trigger parallel mode. Projects below this threshold are indexed sequentially. Default: 50
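
As a rough illustration of how these options interact, the sketch below merges config.json over the documented defaults and resolves the effective worker count. The function names are hypothetical and the logic is inferred from the option descriptions above, so treat it as a reading aid rather than semdex's actual code.

```python
import json
import os
from pathlib import Path

# Defaults as documented above.
DEFAULTS = {
    "parallel_enabled": True,
    "parallel_workers": 0,
    "write_batch_size": 500,
    "min_files_for_parallel": 50,
}


def load_config(project_root: str) -> dict:
    """Merge .claude/semdex/config.json (if present) over the defaults."""
    path = Path(project_root) / ".claude" / "semdex" / "config.json"
    config = dict(DEFAULTS)
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def resolve_workers(config: dict, file_count: int) -> int:
    """Effective number of workers; 1 means sequential mode."""
    if not config["parallel_enabled"] or file_count < config["min_files_for_parallel"]:
        return 1
    if config["parallel_workers"] > 0:
        return config["parallel_workers"]
    # 0 means auto-detect: cpu_count - 1, but never fewer than one worker.
    return max(1, (os.cpu_count() or 2) - 1)


config = load_config(".")
print(resolve_workers(config, file_count=1000))
```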

Troubleshooting

"Indexing is slow":

Verify that parallel_enabled is true in config.json
Check that the system has multiple CPU cores available
Ensure the system has adequate RAM (16+ GB recommended)

"Running out of memory":

Reduce write_batch_size to 250 or 100
Reduce parallel_workers to 4 or 6
Close memory-intensive applications

"System becomes unresponsive":

Reduce parallel_workers to leave more CPU headroom
Check system cooling (CPU throttling can slow things down)

What Gets Indexed

By default, semdex indexes:

Source code files (.js, .ts, .py, .java, etc.)
Documentation (README, docs/*.md)
Configuration files (package.json, tsconfig.json, etc.)
Comments and docstrings within code

It excludes:

node_modules/, .git/, dist/, build/, coverage/, __pycache__/, .venv/
Binary files (images, archives, compiled files)
Files over 1 MB
Patterns in your .gitignore
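
The exclusion rules can be pictured as a simple filter over candidate paths. This sketch is an approximation under stated assumptions: the binary suffix list is illustrative, 1 MB is taken to mean 1024 * 1024 bytes, and .gitignore matching is left out entirely. `should_index` is a hypothetical name.

```python
from pathlib import PurePosixPath

EXCLUDED_DIRS = {"node_modules", ".git", "dist", "build", "coverage", "__pycache__", ".venv"}
# Illustrative only; the real list of binary types is not given in this README.
BINARY_SUFFIXES = {".png", ".jpg", ".gif", ".zip", ".tar", ".gz", ".pyc", ".so"}
MAX_BYTES = 1024 * 1024  # assumption: "1 MB" means 1 MiB


def should_index(rel_path: str, size_bytes: int) -> bool:
    """Apply directory, binary-type, and size exclusions (no .gitignore handling)."""
    path = PurePosixPath(rel_path)
    # Skip anything located under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    if path.suffix.lower() in BINARY_SUFFIXES:
        return False
    return size_bytes <= MAX_BYTES


for candidate, size in [("src/app.py", 2048), ("node_modules/lib/index.js", 100)]:
    print(candidate, should_index(candidate, size))
```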

Storage

The index is stored in:

.claude/
└── semdex/
    ├── lance.db/     # LanceDB vector database
    └── config.json   # semdex configuration

semdex init automatically adds .claude/ to your .gitignore.
