Local document workspace for AI agents - A powerful, open-source alternative to macOS Folio with cross-platform support and advanced AI capabilities.
- Linux-first design (works on macOS/Windows too!)
- No proprietary dependencies - runs anywhere Python does
- Lightweight & fast - no bloated Electron apps
- Multi-format support: PDF (text & OCR), Markdown, HTML, plain text
- OCR integration: Automatic text extraction from scanned/image PDFs using Tesseract
- Smart ingestion: Incremental updates, duplicate handling, metadata extraction
- Real-time processing: Progress bars and status updates
- LLM integration: Minimax models via OpenRouter (fast, reliable, affordable)
- Dynamic columns: Add AI-generated insights on-the-fly
- Flexible prompting: Custom templates with `{content}` and `{filename}` variables
- Batch processing: Analyze entire document collections at once
- Regeneration: Easy re-run of AI analysis for any column
- DuckDB-powered: Lightning-fast analytical database
- ACID compliance: Reliable transactions and data integrity
- SQL interface: Direct access to your document data
- Embeddable: Zero-configuration, file-based storage
- Performance: Handles thousands of documents with sub-second queries
- Gradio interface: Beautiful, responsive web UI
- Real-time updates: Live progress and results
- Intuitive workflow: Drag-and-drop simplicity
- System controls: Built-in shutdown, regeneration, and management tools
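The dynamic-column workflow (a prompt template, applied to every document, re-runnable on demand) can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind `folio.llm.add_column`. `render_prompt` and `fill_ai_column` are hypothetical names, `generate` stands in for the OpenRouter request, and Python's built-in `sqlite3` stands in for DuckDB so the example runs with the standard library alone.

```python
import re
import sqlite3
from typing import Callable

# Only the two documented placeholders are substituted; other braces stay literal.
_PLACEHOLDER = re.compile(r"\{(content|filename)\}")
# Column names cannot be bound as "?" parameters, so they are validated instead.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_prompt(template: str, *, content: str, filename: str) -> str:
    """Fill {content} and {filename} in one pass; inserted text is not rescanned."""
    values = {"content": content, "filename": filename}
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


def fill_ai_column(
    con: sqlite3.Connection,
    name: str,
    template: str,
    generate: Callable[[str], str],
) -> int:
    """Add TEXT column `name` to documents if missing, then (re)fill it for every row.

    `generate` is a stand-in for the LLM request. Calling this again with the same
    column name overwrites the previous values, i.e. regenerates the column.
    Returns the number of documents processed.
    """
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid column name: {name!r}")
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    existing = {row[1].lower() for row in con.execute("PRAGMA table_info(documents)")}
    if name.lower() not in existing:
        con.execute(f'ALTER TABLE documents ADD COLUMN "{name}" TEXT')
    rows = con.execute("SELECT id, filename, content FROM documents").fetchall()
    for doc_id, filename, content in rows:
        prompt = render_prompt(template, content=content, filename=filename)
        con.execute(
            f'UPDATE documents SET "{name}" = ? WHERE id = ?', (generate(prompt), doc_id)
        )
    con.commit()
    return len(rows)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # stand-in for the DuckDB file
    con.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, filename TEXT, content TEXT)")
    con.execute("INSERT INTO documents VALUES ('1', 'notes.md', 'Ship v2 on Friday.')")

    def fake_llm(prompt: str) -> str:  # replace with a real OpenRouter call
        return f"[{len(prompt)}-char prompt sent]"

    fill_ai_column(con, "summary", "Summarize {filename}: {content}", fake_llm)
    print(con.execute("SELECT filename, summary FROM documents").fetchall())
```

Validating the column name matters because identifiers, unlike values, cannot be passed as `?` parameters. The single-pass substitution also means a document that happens to contain the text `{filename}` is sent to the model verbatim.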
| Feature | usefolio/folio (macOS .dmg) | Folio Linux (Cross-platform) |
|---|---|---|
| Platform | macOS only | Linux/macOS/Windows |
| Distribution | Proprietary .dmg installer | Open source, pip/uv installable |
| Database | Unknown/Proprietary | DuckDB (open source, embeddable) |
| AI Integration | Limited/Proprietary | OpenRouter API (multiple models) |
| OCR Support | Unknown | Built-in Tesseract OCR |
| Web UI | Desktop app only | Modern Gradio web interface |
| Cost | Paid software | Free & open source |
| Extensibility | Limited | Fully programmable Python API |
| Data Export | Unknown | Direct SQL access, CSV, JSON |
| Document Types | Limited formats | PDF, Markdown, HTML, Text + OCR |
Folio Linux is a complete rewrite that addresses the limitations of the original macOS-only Folio while adding powerful new capabilities like OCR, web UI, and cross-platform support.
DuckDB is a fast, embedded analytical database that is well suited to a local document workspace:
- **Blazing Performance**: Columnar, vectorized query engine built for analytical queries
- **Zero Config**: Single file database, no server setup
- **SQL Power**: Full SQL support for complex document analysis
- **Analytics**: Built-in statistical functions and aggregations
- **Python Native**: Seamless integration with your AI workflows
- **Scalable**: Handles growing document collections effortlessly
"DuckDB is the SQLite for analytics" - and Folio Linux proves it!
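To make "SQL Power" concrete, the query below summarizes a `documents` table by file type. It is a sketch, not code from the repository: the table is a cut-down version of the `documents` schema described later in this README, the rows are made up, and Python's built-in `sqlite3` stands in for DuckDB so the example runs with the standard library alone. The query is plain SQL with `?` parameters, which the `duckdb` Python package also accepts via `con.execute(query, params).fetchall()`.

```python
import sqlite3

# sqlite3 stands in for DuckDB so this runs with the standard library alone;
# the SQL below uses only portable constructs and "?" parameters.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE documents (filename TEXT, file_type TEXT, file_size INTEGER, content TEXT)"
)
# Made-up rows for illustration.
con.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("notes.md", "md", 1200, "meeting notes"),
        ("spec.pdf", "pdf", 48000, "design spec"),
        ("scan.pdf", "pdf", 910000, ""),
    ],
)

# Per file type: document count, total bytes, and how many have no extracted text
# (for example, scanned PDFs with no usable text layer).
query = """
    SELECT file_type,
           COUNT(*)                                      AS n_docs,
           SUM(file_size)                                AS total_bytes,
           SUM(CASE WHEN content = '' THEN 1 ELSE 0 END) AS empty_docs
    FROM documents
    WHERE file_size >= ?
    GROUP BY file_type
    ORDER BY n_docs DESC, file_type
"""
min_bytes = 0  # raise this to ignore small files
rows = con.execute(query, (min_bytes,)).fetchall()
for file_type, n_docs, total_bytes, empty_docs in rows:
    print(f"{file_type}: {n_docs} docs, {total_bytes} bytes, {empty_docs} without text")
```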
- Python 3.10+
- uv package manager (recommended) or pip
- Tesseract OCR (optional, for PDF OCR):
  - Ubuntu/Debian: `sudo apt-get install tesseract-ocr`
  - macOS: `brew install tesseract`
  - Windows: Download from GitHub releases
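To confirm these prerequisites before installing, a small check along these lines can help. It is a hypothetical helper, not shipped with Folio Linux. It only looks for `uv` and `tesseract` executables on `PATH` via `shutil.which`, so a Tesseract installed outside `PATH` (common on Windows) will be reported as missing.

```python
from __future__ import annotations

import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are available (hypothetical helper, not part of folio)."""
    return {
        # Folio Linux targets Python 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # uv is the recommended installer; pip works as a fallback.
        "uv": shutil.which("uv") is not None,
        # Optional: only needed to OCR scanned/image PDFs. Only PATH is searched.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:<14}{'found' if ok else 'missing'}")
```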
```bash
# Clone the repository
git clone https://github.com/ubuntupunk/folio-linux.git
cd folio-linux

# Install with uv (recommended)
uv sync

# Or with pip
pip install -r requirements.txt
```

```bash
# Using uv
uv run gradio_app.py

# Or directly
python gradio_app.py
```

Open your browser to http://localhost:7860.
- **Ingest Documents**: Select a folder containing your PDFs, Markdown, and text files
- **Add AI Columns**: Create new columns with custom AI analysis
- **Regenerate**: Re-run analysis for any column as needed
- **Query & Export**: Access your data via SQL or export results
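For the export step, one option is to write query results to CSV from Python. The sketch uses the standard `csv` module, with `sqlite3` as a runnable stand-in for the DuckDB connection; `export_query_to_csv` is a hypothetical name. DuckDB can also write CSV directly with a `COPY (SELECT ...) TO 'documents.csv' (HEADER)` statement.

```python
import csv
import io
import sqlite3


def export_query_to_csv(con: sqlite3.Connection, query: str, out) -> int:
    """Write a query's result, with a header row, to the text stream `out`.

    Hypothetical helper. Returns the number of data rows written.
    When writing to a file, open it with newline="" as the csv docs recommend.
    """
    cur = con.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # column names
    rows = cur.fetchall()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # stand-in for the DuckDB connection
    con.execute("CREATE TABLE documents (filename TEXT, summary TEXT)")
    con.execute("INSERT INTO documents VALUES ('notes.md', 'Short, with a comma')")
    buf = io.StringIO()
    export_query_to_csv(con, "SELECT filename, summary FROM documents", buf)
    print(buf.getvalue())
```

The `csv` writer takes care of quoting, so AI-generated text containing commas or newlines stays in a single field.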
```bash
# Ingest documents programmatically
python -c "
from folio.core import init_db, ingest_folder
con = init_db()
ingest_folder(con, 'path/to/documents')
"
```
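The feature list promises incremental updates, and the `documents` schema later in this README marks `file_path` as `UNIQUE`. One way to get cheap re-runs is to hash each file, skip it when the stored hash matches, and otherwise upsert on `file_path`. This is a sketch, not what `ingest_folder` does internally: `sha256_of`, `ingest_if_changed` and the `content_hash` column are hypothetical (the schema shown here has no such column), text extraction is passed in as a callable, and `sqlite3` stands in for DuckDB. Recent versions of both accept `INSERT ... ON CONFLICT (col) DO UPDATE SET col = excluded.col`.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hash the file's bytes in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest_if_changed(
    con: sqlite3.Connection, path: Path, extract: Callable[[Path], str]
) -> bool:
    """Insert or refresh one document keyed on file_path; skip it if unchanged.

    Hypothetical sketch: `content_hash` is an assumed extra column, and `extract`
    is the caller's text extractor (PDF/Markdown/HTML/OCR). It is only called when
    the file is new or its bytes changed. Returns True if the row was written.
    """
    key = str(path.resolve())
    digest = sha256_of(path)
    row = con.execute(
        "SELECT content_hash FROM documents WHERE file_path = ?", (key,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged since the last run
    con.execute(
        """
        INSERT INTO documents (file_path, filename, content, content_hash)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (file_path) DO UPDATE SET
            content = excluded.content,
            content_hash = excluded.content_hash
        """,
        (key, path.name, extract(path), digest),
    )
    con.commit()
    return True
```

Hashing before extraction means the expensive step (OCR in particular) only runs for files that are new or have changed.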
```bash
# Add AI columns via code
python -c "
from folio.core import init_db
from folio.llm import add_column
con = init_db()
add_column(con, 'summary', 'Summarize: {content}', 0.3)
"
```

Create a `.env` file with your OpenRouter API key:
```bash
# Get your key from https://openrouter.ai/
echo "OPENROUTER_API_KEY=sk-or-your-key-here" > .env
```

```
folio-linux/
├── gradio_app.py     # Web interface & UI
├── main.py           # CLI entry point
├── folio/            # Core modules
│   ├── __init__.py
│   ├── core.py       # Document ingestion & DuckDB
│   ├── llm.py        # AI/LLM integration
│   └── utils.py      # Text processing utilities
├── docs/             # Sample documents
├── pyproject.toml    # Project configuration
└── data/folio.db     # DuckDB database
```
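For scripts of your own that need the same `OPENROUTER_API_KEY`, a minimal `.env` reader could look like the one below. `load_env_file` is a hypothetical helper, not part of `folio`, and it handles only simple `KEY=VALUE` lines. The widely used `python-dotenv` package (`from dotenv import load_dotenv`) is the more complete option.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | os.PathLike = ".env", *, override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines and export them to os.environ.

    Hypothetical minimal reader: skips blank lines, '#' comments and lines
    without a key, strips one pair of matching quotes, and leaves variables
    that are already set alone unless override=True. Returns everything it
    parsed. A missing file is not an error.
    """
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if not key:
            continue
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


if __name__ == "__main__":
    load_env_file()
    if not os.environ.get("OPENROUTER_API_KEY"):
        print("OPENROUTER_API_KEY is not set; create .env as shown above.")
```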
- Document Ingestion: Multi-format parsing with OCR fallback
- Database Layer: DuckDB for storage and querying
- AI Integration: OpenRouter API with Minimax models
- Web UI: Gradio-based interface with real-time updates
- Text Processing: Clean text extraction and normalization
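"OCR fallback" can be pictured as a per-page decision: keep the text a PDF already embeds, and send only near-empty pages to OCR. This sketch illustrates that idea; it is not the logic in `folio/core.py`. The 25-character threshold and both function names are hypothetical, and the OCR step is passed in as a callable so the example runs without Tesseract installed. In practice that callable might render the page to an image and pass it to Tesseract, for instance via `pytesseract.image_to_string`.

```python
from typing import Callable, List

# Hypothetical threshold: a page with fewer non-whitespace characters than this
# is treated as a scanned image. Tune it for your own documents.
MIN_CHARS_PER_PAGE = 25


def pages_needing_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> List[int]:
    """Return 0-based indices of pages whose embedded text looks too sparse."""
    return [
        i
        for i, text in enumerate(page_texts)
        if len("".join(text.split())) < min_chars  # count only visible characters
    ]


def merge_with_ocr(page_texts: List[str], ocr_page: Callable[[int], str]) -> str:
    """Keep embedded text where it exists and substitute OCR output elsewhere.

    `ocr_page` maps a page index to recognized text (an assumption of this sketch).
    """
    sparse = set(pages_needing_ocr(page_texts))
    return "\n\n".join(
        ocr_page(i) if i in sparse else text for i, text in enumerate(page_texts)
    )


if __name__ == "__main__":
    pages = ["Chapter 1. The embedded text layer of this page is fine.", "  \n "]
    print(pages_needing_ocr(pages))
    print(merge_with_ocr(pages, lambda i: f"[OCR text for page {i + 1}]"))
```

Deciding per page rather than per file keeps OCR cost down for mixed PDFs, where only a few pages are scans.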
```python
from folio.core import init_db, ingest_folder
from folio.llm import add_column

# Initialize database
con = init_db("data/folio.db")

# Ingest documents
ingest_folder(con, "path/to/documents")

# Add AI column
add_column(con, "column_name", "Prompt: {content}", temperature=0.3)
```

```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    filename TEXT,
    file_path TEXT UNIQUE,
    file_type TEXT,
    file_size INTEGER,
    imported_at TIMESTAMP,
    content TEXT,
    -- Dynamic AI columns added here
    summary TEXT,
    analysis TEXT,
    -- ... etc
);
```

We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
```bash
# Install dev dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .
uv run ruff check . --fix
```

MIT License - see the LICENSE file for details.
- DuckDB: The embedded analytical database that makes this possible
- Gradio: Beautiful web interfaces with minimal code
- OpenRouter: Reliable AI model access
- Tesseract: The open-source OCR engine behind scanned-PDF support
- usefolio/folio: Inspiration for the document workspace concept
Folio Linux - Because your documents deserve better than platform lock-in!