Skip to content

ubuntupunk/folio-linux

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

17 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“‹ Folio Linux

Local document workspace for AI agents - A powerful, open-source alternative to macOS Folio with cross-platform support and advanced AI capabilities.

GitHub License

made in za

✨ Key Features

Cross-Platform

  • Linux-first design (works on macOS/Windows too!)
  • No proprietary dependencies - runs anywhere Python does
  • Lightweight & fast - no bloated Electron apps

Advanced Document Processing

  • Multi-format support: PDF (text & OCR), Markdown, HTML, plain text
  • OCR integration: Automatic text extraction from scanned/image PDFs using Tesseract
  • Smart ingestion: Incremental updates, duplicate handling, metadata extraction
  • Real-time processing: Progress bars and status updates

AI-Powered Analysis

  • LLM integration: Minimax models via OpenRouter (fast, reliable, affordable)
  • Dynamic columns: Add AI-generated insights on-the-fly
  • Flexible prompting: Custom templates with {content}, {filename} variables
  • Batch processing: Analyze entire document collections at once
  • Regeneration: Easy re-run of AI analysis for any column

Database Excellence

  • DuckDB-powered: Lightning-fast analytical database
  • ACID compliance: Reliable transactions and data integrity
  • SQL interface: Direct access to your document data
  • Embeddable: Zero-configuration, file-based storage
  • Performance: Handles thousands of documents with sub-second queries

Modern Web UI

  • Gradio interface: Beautiful, responsive web UI
  • Real-time updates: Live progress and results
  • Intuitive workflow: Drag-and-drop simplicity
  • System controls: Built-in shutdown, regeneration, and management tools

Comparison to usefolio/folio

Feature usefolio/folio (macOS .dmg) Folio Linux (Cross-platform)
Platform macOS only Linux/macOS/Windows
Distribution Proprietary .dmg installer Open source, pip/uv installable
Database Unknown/Proprietary DuckDB (open source, embeddable)
AI Integration Limited/Proprietary OpenRouter API (multiple models)
OCR Support Unknown Built-in Tesseract OCR
Web UI Desktop app only Modern Gradio web interface
Cost Paid software Free & open source
Extensibility Limited Fully programmable Python API
Data Export Unknown Direct SQL access, CSV, JSON
Document Types Limited formats PDF, Markdown, HTML, Text + OCR

Folio Linux is a complete rewrite that addresses the limitations of the original macOS-only Folio while adding powerful new capabilities like OCR, web UI, and cross-platform support.

πŸ¦† Why DuckDB?

DuckDB is the fastest analytical database for your document workspace:

  • ** Blazing Performance**: Sub-second queries on millions of documents
  • ** Zero Config**: Single file database, no server setup
  • ** SQL Power**: Full SQL support for complex document analysis
  • ** Analytics**: Built-in statistical functions and aggregations
  • ** Python Native**: Seamless integration with your AI workflows
  • ** Scalable**: Handles growing document collections effortlessly

"DuckDB is the SQLite for analytics" - and Folio Linux proves it!

πŸ› οΈ Installation

Prerequisites

  • Python 3.10+
  • uv package manager (recommended) or pip
  • Tesseract OCR (optional, for PDF OCR):
    • Ubuntu/Debian: sudo apt-get install tesseract-ocr
    • macOS: brew install tesseract
    • Windows: Download from GitHub releases

Quick Install

# Clone the repository
git clone https://github.com/ubuntupunk/folio-linux.git
cd folio-linux

# Install with uv (recommended)
uv sync

# Or with pip
pip install -r requirements.txt

πŸš€ Usage

Start the Web Interface

# Using uv
uv run gradio_app.py

# Or directly
python gradio_app.py

Open your browser to http://localhost:7860

Basic Workflow

  1. ** Ingest Documents**: Select a folder containing your PDFs, Markdown, and text files
  2. ** Add AI Columns**: Create new columns with custom AI analysis
  3. ** Regenerate**: Re-run analysis for any column as needed
  4. ** Query & Export**: Access your data via SQL or export results

Command Line Usage

# Ingest documents programmatically
python -c "
from folio.core import init_db, ingest_folder
con = init_db()
ingest_folder(con, 'path/to/documents')
"

# Add AI columns via code
python -c "
from folio.core import init_db
from folio.llm import add_column
con = init_db()
add_column(con, 'summary', 'Summarize: {content}', 0.3)
"

Environment Setup

Create a .env file with your OpenRouter API key:

# Get your key from https://openrouter.ai/
echo "OPENROUTER_API_KEY=sk-or-your-key-here" > .env

Architecture

πŸ“ folio-linux/
β”œβ”€β”€ πŸ“„ gradio_app.py      # Web interface & UI
β”œβ”€β”€ πŸ“„ main.py           # CLI entry point
β”œβ”€β”€ πŸ“ folio/            # Core modules
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ core.py         # Document ingestion & DuckDB
β”‚   β”œβ”€β”€ llm.py          # AI/LLM integration
β”‚   └── utils.py        # Text processing utilities
β”œβ”€β”€ πŸ“ docs/            # Sample documents
β”œβ”€β”€ πŸ“„ pyproject.toml   # Project configuration
└── πŸ—„οΈ data/folio.db    # DuckDB database

Core Components

  • Document Ingestion: Multi-format parsing with OCR fallback
  • Database Layer: DuckDB for storage and querying
  • AI Integration: OpenRouter API with Minimax models
  • Web UI: Gradio-based interface with real-time updates
  • Text Processing: Clean text extraction and normalization

API Reference

Core Functions

from folio.core import init_db, ingest_folder
from folio.llm import add_column

# Initialize database
con = init_db("data/folio.db")

# Ingest documents
ingest_folder(con, "path/to/documents")

# Add AI column
add_column(con, "column_name", "Prompt: {content}", temperature=0.3)

Database Schema

CREATE TABLE documents (
    id UUID PRIMARY KEY,
    filename TEXT,
    file_path TEXT UNIQUE,
    file_type TEXT,
    file_size INTEGER,
    imported_at TIMESTAMP,
    content TEXT,
    -- Dynamic AI columns added here
    summary TEXT,
    analysis TEXT,
    -- ... etc
);

Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

Development Setup

# Install dev dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .
uv run ruff check . --fix

License

MIT License - see LICENSE file for details.

Acknowledgments

  • DuckDB: The fastest analytical database that makes this possible
  • Gradio: Beautiful web interfaces with minimal code
  • OpenRouter: Reliable AI model access
  • Tesseract: World's best OCR engine
  • usefolio/folio: Inspiration for the document workspace concept

Folio Linux - Because your documents deserve better than platform lock-in!

About

Local document workspace for AI agents - Folio Linux clone with OCR support and Gradio UI

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages