Burton-David/rapidcsv

RapidCSV

Fast CSV processing for data scientists who have real work to do.

Features

✅ Working Features:

  • Auto-detect encoding (UTF-8, Latin-1, Windows-1252)
  • Auto-detect delimiter (comma, tab, semicolon, pipe)
  • Read compressed files (.gz)
  • Handle files with BOM, mixed quotes, ragged lines
  • One-command data cleaning with quick_clean()
  • Export to Parquet or CSV
  • 2-5x faster than pandas for files over 100K rows
  • Memory-safe preview with peek()
  • File analysis with info()

⚠️ Current Limitations:

  • No full streaming for files larger than memory (only peek)
  • Limited to Polars-supported data types
  • No Excel support
  • No remote file support (S3, HTTP)
  • Estimated 30% of planned features implemented
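Delimiter auto-detection, the first feature above, can be approximated with the standard library's `csv.Sniffer`. The sketch below illustrates the idea only; it is not rapidcsv's internal code, and `sniff_delimiter` is a name invented for this example:

```python
import csv

def sniff_delimiter(sample: str) -> str:
    """Guess the delimiter from a text sample by restricting the
    sniffer to the same candidates rapidcsv advertises:
    comma, tab, semicolon, pipe."""
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t;|")
    return dialect.delimiter

print(sniff_delimiter("a;b;c\n1;2;3\n"))  # ;
```

In practice a library would sniff only the first few kilobytes of the file, so detection cost stays constant regardless of file size.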

Installation

# From source (PyPI package coming soon)
git clone https://github.com/Burton-David/rapidcsv
cd rapidcsv
pip install -e .

Quick Start

import rapidcsv as rc

# Just works - auto-detects encoding and delimiter
df = rc.read("messy_data.csv")

# Clean common issues with one command
df_clean = df.quick_clean()

# Save for next time
df_clean.to_parquet("clean_data.parquet")

Core Functions

Reading Files

# Auto-detect everything
df = rc.read("data.csv")

# Or specify explicitly
df = rc.read("data.csv", 
             encoding="latin-1",
             separator="\t",
             error_bad_lines="skip")

# Preview large files
preview = rc.peek("huge_file.csv", rows=1000)

# Get file info without loading
info = rc.info("huge_file.csv")
print(f"Size: {info['size_mb']:.1f} MB")
print(f"Encoding: {info['detected_encoding']}")
print(f"Delimiter: {info['detected_delimiter']}")

Data Cleaning

quick_clean() performs these operations:

  1. Strip whitespace from all strings
  2. Standardize null values (NA, null, None, -, "" → None)
  3. Drop completely empty rows
  4. Remove duplicate rows
  5. Clean column names (spaces → underscores, lowercase)

# All cleaning operations
df_clean = df.quick_clean()

# Or selectively
df_clean = df.quick_clean(
    strip_whitespace=True,
    standardize_nulls=True,
    drop_empty_rows=False,
    drop_duplicate_rows=True,
    fix_column_names=True,
    report=True  # Print what was cleaned
)
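The five steps above can be sketched in pure Python over a list of row dicts. This is a minimal illustration of the documented behaviour, not rapidcsv's implementation (which operates on Polars DataFrames); `quick_clean_rows` and `NULL_TOKENS` are names invented for this example:

```python
NULL_TOKENS = {"NA", "null", "None", "-", ""}

def quick_clean_rows(rows):
    """Apply the five documented quick_clean steps to a list of dicts."""
    cleaned, seen = [], set()
    for row in rows:
        new = {}
        for col, val in row.items():
            key = col.strip().replace(" ", "_").lower()        # step 5
            if isinstance(val, str):
                val = val.strip()                              # step 1
            new[key] = None if val in NULL_TOKENS else val     # step 2
        if all(v is None for v in new.values()):               # step 3
            continue
        fingerprint = tuple(sorted(new.items()))               # step 4
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(new)
    return cleaned

rows = [
    {" First Name ": "  Ada "},   # whitespace in key and value
    {" First Name ": "NA"},       # becomes an all-null row, dropped
    {" First Name ": "Ada"},      # duplicate of the first row, dropped
]
print(quick_clean_rows(rows))  # [{'first_name': 'Ada'}]
```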

Performance

Benchmarked on MacBook Pro M1 with messy CSV files:

File Size   Rows   Pandas   RapidCSV   Speedup
3 MB        10K    0.04s    0.12s      0.3x*
30 MB       100K   0.31s    0.11s      2.9x
148 MB      500K   1.68s    0.33s      5.1x

*Small files have overhead from auto-detection

Times include both loading and cleaning operations.
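Wall-clock numbers like these are usually collected as a best-of-N over repeated runs. A minimal timing helper along those lines (illustrative only; the project's actual benchmark lives in examples/performance_comparison.py):

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time over `repeats` runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best
```

Taking the minimum rather than the mean filters out one-off interference from other processes and cold caches.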

Error Handling

RapidCSV provides helpful error messages:

# Instead of: "UnicodeDecodeError: 'utf-8' codec can't decode byte..."
# You get: "Failed to decode file 'data.csv' with detected encoding 'utf-8' (confidence: 73.2%). 
#           Try specifying encoding explicitly: rc.read('data.csv', encoding='latin-1')"
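Translating a raw UnicodeDecodeError into a message like the one above boils down to catching the exception and re-raising with context. A sketch of that pattern, assuming nothing about rapidcsv's internals (`read_with_friendly_errors` is a name invented here):

```python
def read_with_friendly_errors(path, encoding="utf-8"):
    """Read a text file, converting a low-level decode failure into
    an actionable error message in the style shown above."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"Failed to decode file '{path}' with encoding '{encoding}'. "
            f"Try specifying encoding explicitly: "
            f"rc.read('{path}', encoding='latin-1')"
        ) from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the user-facing message stays readable.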

Examples

See the examples/ directory:

  • basic_usage.py - Complete workflow example
  • performance_comparison.py - Benchmark vs pandas

Development Status

This is alpha software (v0.1.0). What works:

  • ✅ Core CSV reading with auto-detection
  • ✅ Basic data cleaning operations
  • ✅ Parquet/CSV export
  • ✅ Compressed file support
  • ✅ Basic error handling

What's missing:

  • ❌ Full streaming for huge files
  • ❌ Advanced cleaning options
  • ❌ Remote file support
  • ❌ Data profiling reports
  • ❌ CLI tool

Contributing

This project is under active development. Issues and PRs welcome!

License

MIT
