chdb-core is the foundational engine of the chDB ecosystem โ an in-process SQL OLAP Engine powered by ClickHouse 1
- chDB Ecosystem
- Features
- Architecture
- Installation
- Quick Start
- API Reference
- Demos and Examples
- Benchmark
- Documentation
- Contributing
The chDB project is split into two packages:
| Package | Role | Install |
|---|---|---|
| chdb-core (this repo) | C++ engine + Session / Connection / DB-API interfaces | pip install chdb-core |
| chDB | Pandas-compatible DataStore API built on top of chdb-core | pip install chdb |
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ chDB (pip install chdb) โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ โ DataStore: pandas-like lazy API โ โ โ โ QueryPlanner / dual-engine exec โ โ โ โโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ โ โ โ โ โโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโ โ โ โ chdb-core (pip install chdb-core) โ โ โ โ C++ ClickHouse Engine โ โ โ โ Session / Connection / DB-API โ โ โ โ query() / UDF / Stream โ โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
chdb-core provides everything you need to run SQL queries with ClickHouse performance โ no server required. If you want a higher-level pandas-compatible DataFrame API, install chDB instead.
- In-process SQL OLAP Engine, powered by ClickHouse
- No need to install ClickHouse
- Minimized data copy from C++ to Python with python memoryview
- Input & Output support Parquet, CSV, JSON, Arrow, ORC and 60+ more formats
- Session and Connection management with stateful queries
- Streaming query support for constant-memory processing
- Python DB-API 2.0 compliance
- User Defined Functions (UDF) support
- AI-assisted SQL generation
Currently, chdb-core supports Python 3.9+ on macOS and Linux (x86_64 and ARM64).
pip install chdb-coreimport chdb
result = chdb.query("SELECT version()", "Pretty")
print(result)One-shot Query
The simplest way to run SQL โ no session or connection needed:
import chdb
# Basic query with CSV output (default)
result = chdb.query("SELECT 1, 'hello'")
print(result)
# Pandas DataFrame output
df = chdb.query("SELECT number, number * 2 AS double FROM numbers(10)", "DataFrame")
print(df)
# Parameterized queries
df = chdb.query(
"SELECT toDate({base_date:String}) + number AS date "
"FROM numbers({total_days:UInt64}) "
"LIMIT {items_per_page:UInt64}",
"DataFrame",
params={"base_date": "2025-01-01", "total_days": 10, "items_per_page": 5},
)
print(df)Query on Files (Parquet, CSV, JSON, Arrow, ORC and 60+)
import chdb
res = chdb.query('SELECT * FROM file("data.parquet", Parquet)', "JSON")
print(res)
res = chdb.query('SELECT * FROM file("data.csv", CSV)', "CSV")
print(res)
# Query result statistics
print(f"SQL read {res.rows_read()} rows, {res.bytes_read()} bytes, "
f"storage read {res.storage_rows_read()} rows, {res.storage_bytes_read()} bytes, "
f"elapsed {res.elapsed()} seconds")
# Pandas DataFrame output
chdb.query('SELECT * FROM file("data.parquet", Parquet)', "Dataframe")Connection API
Connection-based API for cursor-style interaction, supporting both in-memory and file-based databases:
import chdb
conn = chdb.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id UInt32, name String) ENGINE = Memory")
cur.execute("INSERT INTO test VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie')")
cur.execute("SELECT * FROM test ORDER BY id")
print(cur.fetchone()) # (1, 'Alice')
print(cur.fetchmany(2)) # ((2, 'Bob'), (3, 'Charlie'))
print(cur.column_names()) # ['id', 'name']
print(cur.column_types()) # ['UInt32', 'String']
# Use the cursor as an iterator
cur.execute("SELECT number FROM system.numbers LIMIT 3")
for row in cur:
print(row)
# Always close resources when done
cur.close()
conn.close()For more details, see examples/connect.py.
# File-based persistent database
conn = chdb.connect("mydata.db")
conn.query("CREATE TABLE IF NOT EXISTS logs (ts DateTime, msg String) ENGINE = MergeTree ORDER BY ts")
conn.query("INSERT INTO logs VALUES (now(), 'started')")
result = conn.query("SELECT * FROM logs", "Pretty")
print(result)
conn.close()Stateful Session
Sessions provide a higher-level API with automatic resource management:
from chdb import session as chs
sess = chs.Session()
sess.query("CREATE DATABASE IF NOT EXISTS db_xxx ENGINE = Atomic")
sess.query("CREATE TABLE IF NOT EXISTS db_xxx.log_table (x String, y Int) ENGINE = Log")
sess.query("INSERT INTO db_xxx.log_table VALUES ('a', 1), ('b', 3), ('c', 2), ('d', 5)")
sess.query("CREATE VIEW db_xxx.view_xxx AS SELECT * FROM db_xxx.log_table LIMIT 4")
print(sess.query("SELECT * FROM db_xxx.view_xxx", "Pretty"))see also: test_stateful.py.
Streaming Query
Process large datasets with constant memory usage through chunked streaming:
from chdb import session as chs
sess = chs.Session()
rows_cnt = 0
with sess.send_query("SELECT * FROM numbers(200000)", "CSV") as stream_result:
for chunk in stream_result:
rows_cnt += chunk.rows_read()
print(rows_cnt) # 200000
# Example 2: Manual iteration with fetch()
rows_cnt = 0
stream_result = sess.send_query("SELECT * FROM numbers(200000)", "CSV")
while True:
chunk = stream_result.fetch()
if chunk is None:
break
rows_cnt += chunk.rows_read()
print(rows_cnt) # 200000For more details, see test_streaming_query.py.
Python DB-API 2.0
import chdb.dbapi as dbapi
print("chdb driver version: {0}".format(dbapi.get_client_info()))
conn1 = dbapi.connect()
cur1 = conn1.cursor()
cur1.execute('select version()')
print("description: ", cur1.description)
print("data: ", cur1.fetchone())
cur1.close()
conn1.close()Query on Table (Pandas DataFrame, Parquet file/bytes, Arrow bytes)
import chdb.dataframe as cdf
import pandas as pd
# Join 2 DataFrames
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ["one", "two", "three"]})
df2 = pd.DataFrame({'c': [1, 2, 3], 'd': ["โ ", "โก", "โข"]})
ret_tbl = cdf.query(sql="select * from __tbl1__ t1 join __tbl2__ t2 on t1.a = t2.c",
tbl1=df1, tbl2=df2)
print(ret_tbl)
# Query on the DataFrame Table
print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
# Pandas DataFrames are automatically registered as temporary tables in ClickHouse
chdb.query("SELECT * FROM Python(df1) t1 JOIN Python(df2) t2 ON t1.a = t2.c").show()Python Table Engine
import chdb
import pandas as pd
df = pd.DataFrame(
{
"a": [1, 2, 3, 4, 5, 6],
"b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
}
)
chdb.query("SELECT b, sum(a) FROM Python(df) GROUP BY b ORDER BY b").show()import chdb
import pyarrow as pa
arrow_table = pa.table(
{
"a": [1, 2, 3, 4, 5, 6],
"b": ["tom", "jerry", "auxten", "tom", "jerry", "auxten"],
}
)
chdb.query("SELECT b, sum(a) FROM Python(arrow_table) GROUP BY b ORDER BY b").show()see also: test_query_py.py.
User Defined Functions (UDF)
from chdb.udf import chdb_udf
from chdb import query
@chdb_udf()
def sum_udf(lhs, rhs):
return int(lhs) + int(rhs)
print(query("SELECT sum_udf(12, 22)"))Some notes on chDB Python UDF(User Defined Function) decorator.
- The function should be stateless. So, only UDFs are supported, not UDAFs(User Defined Aggregation Function).
- Default return type is String. If you want to change the return type, you can pass in the return type as an argument. The return type should be one of the following: https://clickhouse.com/docs/en/sql-reference/data-types
- The function should take in arguments of type String. As the input is TabSeparated, all arguments are strings.
- The function will be called for each line of input. Something like this:
def sum_udf(lhs, rhs): return int(lhs) + int(rhs) for line in sys.stdin: args = line.strip().split('\t') lhs = args[0] rhs = args[1] print(sum_udf(lhs, rhs)) sys.stdout.flush() - The function should be pure python function. You SHOULD import all python modules used IN THE FUNCTION.
def func_use_json(arg): import json ... - Python interpertor used is the same as the one used to run the script. Get from
sys.executable
see also: test_udf.py.
Query Progress
import chdb
# Auto-detect: TTY progress in terminal, progress bar in notebook
conn = chdb.connect(":memory:?progress=auto")
conn.query("SELECT sum(number) FROM numbers_mt(1e10) GROUP BY number % 10 SETTINGS max_threads=4")Progress options: progress=auto | progress=tty | progress=err | progress=off
AI-assisted SQL Generation
import chdb
conn = chdb.connect("file::memory:?ai_provider=openai&ai_model=gpt-4o-mini")
conn.query("CREATE TABLE nums (n UInt32) ENGINE = Memory")
conn.query("INSERT INTO nums VALUES (1), (2), (3)")
sql = conn.generate_sql("Select all rows from nums ordered by n desc")
print(sql) # SELECT * FROM nums ORDER BY n DESC
print(conn.ask("List the numbers table", format="Pretty"))Command Line
python3 -m chdb SQL [OutputFormat]
python3 -m chdb "SELECT 1,'abc'" Pretty- Project Documentation and Usage Examples
- Colab Notebooks and other Script Examples
---
- For chDB specific examples and documentation refer to chDB docs
- For SQL syntax, please refer to ClickHouse SQL Reference
- For pandas-like DataStore API, see chDB
chDB provides an AI Skill that teaches AI coding agents (Cursor, Claude Code, etc.) chDB's multi-source data analytics API. Install it so your AI assistant can write correct chDB code out of the box:
curl -sL https://raw.githubusercontent.com/chdb-io/chdb/main/install_skill.sh | bash- Demo chDB at ClickHouse v23.7 livehouse! and Slides
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Help test and report bugs
- Help improve documentation
- Help improve code quality and performance
We welcome bindings for other languages, please refer to bindings for more details.
Please refer to VERSION-GUIDE.md for more details.
Apache 2.0, see LICENSE for more information.
chDB is mainly based on ClickHouse 1 for trade mark and other reasons, I named it chDB.
- Discord: https://discord.gg/D2Daa2fM5K
- Email: auxten@clickhouse.com
- Twitter: @chdb
Footnotes
-
ClickHouseยฎ is a trademark of ClickHouse Inc. All trademarks, service marks, and logos mentioned or depicted are the property of their respective owners. The use of any third-party trademarks, brand names, product names, and company names does not imply endorsement, affiliation, or association with the respective owners. โฉ โฉ2

