CrateDB Explore

This project accompanies the CrateDB Explore: IoT Analytics hands-on demo. That demo walks you through real-time IoT analytics using weather monitoring data — 260k timestamped readings from 80 weather stations across Germany with temperature, humidity, and pressure values. You run hourly aggregations in under a second, execute geographic SQL queries, and connect a live Grafana dashboard, all in about 30 minutes.

The load generators in this repository let you drive that same dataset with a configurable mix of geo-proximity, multi-table join, and full-text search queries over the PostgreSQL wire protocol. Each implementation produces identical workloads and reports latency percentiles via HdrHistogram.

Weather Load Generators

| Language | Directory | Driver |
| --- | --- | --- |
| Java | `src_weather/main/java/` | JDBC (`postgresql`) |
| Python | `src_weather/main/python/` | psycopg2 |
| .NET (C#) | `src_weather/main/dotnet/` | Npgsql |

Query types

All three implementations expose the same three query types, mixed via TYPE:COUNT arguments at the command line. Each stresses a different side of CrateDB:

- WKT: geo-proximity scan. Picks a random geo_point + timestamp from a pre-loaded pool and asks for the min/max temperature within 1° of that point at that moment. Exercises spatial filtering on geo_point. One row out per call. Cheapest of the three; sits at the bottom of the latency chart.
- REGION: three-table join. Picks a random federal-state name and returns every sensor inside that polygon at the most recent measurement epoch, with its nearest-town label. Exercises WITHIN(point, polygon) containment, a correlated max(measurement_time) subquery, and a join on geo_location. Almost always the slowest: polygon containment is O(vertices) per candidate point, the subquery scans all of climate_data, and the result set is dozens of rows.
- FTS: full-text relevance ranking. Picks a random term (cars, trains, factories, energy) and runs MATCH(economics, ?) against german_regions, returning the top 3 by _score. Exercises the Lucene-backed full-text index. Three rows out. Fast in steady state, occasional tail spikes on cold matches.

See each implementation's Query types section (Java / Python / .NET) for the SQL and language-specific notes.
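As a rough illustration of how a TYPE:COUNT mix could be expanded into a run order, the sketch below parses arguments such as `WKT:100 REGION:10 FTS:50` and shuffles them into one schedule. The function names `parse_mix` and `build_schedule` and the shuffle strategy are assumptions made for this example; the actual implementations may interleave queries differently.

```python
import random

# The three query types named in this README.
QUERY_TYPES = ("WKT", "REGION", "FTS")


def parse_mix(args):
    """Turn ["WKT:100", "FTS:50"] into {"WKT": 100, "FTS": 50}."""
    mix = {}
    for arg in args:
        name, sep, count = arg.partition(":")
        name = name.upper()
        if not sep or name not in QUERY_TYPES:
            raise ValueError(f"expected TYPE:COUNT, got {arg!r}")
        n = int(count)  # raises ValueError on a non-numeric count
        if n < 0:
            raise ValueError(f"count must not be negative: {arg!r}")
        # Repeated types accumulate rather than overwrite.
        mix[name] = mix.get(name, 0) + n
    return mix


def build_schedule(mix, seed=None):
    """Expand the mix into a shuffled list of query-type names."""
    schedule = [name for name, count in mix.items() for _ in range(count)]
    # A seeded Random keeps a run reproducible (hypothetical design choice).
    random.Random(seed).shuffle(schedule)
    return schedule
```

Shuffling the expanded list keeps the requested counts exact while spreading the three query types across the run instead of executing them in blocks.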

Latency charts

After each run, every implementation writes a latency_histogram.png to its working directory — a percentile-distribution plot (50%, 90%, 99%, 99.9%, 99.99%) with one line per query type, rendered with the platform's native plotting library. The shape is the same in all three (REGION climbs into a tail plateau, WKT/FTS stay low); only the styling differs.
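To make the plotted percentiles concrete, here is a minimal sketch that computes the same five percentile points per query type from raw latency samples. It uses a plain nearest-rank calculation as a stand-in for HdrHistogram, and the names `percentile` and `summarize` are hypothetical; the load generators themselves record into HdrHistogram.

```python
import math

# Percentile points shown on the latency charts.
PERCENTILES = (50.0, 90.0, 99.0, 99.9, 99.99)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list of samples."""
    if not sorted_values:
        raise ValueError("no samples recorded")
    # Rank is 1-based; clamp so very small percentiles still pick a sample.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100.0))
    return sorted_values[rank - 1]


def summarize(latencies_by_type):
    """Map each query type to {percentile: latency}."""
    summary = {}
    for qtype, samples in latencies_by_type.items():
        ordered = sorted(samples)
        summary[qtype] = {p: percentile(ordered, p) for p in PERCENTILES}
    return summary
```

With few samples the upper percentiles all collapse onto the maximum, which is one reason to run enough iterations per query type before reading the tail of the chart.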

Chart libraries: JFreeChart (Java), matplotlib (Python), ScottPlot (.NET).

KNN Search CLI

Interactive search tool for CrateDB's german_regions table. Supports semantic search via OpenAI embeddings + KNN_MATCH, and BM25 fulltext search via MATCH — no OpenAI key needed for fulltext mode.
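The two search modes differ mainly in the WHERE clause. The sketch below builds a parameterized statement for each mode; it assumes the `demo` schema and the `region_name`, `economics`, and `embedding` columns mentioned elsewhere in this README, and the function name `build_search` is hypothetical. Obtaining the query vector from OpenAI is left out, and how a driver binds a Python list to a vector parameter should be checked against the driver in use.

```python
def build_search(mode, term=None, vector=None, k=3, dims=1536):
    """Return (sql, params) for a fulltext or semantic search."""
    if mode == "fulltext":
        if not term:
            raise ValueError("fulltext mode needs a search term")
        sql = (
            "SELECT region_name, _score FROM demo.german_regions "
            "WHERE MATCH(economics, %s) "
            "ORDER BY _score DESC LIMIT %s"
        )
        return sql, (term, k)
    if mode == "semantic":
        if vector is None or len(vector) != dims:
            raise ValueError(f"semantic mode needs a {dims}-element vector")
        # KNN_MATCH(column, query_vector, k) restricts rows to nearest neighbours.
        sql = (
            "SELECT region_name, _score FROM demo.german_regions "
            "WHERE KNN_MATCH(embedding, %s, %s) "
            "ORDER BY _score DESC LIMIT %s"
        )
        return sql, (list(vector), k, k)
    raise ValueError(f"unknown mode: {mode!r}")
```

Fulltext mode never touches the embedding column, which is why it works without an OpenAI key.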

| Language | Directory | Driver |
| --- | --- | --- |
| Java | `src_knn_search/main/java/` | JDBC (`postgresql`) + Gson |
| Python | `src_knn_search/main/python/` | psycopg + OpenAI |
| .NET (C#) | `src_knn_search/main/dotnet/` | Npgsql |

Data and Schema

The sql/ directory contains the DDL and DML needed to set up the demo tables:

| File | Description |
| --- | --- |
| `german_weather_data_ddl.sql` | `CREATE TABLE` statements for `climate_data`, `german_regions`, and `geo_points` |
| `german_weather_data_dml.sql` | `COPY FROM` and `INSERT` statements to load reference data |

The data/ directory contains the reference datasets:

| File | Description |
| --- | --- |
| `geo_points.json` | 726 weather station locations with nearest-town mappings |
| `german_regions.json` | 16 German states with boundaries, fulltext columns, and embeddings |
| `export-demo_climate_data_large_v2.json` | Climate measurement readings |

Loading the data with `COPY FROM`

The same three datasets are published as newline-delimited JSON in a public S3 bucket, so the quickest way to populate the demo tables is to let CrateDB pull them in directly. Run the DDL first so the tables exist, then:

```sql
COPY demo.geo_points
  FROM 'https://guided-path.s3.us-east-1.amazonaws.com/geo_points.json'
  WITH (format = 'json') RETURN SUMMARY;

COPY demo.german_regions
  FROM 'https://guided-path.s3.us-east-1.amazonaws.com/german_regions.json'
  WITH (format = 'json') RETURN SUMMARY;

COPY demo.climate_data
  FROM 'https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json'
  WITH (format = 'json') RETURN SUMMARY;
```

Notes:

- It runs on the cluster, not your client. CrateDB fetches each URL server-side, so the cluster nodes need outbound network access to S3. The bucket is public, so no credentials are required.
- Keys in each JSON object map to table columns. These files line up with the DDL directly: geo_location ([lon, lat]) → GEO_POINT, geo_coords (GeoJSON) → GEO_SHAPE, embedding (1536-element array) → FLOAT_VECTOR, and the ISO-8601 measurement_time string → TIMESTAMP.
- RETURN SUMMARY reports per-node success/error counts so you can confirm all rows landed (726 geo points, 16 regions, and the full climate stream).
- Reloads are idempotent, not additive. All three tables have primary keys: geo_points on (latitude, longitude), german_regions on region_name, and climate_data on (measurement_time, latitude, longitude) via generated columns, because geo_point itself can't be a key. COPY FROM does not upsert, so re-running it on an already-loaded table reports every existing row as a duplicate-key conflict in RETURN SUMMARY (the error_count) and keeps the current row; it won't silently double the data. To refresh a table from scratch, DELETE/DROP it first; to merge updates, use INSERT … ON CONFLICT DO UPDATE instead of COPY FROM.
- Run REFRESH TABLE demo.geo_points, demo.german_regions, demo.climate_data; afterwards if you want to query the rows immediately.
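Two of the notes above lend themselves to a short sketch: totalling the RETURN SUMMARY counts, and building an INSERT … ON CONFLICT DO UPDATE statement for merges. The helper names `summarize_copy` and `build_upsert` are hypothetical, the rows are assumed to arrive as dicts keyed by the summary's `success_count` and `error_count` columns, and identifiers are interpolated directly, so they must come from trusted input.

```python
def summarize_copy(rows):
    """Total the success and error counts across all RETURN SUMMARY rows."""
    loaded = failed = 0
    for row in rows:
        # Counts can be NULL when a whole URI could not be read; treat as 0.
        loaded += row.get("success_count") or 0
        failed += row.get("error_count") or 0
    return loaded, failed


def build_upsert(table, key_columns, row):
    """Return (sql, params) that inserts a row or updates it on key conflict."""
    columns = list(row)
    updates = [c for c in columns if c not in key_columns]
    if not updates:
        raise ValueError("row has no non-key columns to update")
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict_target = ", ".join(key_columns)
    assignments = ", ".join(f"{c} = excluded.{c}" for c in updates)
    sql = (
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) DO UPDATE SET {assignments}"
    )
    return sql, tuple(row[c] for c in columns)
```

On a reload of an already-populated table, a `loaded` total of 0 with `failed` equal to the row count is the duplicate-key behaviour described above rather than a broken import.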

This is the database-side counterpart to src_stream_load/, which moves the very same files through Kafka instead: a producer (stream_load_into_kafka.py) streams them from S3 into Kafka as JSON, Avro, or Protobuf, and a consumer (stream_from_kafka_into_crate.py) reads them back out of Kafka and loads them into CrateDB.

MCP Search (Claude + CrateDB)

A minimal Python MCP server that exposes a single query_sql tool over the weather dataset, so an MCP client like Claude can answer questions about the data in plain English. It is built on the official MCP Python SDK (FastMCP) and talks to CrateDB's HTTP _sql endpoint. The one non-trivial rule — using WITHIN to keep "in Germany" queries inside the country's borders — is baked into the server's instructions.

See the MCP Search overview for install, configuration, and how to register it with an assistant. A draft cratedb.com walkthrough lives in GERMAN_WEATHER_MCP.md.
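For orientation, the HTTP call that a `query_sql` tool would wrap can be sketched with the standard library alone. This is not the server's actual code: the helper names and the base URL argument are assumptions, and the MCP tool registration is omitted. The `_sql` endpoint accepts a JSON body with `stmt` and optional `args`, and returns `cols` and `rows`.

```python
import json
import urllib.request


def build_sql_request(base_url, stmt, args=()):
    """Build a POST request for CrateDB's HTTP _sql endpoint."""
    body = json.dumps({"stmt": stmt, "args": list(args)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response_json):
    """Pair each result row with the column names from the response."""
    cols = response_json["cols"]
    return [dict(zip(cols, row)) for row in response_json["rows"]]


def query_sql(base_url, stmt, args=()):
    """Run one statement and return its rows as a list of dicts."""
    request = build_sql_request(base_url, stmt, args)
    with urllib.request.urlopen(request, timeout=30) as response:
        return rows_as_dicts(json.load(response))
```

Returning rows as dicts keyed by column name gives the assistant labelled values to reason over instead of positional arrays.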

Grafana Dashboard

The grafana/ directory contains a pre-built dashboard for visualizing the weather data:

| File | Description |
| --- | --- |
| `german_weather_data.json` | Importable Grafana dashboard with geomap, gauge, and time-series panels. Connects to CrateDB via the PostgreSQL datasource plugin. |

To use it, add a PostgreSQL datasource in Grafana pointing at your CrateDB cluster, then import the JSON file via Dashboards > Import.

Prerequisites

- Network access to your CrateDB cluster on port 5432
- The tables above populated in a demo schema (run the DDL then DML scripts)
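A quick way to check the first prerequisite before starting a load generator is to attempt a TCP connection to the PostgreSQL wire port. This is only a reachability sketch with a hypothetical helper name, `can_reach`; it does not authenticate or verify that the demo tables exist.

```python
import socket


def can_reach(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```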

See each implementation's README for language-specific setup and usage instructions.

License

Apache License 2.0. See the LICENSE file.
