This project accompanies the CrateDB Explore: IoT Analytics hands-on demo. That demo walks you through real-time IoT analytics using weather monitoring data — 260k timestamped readings from 80 weather stations across Germany with temperature, humidity, and pressure values. You run hourly aggregations in under a second, execute geographic SQL queries, and connect a live Grafana dashboard, all in about 30 minutes.
The load generators in this repository let you drive that same dataset with a configurable mix of geo-proximity, multi-table join, and full-text search queries over the PostgreSQL wire protocol. Each implementation produces identical workloads and reports latency percentiles via HdrHistogram.
| Language | Directory | Driver |
|---|---|---|
| Java | src_weather/main/java/ |
JDBC (postgresql) |
| Python | src_weather/main/python/ |
psycopg2 |
| .NET (C#) | src_weather/main/dotnet/ |
Npgsql |
All three implementations expose the same three query types, mixed via TYPE:COUNT arguments at the command line. Each stresses a different side of CrateDB:
WKT— geo-proximity scan. Picks a randomgeo_point+timestampfrom a pre-loaded pool and asks for the min/max temperature within 1° of that point at that moment. Exercises spatial filtering ongeo_point. One row out per call. Cheapest of the three; sits at the bottom of the latency chart.REGION— three-table join. Picks a random federal-state name and returns every sensor inside that polygon at the most recent measurement epoch, with its nearest-town label. ExercisesWITHIN(point, polygon)containment, a correlatedmax(measurement_time)subquery, and a join ongeo_location. Almost always the slowest — polygon containment is O(vertices) per candidate point, the subquery scans all ofclimate_data, and the result set is dozens of rows.FTS— full-text relevance ranking. Picks a random term (cars,trains,factories,energy) and runsMATCH(economics, ?)againstgerman_regions, returning the top 3 by_score. Exercises the Lucene-backed full-text index. Three rows out. Fast in steady state, occasional tail spikes on cold matches.
See each implementation's Query types section (Java / Python / .NET) for the SQL and language-specific notes.
After each run, every implementation writes a latency_histogram.png to its working directory — a percentile-distribution plot (50%, 90%, 99%, 99.9%, 99.99%) with one line per query type, rendered with the platform's native plotting library. The shape is the same in all three (REGION climbs into a tail plateau, WKT/FTS stay low); only the styling differs.
| Java — JFreeChart | Python — matplotlib | .NET — ScottPlot |
|---|---|---|
![]() |
![]() |
![]() |
Interactive search tool for CrateDB's german_regions table. Supports semantic search via OpenAI embeddings + KNN_MATCH, and BM25 fulltext search via MATCH — no OpenAI key needed for fulltext mode.
| Language | Directory | Driver |
|---|---|---|
| Java | src_knn_search/main/java/ |
JDBC (postgresql) + Gson |
| Python | src_knn_search/main/python/ |
psycopg + OpenAI |
| .NET (C#) | src_knn_search/main/dotnet/ |
Npgsql |
The sql/ directory contains the DDL and DML needed to set up the demo tables:
| File | Description |
|---|---|
german_weather_data_ddl.sql |
CREATE TABLE statements for climate_data, german_regions, and geo_points |
german_weather_data_dml.sql |
COPY FROM and INSERT statements to load reference data |
The data/ directory contains the reference datasets:
| File | Description |
|---|---|
geo_points.json |
726 weather station locations with nearest-town mappings |
german_regions.json |
16 German states with boundaries, fulltext columns, and embeddings |
export-demo_climate_data_large_v2.json |
Climate measurement readings |
The same three datasets are published as newline-delimited JSON in a public S3 bucket, so the quickest way to populate the demo tables is to let CrateDB pull them in directly. Run the DDL first so the tables exist, then:
COPY demo.geo_points
FROM 'https://guided-path.s3.us-east-1.amazonaws.com/geo_points.json'
WITH (format = 'json') RETURN SUMMARY;
COPY demo.german_regions
FROM 'https://guided-path.s3.us-east-1.amazonaws.com/german_regions.json'
WITH (format = 'json') RETURN SUMMARY;
COPY demo.climate_data
FROM 'https://guided-path.s3.us-east-1.amazonaws.com/export-demo_climate_data_large_v2.json'
WITH (format = 'json') RETURN SUMMARY;Notes:
- It runs on the cluster, not your client. CrateDB fetches each URL server-side, so the cluster nodes need outbound network access to S3. The bucket is public, so no credentials are required.
- Keys in each JSON object map to table columns. These files line up with
the DDL directly:
geo_location([lon, lat]) →GEO_POINT,geo_coords(GeoJSON) →GEO_SHAPE,embedding(1536-element array) →FLOAT_VECTOR, and the ISO-8601measurement_timestring →TIMESTAMP. RETURN SUMMARYreports per-node success/error counts so you can confirm all rows landed (726 geo points, 16 regions, and the full climate stream).- Reloads are idempotent, not additive. All three tables have primary keys
(
geo_pointson(latitude, longitude),german_regionsonregion_name, andclimate_dataon(measurement_time, latitude, longitude)via generated columns —geo_pointitself can't be a key).COPY FROMdoes not upsert, so re-running it on an already-loaded table reports every existing row as a duplicate-key conflict inRETURN SUMMARY(theerror_count) and keeps the current row — it won't silently double the data. To refresh a table from scratch,DELETE/DROPit first; to merge updates, useINSERT … ON CONFLICT DO UPDATEinstead ofCOPY FROM. - Run
REFRESH TABLE demo.geo_points, demo.german_regions, demo.climate_data;afterwards if you want to query the rows immediately.
This is the database-side counterpart to src_stream_load/,
which moves the very same files through Kafka instead: a producer
(stream_load_into_kafka.py) streams them from S3 into Kafka as JSON, Avro, or
Protobuf, and a consumer (stream_from_kafka_into_crate.py) reads them back out
of Kafka and loads them into CrateDB.
A minimal Python MCP server that exposes a single query_sql tool over the weather dataset, so an MCP client like Claude can answer questions about the data in plain English. It is built on the official MCP Python SDK (FastMCP) and talks to CrateDB's HTTP _sql endpoint. The one non-trivial rule — using WITHIN to keep "in Germany" queries inside the country's borders — is baked into the server's instructions.
See the MCP Search overview for install, configuration, and how to register it with an assistant. A draft cratedb.com walkthrough lives in GERMAN_WEATHER_MCP.md.
The grafana/ directory contains a pre-built dashboard for visualizing the weather data:
| File | Description |
|---|---|
german_weather_data.json |
Importable Grafana dashboard with geomap, gauge, and time-series panels. Connects to CrateDB via the PostgreSQL datasource plugin. |
To use it, add a PostgreSQL datasource in Grafana pointing at your CrateDB cluster, then import the JSON file via Dashboards > Import.
- Network access to your CrateDB cluster on port 5432
- The tables above populated in a
demoschema (run the DDL then DML scripts)
See each implementation's README for language-specific setup and usage instructions.
Apache License 2.0. See the LICENSE file.




