schema: unify pg_stat_ch events_raw with prod Arrow path, move into Goose layout #99
Draft
JoshDreamland wants to merge 3 commits into
In-place patch of docker/init/00-schema.sql, the CH-native exporter, and
the TAP tests so the docker quickstart schema aligns with what prod
actually writes to (datagres_otel.query_logs_arrow in clickgres-platform).
This is the pre-cutover unification: pg_stat_ch's CH-native path was
previously isolated from prod, and the two schemas had drifted apart on
both column naming and types.
Column renames (prod-side naming wins; closer to OTel semantic
conventions and minimizes downstream churn):
ts_start -> ts
db -> db_name
username -> db_user
cmd_type -> db_operation
query -> query_text
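
For anyone porting ad-hoc tooling that still uses the old names, the renames above can be sketched as a simple key mapping. This is a minimal illustration, not code from this PR; `rename_columns` and the `rows` key in the sample row are hypothetical.

```python
# Old -> new column names, exactly as listed in the renames above.
RENAMES = {
    "ts_start": "ts",
    "db": "db_name",
    "username": "db_user",
    "cmd_type": "db_operation",
    "query": "query_text",
}


def rename_columns(row: dict) -> dict:
    """Return a copy of `row` with legacy keys mapped to the unified names.

    Keys that are not in RENAMES pass through unchanged.
    """
    return {RENAMES.get(key, key): value for key, value in row.items()}


# Sample row; "rows" is a made-up column used only to show pass-through.
legacy_row = {
    "ts_start": "2026-05-19 00:00:00",
    "db": "app",
    "username": "alice",
    "cmd_type": "SELECT",
    "query": "SELECT 1",
    "rows": 1,
}
unified_row = rename_columns(legacy_row)
```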
Type fix:
err_sqlstate FixedString(5) -> LowCardinality(String)
FixedString does not round-trip through Arrow IPC cleanly, and ~270
SQLSTATE codes are dictionary-friendly. The CH-native exporter is
updated to write the column via TagString (clickhouse-cpp's
ColumnString -> CH LowCardinality(String) is fine on the wire).
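
A rough illustration of why a small set of SQLSTATE codes is "dictionary-friendly": each distinct value is stored once and rows hold small integer indices. This is a conceptual sketch only, not ClickHouse's or Arrow's actual on-disk or wire format; the sample values and the use of '' for "no error" are assumptions.

```python
def dictionary_encode(values):
    """Encode values as (dictionary, indices), keeping first-seen order."""
    dictionary = []   # distinct values, each stored once
    positions = {}    # value -> index into `dictionary`
    indices = []      # one small integer per input row
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Illustrative err_sqlstate values; '' stands for "no error" (an assumption).
codes = ["", "", "23505", "", "40001", "23505", ""]
dictionary, indices = dictionary_encode(codes)
```

With only a few hundred possible codes the dictionary stays tiny no matter how many rows are inserted.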
Envelope columns added (with DEFAULT '' so the CH-native exporter, which
does not yet emit these, continues to insert successfully):
instance_ubid, server_ubid, server_role, region, cell,
service_version, host_id, pod_name
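
The effect of DEFAULT '' on an insert that omits the envelope columns can be mimicked as below. This is a sketch of the behaviour described above, not of how ClickHouse applies defaults internally; `with_envelope_defaults` is a hypothetical helper.

```python
# Envelope columns as listed above.
ENVELOPE_COLUMNS = (
    "instance_ubid", "server_ubid", "server_role", "region", "cell",
    "service_version", "host_id", "pod_name",
)


def with_envelope_defaults(row: dict) -> dict:
    """Fill any envelope column missing from `row` with '' (the DEFAULT)."""
    filled = {name: "" for name in ENVELOPE_COLUMNS}
    filled.update(row)  # values supplied by the writer take precedence
    return filled


# A row shaped like what an exporter that omits the envelope might send.
native_row = {"ts": "2026-05-19 00:00:00", "query_text": "SELECT 1"}
stored_row = with_envelope_defaults(native_row)
```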
Engine/partitioning aligned with prod:
ORDER BY ts -> ORDER BY (instance_ubid, ts) (tenant locality)
TTL added: toDate(ts) + INTERVAL 180 DAY
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
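
The "tenant locality" point can be seen with a toy sort: ordering by (instance_ubid, ts) keeps each instance's rows contiguous, while ordering by ts alone interleaves them. The rows and the `tenant_runs` helper are illustrative assumptions, not a model of MergeTree internals.

```python
from itertools import groupby

# Toy rows; instance ids and integer timestamps are made up.
rows = [
    {"instance_ubid": "inst-b", "ts": 3},
    {"instance_ubid": "inst-a", "ts": 2},
    {"instance_ubid": "inst-b", "ts": 1},
    {"instance_ubid": "inst-a", "ts": 4},
]

by_ts = sorted(rows, key=lambda r: r["ts"])
by_tenant_ts = sorted(rows, key=lambda r: (r["instance_ubid"], r["ts"]))


def tenant_runs(ordered):
    """Count contiguous runs of the same instance_ubid in an ordering."""
    return sum(1 for _ in groupby(ordered, key=lambda r: r["instance_ubid"]))
```

Fewer runs means a per-instance query touches fewer disjoint ranges of the sorted data.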
Materialized views (events_recent_1h, query_stats_5m, db_app_user_1m,
errors_recent) updated to reference the new column names and to include
instance_ubid in their ORDER BY / GROUP BY / SELECT projections so they
remain consistent with the events_raw partitioning strategy.
Test fixtures updated to query the new column names:
t/010_clickhouse_export.pl, t/012_timing_accuracy.pl,
t/021_cmd_type_counts.pl, t/027_query_normalization.pl,
t/031_normalize_cache.pl
parent_query_id is intentionally NOT included here — it's the subject of
PR #95 (parent-query-id-surgical) and lands as its own follow-up
migration after this PR.
Validated end-to-end: docker/init/00-schema.sql applies cleanly on
clickhouse/clickhouse-server:26.1 (the version pinned in
docker/docker-compose.test.yml); INSERTs that omit the envelope columns
fill them via DEFAULT ''; all 4 MVs build. CI will run the TAP suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical move + goose annotations. The pg_stat_ch ClickHouse schema
that was previously the docker quickstart init script becomes the first
real Goose migration under schema/migrations/, matching
clickgres-platform's runner layout (pressly/goose v3,
DialectClickHouse, embed.FS).
Changes to the content of the moved file:
* Header banner rewritten from "CANONICAL SCHEMA REFERENCE / single
source of truth / dual role as docker init" to "initial migration"
framing.
* Added -- +goose Up / -- +goose Down section markers.
* Each CREATE DATABASE / CREATE TABLE / CREATE MATERIALIZED VIEW
wrapped in -- +goose StatementBegin / StatementEnd so goose's
parser handles the multi-statement bodies correctly.
* Removed the pre-CREATE "DROP TABLE IF EXISTS X" idioms — those
existed to make the docker init script idempotent on container
restart, but goose tracks state via goose_db_version. Drops now
live exclusively in the -- +goose Down section in reverse
dependency order.
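
To make the role of the annotations concrete, here is a deliberately simplified splitter for a goose-annotated file. It is not goose's real parser: it handles only the Up/Down and StatementBegin/StatementEnd directives named above, and the sample table `t` is hypothetical. To a plain SQL client the directives are just `--` comments, so both sections would look like one script.

```python
GOOSE_PREFIX = "-- +goose "


def split_goose_migration(text):
    """Split a goose-annotated SQL file into Up and Down statement lists.

    Simplified: outside a StatementBegin/StatementEnd pair, a line ending
    in ';' closes the current statement.
    """
    sections = {"up": [], "down": []}
    current = None    # "up" or "down" once a section marker is seen
    in_block = False  # True between StatementBegin and StatementEnd
    buffer = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(GOOSE_PREFIX):
            directive = stripped[len(GOOSE_PREFIX):].strip()
            if directive in ("Up", "Down"):
                current = directive.lower()
            elif directive == "StatementBegin":
                in_block = True
            elif directive == "StatementEnd":
                in_block = False
                if buffer and current:
                    sections[current].append("\n".join(buffer))
                buffer = []
            continue
        if current is None or not stripped:
            continue  # ignore text before the first section and blank lines
        buffer.append(line)
        if not in_block and stripped.endswith(";"):
            sections[current].append("\n".join(buffer))
            buffer = []
    return sections


sample = """\
-- +goose Up
-- +goose StatementBegin
CREATE TABLE t (x UInt8) ENGINE = MergeTree ORDER BY x;
-- +goose StatementEnd

-- +goose Down
DROP TABLE IF EXISTS t;
"""
parsed = split_goose_migration(sample)
```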
The schema content itself (column names, types, MV definitions,
ORDER BY / TTL / SETTINGS) is unchanged from the previous commit.
Git rename detection should follow docker/init/00-schema.sql ->
schema/migrations/20260519000001_create_initial_schema.sql.
Also adds schema/migrations/00000000000001_bootstrap.sql, a no-op
SELECT 1 migration required by goose to seed the goose_db_version
table (copied verbatim from clickgres-platform's bootstrap).
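
The two filenames are chosen so the bootstrap sorts first. A small sketch, assuming migrations are applied in order of the numeric prefix before the first underscore; `migration_version` is a hypothetical helper, not a goose API.

```python
def migration_version(filename: str) -> int:
    """Parse the numeric version prefix before the first underscore."""
    prefix, _, _ = filename.partition("_")
    return int(prefix)


files = [
    "20260519000001_create_initial_schema.sql",
    "00000000000001_bootstrap.sql",
]
ordered = sorted(files, key=migration_version)
```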
Validated end-to-end against clickhouse/clickhouse-server:26.1:
pressly goose v3.27.1 `up` and `reset` round-trip cleanly. All 51
columns and 4 MVs land with the expected types.
Note: this leaves docker/init/ empty. The docker-compose mounts will
need updating in a follow-on PR to point at schema/migrations/ (which
requires a small shim to invoke goose-up at container start, since
clickhouse-server's docker entrypoint cannot parse goose's
StatementBegin/End directives directly).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…client

The previous "Initialize ClickHouse schema" step ran `clickhouse-client --multiquery < docker/init/00-schema.sql`. That file moved in the previous commit; pointing the step at the new location without further changes would not work because clickhouse-client cannot parse goose -- +goose Up/Down/StatementBegin/End directives, and would execute the Down section's DROP statements right after the Up section's CREATEs.

Switch the step to install pressly/goose v3.27.1 (~5 sec on Ubuntu CI runners, which have Go preinstalled) and apply the migrations from schema/migrations/ via `goose ... up` against the running CH container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
Does this need a corresponding change in clickgres-platform? E.g. for query_id being Int64 vs String.
Can clickgres-platform submodule pg_stat_ch for the schema?
The description talks of cutover, but that doesn't seem too necessary since the CH exporter is not being used in prod.
Can we add a test that does both CH & Arrow, potentially one after the other, to test schema compatibility?
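
One possible shape for the compatibility check asked about here is a diff of the two column-to-type maps. This is a sketch only: `schema_diff` is hypothetical, and the column types below (including which side has query_id as Int64 and which as String) are placeholders rather than values read from either schema.

```python
def schema_diff(native: dict, arrow: dict) -> dict:
    """Compare two {column: type} maps.

    Returns columns present on only one side and columns whose types differ.
    """
    shared = set(native) & set(arrow)
    return {
        "only_in_native": sorted(set(native) - set(arrow)),
        "only_in_arrow": sorted(set(arrow) - set(native)),
        "type_mismatch": {
            col: (native[col], arrow[col])
            for col in sorted(shared)
            if native[col] != arrow[col]
        },
    }


# Placeholder schemas; in a real test these would be read from each table.
native = {"ts": "DateTime64(6)", "query_id": "Int64", "db_name": "String"}
arrow = {"ts": "DateTime64(6)", "query_id": "String", "db_name": "String"}
diff = schema_diff(native, arrow)
```

A test could fail whenever any of the three entries in `diff` is non-empty.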
Summary

Pre-GA unification of pg_stat_ch's ClickHouse schema. The docker quickstart schema (`docker/init/00-schema.sql`) and the production Arrow receiver schema (`datagres_otel.query_logs_arrow` in clickgres-platform) had drifted apart on both column naming and column types. This PR makes pg_stat_ch the source of truth for the unified shape and moves the canonical schema into a Goose migrations layout under `schema/migrations/`, matching the runner clickgres-platform already uses (pressly/goose v3, `DialectClickHouse`, `embed.FS`).

Two commits, structured so git rename detection can follow the file's evolution cleanly:

Commit 1 (`9c39ecd`): in-place unification + tests pass

In-place edits to `docker/init/00-schema.sql` so the docker quickstart schema matches what prod actually writes to, plus the CH-native exporter and TAP tests updated to use the new column names.

Column renames (prod-side wins: closer to OTel semantic conventions, minimizes downstream churn):
* `ts_start` → `ts`
* `db` → `db_name`
* `username` → `db_user`
* `cmd_type` → `db_operation`
* `query` → `query_text`

Type fix:
* `err_sqlstate FixedString(5)` → `LowCardinality(String)`. FixedString doesn't round-trip through Arrow IPC cleanly, and ~270 SQLSTATE codes are dictionary-friendly. CH-native exporter switches from `MetricFixedString(5, …)` to `TagString(…)` (clickhouse-cpp `ColumnString` → CH `LowCardinality(String)` is fine on the wire).

Envelope columns added with `DEFAULT ''` so the CH-native exporter (which doesn't yet emit these) still inserts successfully: `instance_ubid`, `server_ubid`, `server_role`, `region`, `cell`, `service_version`, `host_id`, `pod_name`.

Engine/partitioning aligned with prod:
* `ORDER BY ts` → `ORDER BY (instance_ubid, ts)` (tenant locality)
* `TTL toDate(ts) + INTERVAL 180 DAY`
* `SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1`

All four MVs (`events_recent_1h`, `query_stats_5m`, `db_app_user_1m`, `errors_recent`) updated to reference the new column names and to include `instance_ubid` in their `ORDER BY` / `GROUP BY` / `SELECT` projections.

`parent_query_id` is intentionally not included here; it belongs to PR #95 (`parent-query-id-surgical`) and will land as its own follow-up migration in `schema/migrations/` after this PR.

Commit 2 (`299115b`): rename + goose annotations

`git mv docker/init/00-schema.sql schema/migrations/20260519000001_create_initial_schema.sql` (96% similarity per git's rename detection), with:
* `-- +goose Up` / `-- +goose Down` section markers added.
* Each `CREATE` wrapped in `-- +goose StatementBegin` / `StatementEnd`.
* Pre-`CREATE` `DROP TABLE IF EXISTS X` idioms removed. Those existed for docker init idempotency on restart; goose tracks state via `goose_db_version`. Drops live exclusively in the `-- +goose Down` section in reverse dependency order.

Also adds `schema/migrations/00000000000001_bootstrap.sql`, a no-op `SELECT 1` migration required by goose to seed its version table (copied verbatim from clickgres-platform's bootstrap).

What's been validated locally
* `docker/init/00-schema.sql` (commit 1) applies cleanly on `clickhouse/clickhouse-server:26.1` (the version pinned in `docker/docker-compose.test.yml`); all 51 columns and 4 MVs land with the expected types.
* `DEFAULT ''` on the envelope columns lets INSERTs that omit them succeed.
* `schema/migrations/` (commit 2) round-trips clean via `goose v3.27.1 up` and `goose reset` against CH 26.1.

Out of scope (follow-on PRs)
* Updating `docker/init/` and `docker-compose.test.yml` to run goose-up at container start. Currently `docker/init/00-schema.sql` is gone, so the docker quickstart and the test compose need a small shim to apply migrations from `schema/migrations/` (clickhouse-server's docker entrypoint can't parse `-- +goose Up/Down` directly).
* Prod cutover from `query_logs_arrow` to the unified `events_raw`, including historical backfill via `INSERT INTO events_raw SELECT … FROM query_logs_arrow` with explicit casts for the renamed/retyped columns.
* PR #95 adds `schema/migrations/<ts>_add_parent_query_id.sql` for the column it introduces.

Test plan
🤖 Generated with Claude Code