88 changes: 88 additions & 0 deletions .claude/skills/quickstart-validate/SKILL.md
@@ -0,0 +1,88 @@
---
name: quickstart-validate
description: Validate pg_stat_ch behavior end-to-end against the local quickstart Docker stack via the OTel/Arrow export path (with debug_arrow_dump_dir capturing Arrow IPC batches). Use when a change affects emitted event content (parent_query_id, cmd_type, err_sqlstate, buffer usage, etc.) and you want to confirm it round-trips through the production export pathway before pushing — i.e., the kind of verification the TAP test gives you but without needing a TAP-enabled local Postgres build.
---

# Quickstart-based PR validation (OTel/Arrow path)

Drive pg_stat_ch through its production export path (OTel + Arrow IPC) against the local quickstart Postgres container, capture Arrow batches via `debug_arrow_dump_dir`, and assert on the contents with pyarrow.

## Steps

1. **Install `uv`** (one-time, for inline pyarrow):
```bash
brew install uv # macOS
pip install uv # otherwise
```

2. **Bring up the OTel/Arrow quickstart**:
```bash
./scripts/quickstart.sh up # builds image once, ~10-15 min cold
docker compose -f docker/quickstart/docker-compose.otel.yml up -d --wait # arrow-dump variant
```
The `docker-compose.otel.yml` postgres container is configured with
`use_otel=on`, `otel_arrow_passthrough=on`, and
`debug_arrow_dump_dir=/var/lib/postgresql/arrow-dump` (host-mounted to
`docker/quickstart/arrow-dump/`). The `otel_endpoint` points at a
non-existent collector — the gRPC send fails, but
`MaybeDumpArrowBatch()` writes the IPC file *before* the send, so we
still get fully-formed Arrow batches on the host.

3. **Pick the validation script for the topic under test.** Convention:
- `scripts/quickstart-validate-parent-query-id.sh` — exercises parent_query_id linkage and the off-by-one in CaptureLogEvent.
- …add more as new PRs need them.

If none matches, write one (see "Adding a new script" below).

4. **Run it**:
```bash
./scripts/quickstart-validate-parent-query-id.sh
```
The script creates its fixtures, runs the queries, flushes, parses the Arrow IPC dumps with pyarrow, prints PASS/FAIL per assertion, and exits non-zero on any failure.

5. **On failure, inspect the dumps directly**:
```bash
ls docker/quickstart/arrow-dump/*.ipc
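uv run --with pyarrow --with pandas python -c "
import sys
import pyarrow.ipc

# Read every record batch from the IPC stream into one table.
# pandas is pulled in alongside pyarrow because to_pandas() needs it.
with open(sys.argv[1], 'rb') as f:
    t = pyarrow.ipc.open_stream(f).read_all()
print(t.to_pandas().to_string())
" docker/quickstart/arrow-dump/<file>.ipc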
```

6. **Tear down when done** (optional):
```bash
docker compose -f docker/quickstart/docker-compose.otel.yml down
```

## Why the OTel/Arrow path and not the native ClickHouse path

`docker/quickstart/docker-compose.yml` (the ClickHouse-native variant)
currently can't be used end-to-end: clickhouse-cpp's LZ4-compressed
block format triggers a checksum mismatch on a ClickHouse 26.1 server
(a separate, pre-existing bug, not pg_stat_ch's fault). The OTel/Arrow
path uses ZSTD via Arrow IPC and is unaffected. It also mirrors the
export pathway actually used in production.

## When to use this skill vs alternatives

- **Quickstart-validate** (this skill): cheap, fast iteration on event-content semantics via the production export path. Best for "did my change produce the right rows downstream?"
- **`./scripts/run-tests.sh 18 regress`**: PG regression suite. Best for SQL-level functionality (does `pg_stat_ch_stats()` return the right shape, does the extension load, etc.). Doesn't exercise the export side.
- **`./scripts/run-tests.sh ../postgres/install_tap tap`**: TAP harness, which CI uses. Most thorough but requires a TAP-enabled local Postgres and a fresh local build.

## Adding a new script

Use `scripts/quickstart-validate-parent-query-id.sh` as a template. Each script should:

1. `ensure_stack_up`: idempotently bring up the OTel/Arrow quickstart compose if it isn't running.
2. **Set up fixtures** with `pg_exec <<SQL`, dropping anything left over from a prior run so the script is re-runnable.
3. **Clear the dump directory** (`rm -f "$DUMP_DIR"/*.ipc`) and reset state via `SELECT pg_stat_ch_reset()`.
4. **Drive the queries** that exercise the behavior. Use distinctive table/function names as markers — they survive query normalization where literals don't.
5. **`SELECT pg_stat_ch_flush()`** to force an export instead of waiting on the flush timer.
6. **Wait briefly** for `.ipc` files to land in `$DUMP_DIR`.
7. **Parse** with `uv run --with pyarrow` and a Python heredoc that reads each IPC stream and emits one `KEY=VALUE` summary line per assertion result.
8. **Assert** via the `expect` helper.
9. **Drop fixtures** before exiting so leftover state doesn't poison the next run.
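
Step 6 above ("wait briefly") tends to be more reliable as a bounded poll than as a fixed sleep. The following is a minimal sketch in Python, matching the language of the parsing heredoc. The helper name `wait_for_dumps`, its default timeouts, and the rule of ignoring zero-length files are assumptions made for illustration; nothing here is taken from the template script.

```python
import time
from pathlib import Path


def wait_for_dumps(dump_dir, min_files=1, timeout_s=5.0, poll_s=0.1):
    """Poll dump_dir until at least min_files non-empty .ipc files exist.

    Returns the sorted list of matching paths, or raises TimeoutError.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Skip zero-length files: the file may exist before the batch
        # has been written into it.
        files = sorted(
            p for p in Path(dump_dir).glob("*.ipc") if p.stat().st_size > 0
        )
        if len(files) >= min_files:
            return files
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"expected {min_files} .ipc file(s) in {dump_dir}, "
                f"found {len(files)}"
            )
        time.sleep(poll_s)
```

Raising on timeout, rather than returning an empty list, lets a validation script fail loudly when no batch was exported at all instead of reporting zero matching rows.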

Keep each script focused on one PR's worth of behavior; don't try to be a generic regression suite. That's what the TAP and regression harnesses are for.
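
Steps 4, 7 and 8 of the list above fit together roughly as follows: filter events by a marker name, emit `KEY=VALUE` lines, then assert on them. This is a sketch under stated assumptions, not the template script's actual code. The rows are plain dicts of the kind pyarrow's `Table.to_pylist()` returns; the column names `query` and `query_id`, the marker `psch_marker`, the sample data, and the Python `expect` function (standing in for the shell helper) are all hypothetical.

```python
def summarize(rows, marker):
    """Build KEY=VALUE summary lines for events whose query text has marker.

    Assumes each row is a dict with "query", "query_id" and
    "parent_query_id" keys (column names are assumptions).
    """
    hits = [r for r in rows if marker in (r.get("query") or "")]
    ids = {r["query_id"] for r in hits}
    # A row counts as linked when its parent_query_id names another
    # captured event carrying the same marker.
    linked = sum(1 for r in hits if r.get("parent_query_id") in ids)
    return [f"MATCHED={len(hits)}", f"LINKED={linked}"]


def parse_summary(lines):
    """Turn KEY=VALUE lines into a dict, skipping lines without '='."""
    return dict(line.strip().split("=", 1) for line in lines if "=" in line)


def expect(summary, key, want):
    """Print PASS/FAIL for one assertion and return True on success."""
    got = summary.get(key)
    ok = got == str(want)
    print(f"{'PASS' if ok else 'FAIL'} {key}: got {got!r}, want {str(want)!r}")
    return ok


# Illustrative sample data: literals appear as $1 placeholders, while the
# marker names in the function and table identifiers are still present.
# parent_query_id 0 stands in for "no parent" here (an assumption).
rows = [
    {"query_id": 1, "parent_query_id": 0, "query": "SELECT psch_marker_fn($1)"},
    {"query_id": 2, "parent_query_id": 1, "query": "SELECT count(*) FROM psch_marker_tbl"},
    {"query_id": 3, "parent_query_id": 0, "query": "SELECT $1"},
]
summary = parse_summary(summarize(rows, "psch_marker"))
all_ok = all([expect(summary, "MATCHED", 2), expect(summary, "LINKED", 1)])
```

Keeping the summary as flat `KEY=VALUE` text means the shell side only needs string comparison, and a failing run shows exactly which key diverged.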
10 changes: 10 additions & 0 deletions .github/actions/setup-vcpkg/action.yml
@@ -12,11 +12,21 @@ runs:
  steps:
    - name: Install vcpkg
      shell: bash
      env:
        # Pin to the same commit as the vcpkg submodule and the Dockerfile
        # (vcpkg tag 2026.04.27). Without this checkout the action lands on
        # master HEAD, which drifts port-version ABI hashes and silently
        # invalidates the binary archive cache between CI runs.
        VCPKG_COMMIT: 56bb2411609227288b70117ead2c47585ba07713
      run: |
        if [ ! -d "$HOME/vcpkg" ]; then
          # Full clone required — Arrow depends on Boost, which needs vcpkg git history
          git clone https://github.com/microsoft/vcpkg.git "$HOME/vcpkg"
          git -C "$HOME/vcpkg" checkout "$VCPKG_COMMIT"
          "$HOME/vcpkg/bootstrap-vcpkg.sh" -disableMetrics
        else
          git -C "$HOME/vcpkg" fetch --quiet origin "$VCPKG_COMMIT" || true
          git -C "$HOME/vcpkg" checkout "$VCPKG_COMMIT"
        fi
        echo "VCPKG_ROOT=$HOME/vcpkg" >> "$GITHUB_ENV"
        echo "$HOME/vcpkg" >> "$GITHUB_PATH"
20 changes: 13 additions & 7 deletions docker/postgres-ext.Dockerfile
@@ -22,7 +22,8 @@ RUN apt-get update && apt-get install -y curl ca-certificates gnupg \

# Install vcpkg
# Pin to the same commit as the vcpkg submodule for reproducible builds
-ARG VCPKG_COMMIT=12159785447291b4069c82a3fe9c2770a393ac7f
+# (vcpkg tag 2026.04.27).
+ARG VCPKG_COMMIT=56bb2411609227288b70117ead2c47585ba07713
RUN git clone https://github.com/microsoft/vcpkg.git /opt/vcpkg \
&& git -C /opt/vcpkg checkout "$VCPKG_COMMIT" \
&& /opt/vcpkg/bootstrap-vcpkg.sh -disableMetrics
@@ -34,18 +35,23 @@ WORKDIR /build/pg_stat_ch
# Copy dependency manifests first for layer caching
COPY vcpkg.json vcpkg-configuration.json ./
COPY triplets/ triplets/
COPY overlay-ports/ overlay-ports/
COPY CMakeLists.txt ./
COPY cmake/ cmake/
COPY include/ include/
COPY src/ src/
COPY sql/ sql/
COPY pg_stat_ch.control ./

-RUN cmake -B build -G Ninja \
-    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
-    -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake \
-    -DVCPKG_TARGET_TRIPLET=x64-linux-pic \
-    -DVCPKG_OVERLAY_TRIPLETS=/build/pg_stat_ch/triplets \
+# Pick the vcpkg triplet that matches the host architecture so the same
+# Dockerfile builds on x64 CI runners and on arm64 dev boxes (Apple Silicon
+# under colima/orbstack/etc.).
+RUN TRIPLET="$([ "$(uname -m)" = "aarch64" ] && echo arm64-linux-pic || echo x64-linux-pic)" \
+    && cmake -B build -G Ninja \
+        -DCMAKE_BUILD_TYPE=RelWithDebInfo \
+        -DCMAKE_TOOLCHAIN_FILE=$VCPKG_ROOT/scripts/buildsystems/vcpkg.cmake \
+        -DVCPKG_TARGET_TRIPLET="$TRIPLET" \
+        -DVCPKG_OVERLAY_TRIPLETS=/build/pg_stat_ch/triplets \
&& cmake --build build --parallel $(nproc)

FROM postgres:18-bookworm
Expand All @@ -56,7 +62,7 @@ RUN apt-get update && apt-get install -y \
&& rm -rf /var/lib/apt/lists/*

COPY --from=builder /build/pg_stat_ch/build/pg_stat_ch.so /usr/lib/postgresql/18/lib/
-COPY --from=builder /build/pg_stat_ch/sql/pg_stat_ch--0.1.sql /usr/share/postgresql/18/extension/
+COPY --from=builder /build/pg_stat_ch/sql/ /usr/share/postgresql/18/extension/
COPY --from=builder /build/pg_stat_ch/pg_stat_ch.control /usr/share/postgresql/18/extension/

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=3 \
2 changes: 2 additions & 0 deletions docker/quickstart/arrow-dump/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
44 changes: 44 additions & 0 deletions docker/quickstart/docker-compose.otel.yml
@@ -0,0 +1,44 @@
services:
  # Postgres + pg_stat_ch configured to use the OTel/Arrow export path with
  # debug_arrow_dump_dir set, so each Arrow IPC batch is written to a
  # host-mounted directory. No real OTel collector is needed for validation:
  # the gRPC send fails (otel_endpoint points at a non-existent collector),
  # but MaybeDumpArrowBatch() runs *before* the send, so IPC files still land.
  # See t/026_arrow_dump.pl for the same trick at the TAP layer.
  postgres:
    image: quickstart-postgres:latest
    pull_policy: never
    container_name: psch-quickstart-pg-otel
    ports:
      - "55432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    command:
      - "postgres"
      - "-c"
      - "shared_preload_libraries=pg_stat_ch"
      - "-c"
      - "pg_stat_ch.enabled=on"
      - "-c"
      - "pg_stat_ch.use_otel=on"
      - "-c"
      - "pg_stat_ch.otel_arrow_passthrough=on"
      - "-c"
      - "pg_stat_ch.otel_endpoint=localhost:14317"
      - "-c"
      - "pg_stat_ch.debug_arrow_dump_dir=/var/lib/postgresql/arrow-dump"
      - "-c"
      - "pg_stat_ch.flush_interval_ms=250"
      - "-c"
      - "pg_stat_ch.batch_max=1000"
      - "-c"
      - "track_io_timing=on"
    volumes:
      - ./arrow-dump:/var/lib/postgresql/arrow-dump
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 2s
      timeout: 5s
      retries: 10