Skip to content

feat: emit parent_query_id to link nested SPI queries#95

Open
JoshDreamland wants to merge 10 commits into
mainfrom
parent-query-id-surgical
Open

feat: emit parent_query_id to link nested SPI queries#95
JoshDreamland wants to merge 10 commits into
mainfrom
parent-query-id-surgical

Conversation

@JoshDreamland
Copy link
Copy Markdown
Contributor

@JoshDreamland JoshDreamland commented May 13, 2026

Assigns a parent_query_id to events to avoid polluting metrics for nested queries (when plpgsql functions issue SPI statements that emit events within events).

Changes

  • PschEvent::top_levelPschEvent::parent_query_id (same slot in event/ring_entry; static asserts still hold).
  • Three single statics (rusage_start, query_start_ts, current_query_is_top_level) consolidated into one fixed-size query_stack[16] of {queryid, rusage_start, query_start_ts} indexed by nesting_level. The existing PG_TRY/nesting_level pattern in ExecutorRun/Finish/ProcessUtility is unchanged.
  • 2-line RecordUInt64+Append for the new column in stats_exporter.cc.
  • parent_query_id UInt64 column in docker/init/00-schema.sql; migrations/001_add_parent_query_id.sql for existing deployments.

Bug fix

The prior single static rusage_start was overwritten by nested getrusage() calls, so the outer query's CPU delta only measured the post-SPI portion. Per-frame baselines fix that alongside parent linkage.

Test plan

  • Local build (cmake + ninja)
  • Regression: mise run test:regress
  • Isolation: mise run test:isolation
  • CI green across PG 16/17/18 × amd64/arm64 + Clang job

Each event now carries the queryid of its calling query (0 for top-level).
Lets aggregations filter to top-level work with WHERE parent_query_id = 0,
avoiding the double-counting that happens when plpgsql functions issue
SPI statements that themselves emit events.

The Event payload's prior `bool top_level` is replaced in-place by
`uint64 parent_query_id`; static asserts in ring_entry.h continue to verify
layout equivalence with the on-wire ring slot.

Single statics for rusage_start, query_start_ts, and current_query_is_top_level
are consolidated into a fixed-size query_stack[16] indexed by nesting_level.
The previous single rusage_start was clobbered on nested SPI getrusage()
calls, so nested CPU deltas were measuring just the inner portion and the
outer's CPU was wrong; per-frame baselines fix that alongside parent linkage.
The PG_TRY/nesting_level pattern in ExecutorRun/Finish/ProcessUtility is
untouched.

Schema gains a parent_query_id UInt64 column; migrations/001_add_parent_query_id.sql
covers existing deployments. This supersedes #61 (which mixed the feature
with broader refactor churn — vector→array→array+counter iterations,
helper extraction, depth-cap retunes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)
Copilot AI review requested due to automatic review settings May 13, 2026 15:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds parent_query_id to emitted events so nested SPI queries can be linked to their caller and excluded from “top-level only” aggregations, while also fixing nested CPU timing by switching from single baselines to a per-nesting frame stack.

Changes:

  • Replaced top_level with parent_query_id in the in-memory event/ring-entry payload.
  • Introduced a fixed-size query_stack to track per-nesting queryid, CPU baseline, and start timestamp.
  • Exported and persisted parent_query_id via ClickHouse schema changes plus a migration.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/queue/ring_entry.h Replaces top_level with parent_query_id in the ring buffer entry while preserving the block-copy prefix layout checks.
src/queue/event.h Replaces top_level with parent_query_id in the public event struct.
src/hooks/hooks.c Implements per-nesting query frames and populates parent_query_id during executor/utility/log event emission.
src/export/stats_exporter.cc Adds a parent_query_id column to exported event rows.
docker/init/00-schema.sql Adds parent_query_id column to the canonical ClickHouse schema.
migrations/001_add_parent_query_id.sql Adds a migration for adding parent_query_id to existing ClickHouse tables.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docker/init/00-schema.sql
Comment thread migrations/001_add_parent_query_id.sql Outdated
Comment on lines +11 to +12
ALTER TABLE pg_stat_ch.events_raw
ADD COLUMN IF NOT EXISTS parent_query_id UInt64 DEFAULT 0;
Comment thread src/hooks/hooks.c Outdated
Comment thread src/export/stats_exporter.cc
Comment thread src/hooks/hooks.c
Comment on lines 888 to 892
static void CaptureLogEvent(ErrorData* edata) {
PschEvent event;
InitBaseEvent(&event, GetCurrentTimestamp(), (nesting_level == 0), PSCH_CMD_UNKNOWN);
InitBaseEvent(&event, GetCurrentTimestamp(), GetParentQueryId(), PSCH_CMD_UNKNOWN);

UnpackSqlState(edata->sqlerrcode, event.err_sqlstate);
JoshDreamland and others added 2 commits May 13, 2026 12:47
clang-format wants the column-aligned trailing comments above
parent_query_id to share its (wider) alignment column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot review on #95:

1. parent_query_id was UInt64 while query_id is Int64; comparisons/joins
   between the two would require explicit casts in ClickHouse.  Switch the
   ClickHouse column, migration, and exporter to Int64.  Keep the in-memory
   struct field as uint64 (matches PG's queryId type) and cast at append
   time the same way query_id already does.

2. At nesting_level >= PSCH_MAX_NESTING_DEPTH, the PschExecutorEnd frame
   pointer is NULL, leaving start_ts at 0 (the PG epoch).  Any path that
   computed GetCurrentTimestamp() - start_ts would yield ~25 years of us.
   Fall back to GetCurrentTimestamp() so deltas are ~0 instead.  In practice
   query_desc->totaltime supplies a real duration via instrumentation, so
   the fallback subtraction is only used when totaltime wasn't allocated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 13, 2026 16:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

JoshDreamland and others added 5 commits May 13, 2026 15:58
028_parent_query_id.pl exercises the three semantic invariants by
inspecting actual exported rows in ClickHouse:

  1. Top-level queries report parent_query_id = 0.
  2. Nested SPI queries report parent_query_id = outer caller's query_id
     (verified by self-join on the events_raw table).
  3. A log/error event captured during nested execution reports
     parent_query_id = outer caller (not the running query itself) and a
     non-zero query_id (the running statement).  This catches the
     CaptureLogEvent off-by-one: nesting_level is bumped inside
     ExecutorRun, so slot nesting_level - 1 holds the running query and
     its caller lives at nesting_level - 2.

Filters key off distinctive table/function names rather than constants
or comments — those survive query normalization, where literals would
be replaced with $N placeholders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CaptureLogEvent off-by-one came from nesting_level moving in places
where the frame stack didn't — Run/Finish bumped/decremented it around
the body, while the frame slot was written in Start and only read in
End.  Inside the body (where CaptureLogEvent fires) nesting_level was
one higher than it was at End, so the same "slot[nesting_level - 1]"
expression meant different things in each place.

Bind the two together: nesting_level moves exactly where a frame is
pushed or popped, and parent_query_id is captured into the frame at
push time.  Now slot[nesting_level - 1] is the currently-active query
at every emit site — End, ProcessUtility, CaptureLogEvent — and parent
is read directly from frame->parent_query_id without offset arithmetic.

Mechanics:
  - PushQueryFrame in PschExecutorStart bumps nesting_level after
    writing the slot.  PopQueryFrame in PschExecutorEnd decrements.
  - PschExecutorRun and PschExecutorFinish no longer touch nesting_level
    on the success path; each just wraps its chain in PG_TRY/PG_CATCH
    so a longjmp out of the body pops the frame on the unwind and
    keeps depth balanced.  Each PG_CATCH on the unwind path decrements
    once per level — three deep nesting with an error at the bottom
    cascades through three PG_CATCH blocks for three pops, exactly
    matching three pushes.  No subxact callback needed.
  - PschProcessUtility brackets its body with PG_TRY/PG_FINALLY in the
    same function, so push at the top and PopQueryFrame in PG_FINALLY
    balance regardless of how the body exits.  ExecuteUtilityWithNesting
    is gone.
  - CaptureLogEvent now reads TopQueryFrame() and uses both its queryid
    (the running query, attached as event.queryid for attribution) and
    its parent_query_id (the running query's caller, attached as
    event.parent_query_id with strict semantics — previously this slot
    was misread as the running query's own id).

PeekQueryStack and its offset parameter are gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The totaltime allocation runs after standard_ExecutorStart returns,
between our PG_END_TRY and the close of PschExecutorStart.  InstrAlloc
goes through MemoryContextAllocZero, which can ereport(ERROR) on OOM —
that longjmp would unwind past our PG_END_TRY without firing the
PG_CATCH, leaving the pushed frame unpopped.

Move the allocation inside the existing PG_TRY block so the same
PG_CATCH cleans up the frame.  No behavior change on the happy path;
new robustness against any future code added between the push and the
function's return.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The push had been doing more conditional work than necessary to figure
out whether a parent existed.  Convert nesting_level so that:

  - -1 is the resting state (no active query),
  - n in [0, PSCH_MAX_NESTING_DEPTH-1] is the slot index of the
    currently-active frame on top of the stack,
  - >= PSCH_MAX_NESTING_DEPTH is the overflow region.

Push now increments first and bails on overflow, so the cap check
collapses to a single comparison.  The parent_query_id lookup still
needs one conditional ("do we have a parent at all"), but the previous
double-condition (positive AND within cap) is gone.  TopQueryFrame /
PopQueryFrame benefit too: < 0 doubles as a nullity check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cleanups around the runaway-nesting case:

  - PushQueryFrame now emits a WARNING when it returns NULL because the
    cap was exceeded.  Previously the overflow was silent — telemetry
    just went missing for the frame.  WARNING is loud enough to notice
    if it happens in practice but doesn't fail the query.

  - PschProcessUtility's start_ts fallback was still 0 (PG epoch) when
    frame was NULL.  Match PschExecutorEnd by falling back to
    GetCurrentTimestamp() so the emitted event carries an approximately
    correct ts_start rather than 1970.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JoshDreamland JoshDreamland force-pushed the parent-query-id-surgical branch from 3232e8e to c632cdc Compare May 14, 2026 14:42
Copilot AI review requested due to automatic review settings May 14, 2026 14:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Earlier commits on this branch added parent_query_id to PschEvent and to
the ClickHouse-native exporter, plus the ClickHouse schema migration, but
arrow_batch.cc keeps its own hard-coded column list and was missed.  The
result: the production export path (OTel + Arrow IPC) silently dropped
parent_query_id on every event, so downstream consumers saw 0 across the
board.

Caught via the OTel/Arrow quickstart-validate harness (see PR #96).
Local script flips from 4/8 passing to all-green once this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JoshDreamland JoshDreamland requested a review from serprex May 14, 2026 21:06
Copilot AI review requested due to automatic review settings May 15, 2026 15:06
@JoshDreamland JoshDreamland force-pushed the parent-query-id-surgical branch from 31eadfe to 18a6eae Compare May 15, 2026 15:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@JoshDreamland JoshDreamland force-pushed the parent-query-id-surgical branch from 18a6eae to 85266f0 Compare May 15, 2026 15:22
t/028 already exercises parent_query_id linkage through the
ClickHouse-native exporter.  Add t/029 to exercise the same
invariants through the Arrow/OTel export path (the production
pathway), using pg_stat_ch.debug_arrow_dump_dir to capture each
Arrow IPC batch to disk before the gRPC send (which we point at a
non-existent collector so it fails harmlessly).  Same trick as
t/026_arrow_dump.

t/029 guards specifically against the earlier surgical-PR regression
where arrow_batch.cc was missing parent_query_id entirely (the
ClickHouse-native exporter had it, so t/028 alone would have let
the gap through).

Test 3 in both files uses RAISE WARNING inside a nested SPI call to
exercise CaptureLogEvent's queryid/parent_query_id assignment.  We
cannot use a caught ERROR here: errfinish PG_RE_THROWs at elog.c:539
for ERROR-level events without calling EmitErrorReport, so
emit_log_hook only fires later from PostgresMain's top-level catch
(after all frames have been popped via PG_CATCH unwinding) — or
never at all, for errors caught by a plpgsql EXCEPTION block.
WARNING-level events go through EmitErrorReport directly, so our
hook fires while the inner SPI's frame is still on the stack, which
is the only scenario where CaptureLogEvent's slot choice is
observable.

The assertions distinguish "inner SPI queryid" from "outer caller
queryid" so a regression that attributes the log event to the wrong
slot (the off-by-one we're guarding against) would be caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 15, 2026 18:12
@JoshDreamland JoshDreamland force-pushed the parent-query-id-surgical branch from 85266f0 to f52d76c Compare May 15, 2026 18:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@JoshDreamland
Copy link
Copy Markdown
Contributor Author

I missed stuff, because Arrow is a different code path with different implications. I think that's addressed but I'm feeling less sure of myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants