
Add selective index updates (n/m) to the heap table AM #23

Draft

gburd wants to merge 27 commits into master from tepid

Conversation

gburd (Owner) commented Apr 6, 2026

UPDATEs now trigger n of m index updates rather than all, none, or only the summarizing ones. Table AMs can influence this behavior by changing modified_idx_attrs.

github-actions bot force-pushed the master branch 30 times, most recently from 3e6d7f8 to 84ff7f1 (April 8, 2026 04:49)
github-actions bot force-pushed the master branch 3 times, most recently from 8f7ca27 to 9c2f5e9 (April 13, 2026 19:29)
gburd added 27 commits May 6, 2026 15:34
  - Hourly upstream sync from postgres/postgres (24x daily)
  - AI-powered PR reviews using AWS Bedrock Claude Sonnet 4.5
  - Multi-platform CI via existing Cirrus CI configuration
  - Cost tracking and comprehensive documentation

  Features:
  - Automatic issue creation on sync conflicts
  - PostgreSQL-specific code review prompts (C, SQL, docs, build)
  - Cost limits: $15/PR, $200/month
  - Inline PR comments with security/performance labels
  - Skip draft PRs to save costs

  Documentation:
  - .github/SETUP_SUMMARY.md - Quick setup overview
  - .github/QUICKSTART.md - 15-minute setup guide
  - .github/PRE_COMMIT_CHECKLIST.md - Verification checklist
  - .github/docs/ - Detailed guides for sync, AI review, Bedrock

  See .github/README.md for complete overview

Complete Phase 3: Windows builds + fix sync for CI/CD commits

Phase 3: Windows Dependency Build System
- Implement full build workflow (OpenSSL, zlib, libxml2)
- Smart caching by version hash (80% cost reduction)
- Dependency bundling with manifest generation
- Weekly auto-refresh + manual triggers
- PowerShell download helper script
- Comprehensive usage documentation

Sync Workflow Fix:
- Allow .github/ commits (CI/CD config) on master
- Detect and reject code commits outside .github/
- Merge upstream while preserving .github/ changes
- Create issues only for actual pristine violations

Documentation:
- Complete Windows build usage guide
- Update all status docs to 100% complete
- Phase 3 completion summary

All three CI/CD phases complete (100%):
✅ Hourly upstream sync with .github/ preservation
✅ AI-powered PR reviews via Bedrock Claude 4.5
✅ Windows dependency builds with smart caching

Cost: $40-60/month total
See .github/PHASE3_COMPLETE.md for details

Fix sync to allow 'dev setup' commits on master

The sync workflow was failing because the 'dev setup v19' commit
modifies files outside .github/. Updated workflows to recognize
commits with messages starting with 'dev setup' as allowed on master.

Changes:
- Detect 'dev setup' commits by message pattern (case-insensitive)
- Allow merge if commits are .github/ OR dev setup OR both
- Update merge messages to reflect preserved changes
- Document pristine master policy with examples

This allows personal development environment commits (IDE configs,
debugging tools, shell aliases, Nix configs, etc.) on master without
violating the pristine mirror policy.

Future dev environment updates should start with 'dev setup' in the
commit message to be automatically recognized and preserved.

See .github/docs/pristine-master-policy.md for complete policy
See .github/DEV_SETUP_FIX.md for fix summary

Optimize CI/CD costs by skipping builds for pristine commits

Add cost optimization to Windows dependency builds to avoid expensive
builds when only pristine commits are pushed (dev setup commits or
.github/ configuration changes).

Changes:
- Add check-changes job to detect pristine-only pushes
- Skip Windows builds when all commits are dev setup or .github/ only
- Add comprehensive cost optimization documentation
- Update README with cost savings (~40% reduction)

Expected savings: ~$3-5/month on Windows builds, ~$40-47/month total
through combined optimizations.

Manual dispatch and scheduled builds always run regardless.

This commit introduces test infrastructure for verifying Heap-Only Tuple
(HOT) update functionality in PostgreSQL. It provides a baseline for
demonstrating and validating HOT update behavior.

Regression tests:
- Basic HOT vs non-HOT update decisions
- All-or-none property for multiple indexes
- Partial indexes and predicate handling
- BRIN (summarizing) indexes allowing HOT updates
- TOAST column handling with HOT
- Unique constraints behavior
- Multi-column indexes
- Partitioned table HOT updates

Isolation tests:
- HOT chain formation and maintenance
- Concurrent HOT update scenarios
- Index scan behavior with HOT chains

Refactor executor update logic to determine which indexed columns have
actually changed during an UPDATE operation rather than leaving this up
to HeapDetermineColumnsInfo() in heap_update().

Applied patch v38-0002 with offsets (-16 lines in heapam.h, various
other files with 1-10 line offsets).
…truct

The existing tableam UPDATE contract used a bitmap input/output parameter
where the table AM would flip bit 0 (MODIFIED_IDX_ATTRS_ALL_IDX) on the
caller's Bitmapset to signal 'update was not HOT; every index needs a new
entry'.  That overloaded one parameter with two orthogonal concepts:
'which attributes changed' (executor -> AM) and 'update not HOT'
(AM -> executor).  It also abused bit 0 of an attnum-offset bitmap.

Replace the sentinel with a new TM_IndexUpdateInfo struct carrying:

  const Bitmapset *modified_attrs;   /* in  */
  bool             update_all_indexes; /* out */

Touch points:

- tableam.h: drop MODIFIED_IDX_ATTRS_ALL_IDX, add TM_IndexUpdateInfo,
  retype tuple_update callback and table_tuple_update /
  simple_table_tuple_update inlines.
- heapam.c / heapam_handler.c: heap_update keeps the const Bitmapset
  input; heapam_tuple_update / simple_heap_update now write
  update_all_indexes via the struct.
- catalog/indexing.c: CatalogIndexInsert reads the struct; Catalog
  TupleUpdate{,WithInfo} allocate and pass it through.
- executor/nodeModifyTable.c: UpdateContext embeds TM_IndexUpdateInfo
  instead of a Bitmapset *.  ExecUpdateEpilogue now enters the index-
  maintenance branch when *either* update_all_indexes is true OR the
  modified_attrs set is non-empty, which preserves the previous
  behavior for the 'non-HOT with no changed indexed columns' case that
  the sentinel used to cover implicitly.
- executor/execReplication.c, commands/repack.c: same fix for the
  enter-index-maintenance predicate.
- access/heap/README.HOT: document the struct contract.

No regression in: meson test --suite regress (246/246) and full
meson test (353/353, 40 skipped).
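The struct contract and the widened index-maintenance gate described above can be sketched as a standalone mock (the Bitmapset here is a stand-in struct; in PostgreSQL it is the real Bitmapset type, where NULL conventionally means the empty set):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Bitmapset; NULL means "empty set". */
typedef struct Bitmapset
{
    int nwords;
} Bitmapset;

/* Sketch of the new in/out tableam UPDATE contract. */
typedef struct TM_IndexUpdateInfo
{
    const Bitmapset *modified_attrs;     /* in:  executor -> AM */
    bool             update_all_indexes; /* out: AM -> executor */
} TM_IndexUpdateInfo;

/*
 * Sketch of the ExecUpdateEpilogue gate: enter index maintenance when
 * the AM demanded all indexes (non-HOT update) OR any indexed attribute
 * changed.  The OR preserves the 'non-HOT with no changed indexed
 * columns' case the old sentinel bit covered implicitly.
 */
static bool
needs_index_maintenance(const TM_IndexUpdateInfo *info)
{
    return info->update_all_indexes || info->modified_attrs != NULL;
}
```

This keeps the two orthogonal signals in separately typed fields instead of overloading bit 0 of an attnum-offset bitmap.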

Define HEAP_INDEXED_UPDATED (0x0800) in t_infomask2 and add the
access/hot_indexed.h header describing the tombstone line-pointer
layout that will carry the per-update modified-attrs bitmap.

On-disk layout (see SIU_REDESIGN_PHASE1_SPIKE.md for the full design):

  HeapTupleHeaderData with
    t_ctid.offnum     = back-pointer to live SIU tuple offset
    t_infomask        = HEAP_XMIN_INVALID | HEAP_XMAX_INVALID
    t_infomask2       = HEAP_INDEXED_UPDATED (natts bits = 0)
    t_hoff            = MAXALIGN(SizeofHeapTupleHeader)
  followed by HotIndexedTombstonePayload {uint16 t_target, uint16
  t_nbytes, uint8 t_bitmap[]}.

A tombstone is distinguished from a real tuple by the predicate
HeapTupleHeaderIsHotIndexedTombstone(tup), which tests
HEAP_INDEXED_UPDATED plus natts == 0.  The natts==0 leg is safe
because every relation has at least one user attribute.

This commit adds only definitions and inline accessors; no reader or
writer calls into them yet.  StaticAssertDecl checks verify at compile
time that the payload layout is as documented.

No behavior change.  Build clean, meson test 353/353 passing
(inherited from HEAD^).
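The tombstone predicate can be mocked with just the two infomask words (HEAP_NATTS_MASK 0x07FF is PostgreSQL's existing low-11-bit natts mask; HEAP_INDEXED_UPDATED is the new bit; the MockHeapTupleHeader type is a stand-in for illustration only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HEAP_NATTS_MASK       0x07FF  /* low bits of t_infomask2 carry natts */
#define HEAP_INDEXED_UPDATED  0x0800  /* new t_infomask2 bit from this patch */

/* Minimal stand-in for the two infomask words of a heap tuple header. */
typedef struct MockHeapTupleHeader
{
    uint16_t t_infomask;
    uint16_t t_infomask2;
} MockHeapTupleHeader;

/*
 * Sketch of HeapTupleHeaderIsHotIndexedTombstone(): a tombstone carries
 * HEAP_INDEXED_UPDATED with natts == 0.  The natts == 0 leg is safe
 * because every relation has at least one user attribute.
 */
static bool
mock_is_hot_indexed_tombstone(const MockHeapTupleHeader *tup)
{
    return (tup->t_infomask2 & HEAP_INDEXED_UPDATED) != 0 &&
           (tup->t_infomask2 & HEAP_NATTS_MASK) == 0;
}
```

The test values below are the raw infomask numbers quoted in the later Phase 3.1c smoke test (tombstone t_infomask2 = 2048; live four-attribute SIU tuple t_infomask2 = 34820).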

Introduce a per-index Bitmapset of heap attribute numbers referenced
by an index -- keys, INCLUDE columns, expression columns, and
partial-index predicate columns -- accessed via

    Bitmapset *RelationGetIndexedAttrs(Relation indexRel);

The accessor is the single place Phase 3 (heap_update SIU decision
and tombstone bitmap construction) will look up per-index attribute
coverage.

Design notes:

- Always copies into caller-owned memory.  No borrowed-pointer variant,
  because relcache invalidation (RelationRebuildRelation) can recycle
  rd_indexcxt in place even while a refcount is held, invalidating any
  borrowed pointer across any AcceptInvalidationMessages() call.

- The cache copy lives in rd_indexcxt of the *index* Relation.  A new
  field rd_indattr holds it; it is reset to NULL on relcache rebuild
  alongside rd_indexprs and rd_indpred.  Named to avoid collision with
  the existing heap-side rd_indexedattr (which is populated by
  RelationGetIndexAttrBitmap for the entire table).

- Reuses the relcache's already-parsed trees via
  RelationGetIndexExpressions / RelationGetIndexPredicate; does not
  call stringToNode on pg_index.indexprs or indpred.  This is the
  fix noted in the review feedback ("2c").

- During very-early bootstrap rd_indextuple may be NULL; we fall back
  to keys-only without caching.

Not yet called from anywhere -- Phase 3 will wire it into
ExecOpenIndices and heap_update.

No behavior change.  Build clean, meson test --suite regress
246/246 passing.
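The per-index attribute coverage can be sketched with a uint64 standing in for the Bitmapset (bit attnum-1 represents heap attribute attnum; the Mock* names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mock of the coverage RelationGetIndexedAttrs() is described as
 * computing: the union of key, INCLUDE, expression, and partial-index
 * predicate heap attnums for one index.
 */
typedef struct MockIndexInfo
{
    uint64_t key_attrs;     /* simple key columns */
    uint64_t include_attrs; /* INCLUDE columns */
    uint64_t expr_attrs;    /* attrs referenced by index expressions */
    uint64_t pred_attrs;    /* attrs referenced by the partial predicate */
} MockIndexInfo;

static uint64_t
mock_indexed_attrs(const MockIndexInfo *idx)
{
    return idx->key_attrs | idx->include_attrs |
           idx->expr_attrs | idx->pred_attrs;
}

/* Does this index cover any of the attributes modified by an UPDATE? */
static int
mock_index_affected(const MockIndexInfo *idx, uint64_t modified_attrs)
{
    return (mock_indexed_attrs(idx) & modified_attrs) != 0;
}
```

The always-copy design noted above matters because a borrowed pointer into rd_indexcxt could be invalidated by any AcceptInvalidationMessages() call; the mock sidesteps that by being value-based.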

ExecSetIndexUnchanged() now computes ii_IndexUnchanged using the full
set of heap attributes each index references -- simple keys, INCLUDE
columns, expression columns, and partial-index predicate columns --
rather than only key attnums.  The new path calls
RelationGetIndexedAttrs() (the Phase 2.2 accessor) per index, so:

  - INCLUDE columns are now correctly considered (previously ignored).
  - Expression indexes no longer fall back to 'conservatively changed'
    when any attr might have moved; pull_varattnos via
    RelationGetIndexExpressions gives the exact set.
  - Partial-index predicates are now accounted for.

ExecInsertIndexTuples()'s HOT-path skip test is updated to consult
ii_IndexUnchanged instead of unconditionally skipping every non-
summarizing index.  For classic HOT (no indexed attrs modified) every
index sees ii_IndexUnchanged = true and is still skipped.  For a HOT-
indexed (SIU) update only indexes whose attrs actually changed are
visited; unaffected non-summarizing indexes are skipped because their
existing entry still resolves the new heap tuple through the HOT
chain.

No behavior change under the current heap_update path, which still
forces non-HOT whenever modified_idx_attrs hits a non-summarizing
index (see HeapUpdateHotAllowable).  Phase 3.1 will relax that gate
and land the heap_update tombstone-write path.

meson test --suite regress 246/246 passing.
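The updated skip test reduces to a two-input predicate; a hedged sketch (function name is illustrative, not the patch's):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the updated ExecInsertIndexTuples() HOT-path skip test: on a
 * heap-only update, skip an index iff none of its referenced attributes
 * changed (ii_IndexUnchanged) -- its existing entry still resolves the
 * new tuple through the HOT chain.  For classic HOT every index reports
 * unchanged and is skipped; for a SIU update only affected indexes are
 * visited.  Non-HOT updates never reach this skip: every index gets a
 * new entry.
 */
static bool
mock_skip_index_insert(bool heap_only_update, bool ii_IndexUnchanged)
{
    return heap_only_update && ii_IndexUnchanged;
}
```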

Introduce src/backend/access/heap/hot_indexed.c with two helpers that
operate on the tombstone on-disk format established by the Phase 1
spike:

    Size heap_build_hot_indexed_tombstone(char *buf,
                                          OffsetNumber target_offnum,
                                          int natts,
                                          const Bitmapset *modified_attrs);

    bool heap_hot_indexed_tombstone_attr_modified(
                                     const HotIndexedTombstonePayload *p,
                                     AttrNumber attnum);

The builder fills a caller-owned buffer of size
HotIndexedTombstoneSize(natts) with a ready-to-PageAddItemExtended
tombstone item.  It does not palloc, so it is safe to invoke from
inside a critical section.  modified_attrs uses the
FirstLowInvalidHeapAttributeNumber offset convention; only user
attributes (attnum >= 1) are encoded into the bitmap.  The header
is zeroed first so alignment padding and the bitmap's unused tail
bits are deterministic -- important for FPI stability and amcheck.

The query helper is the write-path mirror of
HotIndexedTombstoneGetBitmap(): it checks a single attnum against the
bitmap and returns false for out-of-range attnums.  Phase 4 (reader
path) will use it during index-scan recheck.

No call sites yet; Phase 3.1b will wire the builder into heap_update
alongside the WAL extension.

meson test --suite regress 246/246 passing.
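The builder/query pair can be sketched against the {t_target, t_nbytes, t_bitmap[]} payload. This mock simplifies the attnum convention to "attnum 1 = bit 0" (the real code uses the FirstLowInvalidHeapAttributeNumber offset) and takes a zero-terminated attnum list in place of the Bitmapset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Payload layout as described above: {t_target, t_nbytes, t_bitmap[]}. */
typedef struct MockTombstonePayload
{
    uint16_t t_target;   /* offnum of the live SIU tuple */
    uint16_t t_nbytes;   /* length of t_bitmap in bytes */
    uint8_t  t_bitmap[]; /* one bit per user attribute, attnum 1 = bit 0 */
} MockTombstonePayload;

#define MOCK_TOMBSTONE_SIZE(natts) \
    (sizeof(MockTombstonePayload) + (((natts) + 7) / 8))

/*
 * Sketch of the builder: fill a caller-owned, pre-sized buffer; zero it
 * first so padding and unused tail bits are deterministic; no allocation,
 * so it would be safe inside a critical section.
 */
static size_t
mock_build_tombstone(char *buf, uint16_t target_offnum, int natts,
                     const int *modified /* zero-terminated attnum list */)
{
    MockTombstonePayload *p = (MockTombstonePayload *) buf;
    size_t size = MOCK_TOMBSTONE_SIZE(natts);

    memset(buf, 0, size);
    p->t_target = target_offnum;
    p->t_nbytes = (uint16_t) ((natts + 7) / 8);
    for (; *modified != 0; modified++)
        p->t_bitmap[(*modified - 1) / 8] |=
            (uint8_t) (1 << ((*modified - 1) % 8));
    return size;
}

/* Mirror of the query helper: out-of-range attnums report "unmodified". */
static int
mock_tombstone_attr_modified(const MockTombstonePayload *p, int attnum)
{
    if (attnum < 1 || (attnum - 1) / 8 >= (int) p->t_nbytes)
        return 0;
    return (p->t_bitmap[(attnum - 1) / 8] >> ((attnum - 1) % 8)) & 1;
}
```

The test mirrors the Phase 3.1c smoke test: an update of column b (attnum 2) on a four-column table yields a one-byte bitmap with bit 1 set.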

Replace the bool hot_allowed output from HeapUpdateHotAllowable() with
a three-valued enum:

    HEAP_HOT_MODE_NO       -- non-HOT required (as 'hot_allowed=false')
    HEAP_HOT_MODE_CLASSIC  -- classic HOT, no tombstone
    HEAP_HOT_MODE_INDEXED  -- reserved for Phase 3.1c (SIU tombstone)

HeapUpdateHotAllowable() still maps exactly onto the pre-SIU two-case
behavior: returns HEAP_HOT_MODE_CLASSIC when modified_idx_attrs is
empty or a subset of summarizing-indexed attrs, and HEAP_HOT_MODE_NO
otherwise.  It never returns HEAP_HOT_MODE_INDEXED yet; Phase 3.1c
relaxes the classification and wires the tombstone-write path.

heap_update()'s signature gains const HeapUpdateHotMode hot_mode
replacing const bool hot_allowed.  Inside heap_update() the gate is
now "hot_mode != HEAP_HOT_MODE_NO", preserving semantics exactly.
Callers (simple_heap_update, heapam_handler's tuple_update) updated
to match.

No behavior change.  Build clean, meson test --suite regress
246/246 passing.
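The enum and the pre-SIU classification can be sketched with uint64 bitmaps standing in for Bitmapsets (illustrative names; only the three enum legs come from the commit):

```c
#include <assert.h>
#include <stdint.h>

/* The three-valued result described above. */
typedef enum MockHeapHotMode
{
    HEAP_HOT_MODE_NO,      /* non-HOT required (old hot_allowed = false) */
    HEAP_HOT_MODE_CLASSIC, /* classic HOT, no tombstone */
    HEAP_HOT_MODE_INDEXED  /* reserved for the SIU tombstone path */
} MockHeapHotMode;

/*
 * Sketch of the pre-SIU mapping: classic HOT when the modified attrs are
 * empty or a subset of summarizing-indexed attrs, non-HOT otherwise.
 * HEAP_HOT_MODE_INDEXED is never returned at this phase.
 */
static MockHeapHotMode
mock_hot_allowable(uint64_t modified_idx_attrs, uint64_t summarizing_attrs)
{
    if ((modified_idx_attrs & ~summarizing_attrs) == 0)
        return HEAP_HOT_MODE_CLASSIC;
    return HEAP_HOT_MODE_NO;
}
```

Inside heap_update() the gate then becomes "hot_mode != HEAP_HOT_MODE_NO", preserving the old bool semantics exactly.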

Preparatory commit for the Phase 3.1c write path.  Once heap_update()
starts emitting HOT-indexed (SIU) tombstone line pointers, concurrent
pruning and vacuuming must leave them alone -- removing a tombstone
destroys the modified-attrs bitmap that index scans need in order to
recognize stale chain entries.

Three sites have to recognize tombstones by
HeapTupleHeaderIsHotIndexedTombstone():

  pruneheap.c :: heap_page_prune_and_freeze's per-offnum loop
      Routes tombstones to a new heap_prune_record_unchanged_lp_tombstone()
      helper before HTSV classification or root/heaponly bucketing.  The
      helper marks the offset processed and the page not-empty, but does
      no visibility, freeze, or freeze-bookkeeping work.

  pruneheap.c :: heap_get_root_tuples()
      Skips tombstones outright so they never appear as 'root of a HOT
      chain' in the offnum->root map used by BitmapHeapScan and index
      vacuuming.

  vacuumlazy.c :: lazy_scan_noprune()
      Skips tombstones before heap_tuple_should_freeze and
      HeapTupleSatisfiesVacuum so they don't contribute to freeze
      decisions or missed_dead_tuples counters.

  vacuumlazy.c :: heap_page_is_all_visible()
      Skips tombstones so their permanently-invisible xmin/xmax do not
      disqualify an otherwise all-visible page.

No behavior change today (no tombstones exist on disk yet); Phase 3.1c's
heap_update() write path will start producing them.  Reclamation of
tombstones whose live SIU tuple is itself dead is deliberately deferred
to a later commit; today they accumulate until table rewrite.

meson test --suite regress 246/246 passing.

First behavior-changing commit for SIU.  Guarded by a new GUC
'hot_indexed_updates' (DEVELOPER_OPTIONS, default off); turning it on
allows heap_update() to keep updates as heap-only (HOT) even when a
non-summarizing indexed column changes, by placing a tombstone line
pointer adjacent to the live new tuple on the same page.

HeapUpdateHotAllowable() gains the HEAP_HOT_MODE_INDEXED return leg:
when the GUC is on, the relation is not a system catalog, and the
modified-attrs bitmap intersects a non-summarizing index, the caller
is directed down the SIU path.  System catalogs continue to use the
non-HOT path pending Phase 7 catcache work.

heap_update() now:

  - Adds (tombstone-size + sizeof(ItemIdData)) to the newtupsize test
    when hot_mode == HEAP_HOT_MODE_INDEXED so the fit check refuses
    SIU when the tombstone wouldn't fit; the update falls through to
    the non-HOT path (new page) in that case.  No tombstone is ever
    emitted on a non-HOT update.

  - Sets HEAP_INDEXED_UPDATED on both the live new tuple and the
    caller's copy when committing to SIU, so index-scan chain
    followers can recognize that a tombstone with the per-update
    modified-attrs bitmap sits next to this tuple.

  - After RelationPutHeapTuple for the live tuple, builds a tombstone
    via heap_build_hot_indexed_tombstone() into a 256-byte stack
    buffer (large enough for MaxHeapAttributeNumber) and places it
    with PageAddItemExtended(PAI_IS_HEAP).  The tombstone's t_ctid
    payload carries the back-pointer (InvalidBlockNumber, target) and
    its post-header bytes carry {t_target, t_nbytes, t_bitmap}.

WAL: xl_heap_update gains XLH_UPDATE_CONTAINS_TOMBSTONE (1<<7).  When
set, the block-0 data chain carries a uint16 trailer length after
xlhdr and, at the end of the chain, {OffsetNumber tombstone_offnum,
uint16 tomb_size, tombstone_bytes}.  heap_xlog_update() reads the
trailer length to derive the real tuple body length, reconstructs
the new tuple as before, then re-installs the tombstone at the
recorded offset via PageAddItem.

Smoke tested with hot_indexed_updates=on:

  - UPDATE t SET b = b + 1000 WHERE a <= 5  produces live tuples
    at offsets 51/53/55/57/59 and tombstones at 52/54/56/58/60
    carrying a 1-byte bitmap with bit 1 (attnum 2 = column b) set.
  - Live tuples: t_infomask2 = HEAP_ONLY_TUPLE | HEAP_INDEXED_UPDATED
    | natts(4) = 34820.  Tombstones: t_infomask2 =
    HEAP_INDEXED_UPDATED | natts(0) = 2048, t_infomask =
    HEAP_XMIN_INVALID|HEAP_XMAX_INVALID = 2560, t_ctid =
    (InvalidBlockNumber, live-offnum).
  - CHECKPOINT + kill -9 + restart replays the tombstones correctly.

meson test --suite regress 246/246 passing with the GUC off (default).
Phase 3.1d adds the index-scan reader path (recheck via the bitmap
when landing on a HEAP_INDEXED_UPDATED tuple); until that lands,
readers that find a SIU tuple via a stale index entry will return
rows whose key no longer matches the index -- do not set the GUC on
for correctness testing yet, only for on-disk format verification.
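The raw infomask numbers in the smoke test above can be cross-checked from the bit definitions. HEAP_ONLY_TUPLE, HEAP_XMIN_INVALID, and HEAP_XMAX_INVALID are PostgreSQL's existing htup_details.h values; HEAP_INDEXED_UPDATED is this series' new bit:

```c
#include <assert.h>
#include <stdint.h>

/* t_infomask2 bits. */
#define HEAP_ONLY_TUPLE       0x8000  /* existing PostgreSQL flag */
#define HEAP_INDEXED_UPDATED  0x0800  /* new bit from this series */

/* t_infomask bits (existing PostgreSQL definitions). */
#define HEAP_XMIN_INVALID     0x0200
#define HEAP_XMAX_INVALID     0x0800

/* Reproduce the infomask words quoted in the smoke test. */
static uint16_t
mock_live_siu_infomask2(int natts)
{
    return (uint16_t) (HEAP_ONLY_TUPLE | HEAP_INDEXED_UPDATED | natts);
}

static uint16_t
mock_tombstone_infomask2(void)
{
    return HEAP_INDEXED_UPDATED;              /* natts bits == 0 */
}

static uint16_t
mock_tombstone_infomask(void)
{
    return HEAP_XMIN_INVALID | HEAP_XMAX_INVALID;
}
```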

Phase 3.1d: with the write path from 80afe3e and pruneheap
awareness from a51403e, this commit wires the reader side so that
index scans produce correct results when hot_indexed_updates=on.

Two paths arrive at a SIU live tuple:

 1. Stale entry via old key.  The index entry still points at the
    chain root; chain-walk hops through one or more SIU tuples to
    reach the visible version.  The index entry's key no longer
    agrees with the visible tuple for attrs covered by any of the
    traversed SIU updates -- the executor must rerun its quals
    against the heap tuple.

 2. Fresh entry inserted by the SIU update itself.  The index entry
    points directly at a heap-only tuple carrying HEAP_INDEXED_UPDATED.
    The entry's key matches the current attr values by construction,
    so no recheck is required; classic heap-only-at-chain-start is not
    a broken chain in this case.

Implementation:

- heap_hot_search_buffer() gains a new bool *hot_indexed_recheck
  out-parameter.  NULL opts out (callers unrelated to index scans).
  - At chain start: a heap-only tuple with HEAP_INDEXED_UPDATED falls
    through the traditional "broken chain" break; the tuple is the
    SIU target and we visibility-check it directly.
  - Past chain start: any HEAP_INDEXED_UPDATED tuple encountered sets
    *hot_indexed_recheck = true, signalling to the caller that the
    origin index entry's key may be stale.

- Tableam contract extended: (*index_fetch_tuple) and the
  table_index_fetch_tuple() inline wrapper gain a matching bool
  *hot_indexed_recheck out-parameter.  heapam_index_fetch_tuple()
  threads it through.

- index_fetch_heap() consumes the signal: when set it OR's it into
  scan->xs_recheck so nodeIndexscan's existing lossy-index-recheck
  path runs indexqualorig against the heap tuple.  The existing
  recheck loop drops stale rows correctly (seen as
  "Rows Removed by Index Recheck" in EXPLAIN ANALYZE).

All other callers of heap_hot_search_buffer and table_index_fetch_tuple
pass NULL for the new parameter:

  - heap_index_delete_tuples (vacuum-time scan)
  - heapam_index_build_range_scan (CREATE INDEX)
  - table_index_fetch_tuple_check
  - commands/constraint.c unique-constraint check

Smoke test with hot_indexed_updates=on, indexes on b and c, UPDATE
t SET b = b + 1000 WHERE a <= 5:

  SELECT * FROM t WHERE b = 1003   -> 1 row (new key, direct lookup) OK
  SELECT * FROM t WHERE b = 3      -> 0 rows (stale; recheck drops)   OK
  SELECT * FROM t WHERE c = 3      -> 1 row (unchanged idx, chain walk) OK
  SELECT * FROM t WHERE b = 6      -> 1 row (unchanged tuple)         OK

EXPLAIN ANALYZE for b=3 confirms 'Rows Removed by Index Recheck: 1'.

meson test --suite regress 246/246 passing with the GUC off.  With
the GUC on, the modify/HOT regress tests run to completion without
SIU-specific errors; full-suite-with-GUC-on verification is deferred
to Phase 3.1e after prune reclamation lands.
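The two arrival cases can be sketched as a chain walk over mock members (types and function name are illustrative; in the real code this is heap_hot_search_buffer() walking line pointers under a buffer lock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct MockChainMember
{
    bool heap_only;       /* HEAP_ONLY_TUPLE */
    bool indexed_updated; /* HEAP_INDEXED_UPDATED */
    bool visible;         /* passes the visibility check */
} MockChainMember;

/*
 * Sketch of the recheck signalling: return the index of the first
 * visible chain member.  A SIU tuple *at* chain start is a fresh-entry
 * arrival (case 2): visibility-check it directly, no recheck.  Any
 * HEAP_INDEXED_UPDATED tuple encountered *past* chain start (case 1)
 * sets *hot_indexed_recheck, telling the caller the origin index
 * entry's key may be stale.  NULL opts out of the signal.
 */
static int
mock_hot_search(const MockChainMember *chain, int len,
                bool *hot_indexed_recheck)
{
    for (int i = 0; i < len; i++)
    {
        if (i > 0 && chain[i].indexed_updated && hot_indexed_recheck != NULL)
            *hot_indexed_recheck = true;
        if (chain[i].visible)
            return i;
    }
    return -1;   /* no visible member */
}
```

The caller then ORs the signal into xs_recheck so the executor reruns its quals against the heap tuple, dropping stale arrivals.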

Phase 3.1e: after prune has decided each SIU live tuple's fate, walk
the tombstones recorded during the main per-offnum pass and reclaim
those whose target tuple is being removed from the page.

Previously tombstones were permanently kept once written; chain
rotation eventually left behind stale tombstones whose modified-attrs
bitmaps no longer had any reader.  Now an ordinary prune (including
opportunistic prune triggered by read traffic) converts those
tombstones to LP_UNUSED slots, making the space available for future
inserts or future SIU tuples.

Implementation:

- PruneState gains a small tombstones[] array recording (tombstone
  offnum, target offnum) pairs, plus ntombstones.  Populated during
  the existing per-offnum classification loop, replacing the earlier
  unconditional call to heap_prune_record_unchanged_lp_tombstone().

- After the heap-only-tuples post-pass but before the 'every tuple
  processed exactly once' Assert, prune_handle_tombstones() finalizes
  each tombstone's fate:

    - If target_off is in prstate->nowunused[] or prstate->nowdead[],
      or if the pre-prune page already shows a non-LP_NORMAL or
      non-HEAP_INDEXED_UPDATED target, the bitmap is no longer
      referenced -> record the tombstone as LP_UNUSED.

    - Otherwise the target survived chain processing and is still a
      live SIU tuple readers may walk to -> record the tombstone as
      unchanged.

- heap_prune_record_unchanged_lp_tombstone's Assert still holds: each
  tombstone is now routed through exactly one of the two record_*
  helpers during prune_handle_tombstones().

- The target-alive check consults prstate->nowunused[] and
  ->nowdead[] rather than reading the page, because chain processing
  populates those arrays but doesn't apply them until
  heap_page_prune_execute.  Reading the page directly would miss
  decisions that are 'pending' at this point.  A post-check against
  the pre-write page state is kept as a safety net in case the target
  has somehow been re-classified to not carry HEAP_INDEXED_UPDATED.

Smoke test with hot_indexed_updates=on:

  INSERT 20 rows;  UPDATE a=3 twice (two SIU updates on the same
  row); the chain is now (0,3) HOT-> (0,21) SIU-hop -> (0,23) SIU-hop
  with tombstones at 22 (for 21) and 24 (for 23).  After VACUUM:

    lp 3  -> LP_REDIRECT   (to the live tuple)
    lp 21 -> LP_UNUSED     (dead chain hop reclaimed)
    lp 22 -> LP_UNUSED     (tombstone for 21 reclaimed)  <- new
    lp 23 -> LP_NORMAL     (live SIU tuple, still needed)
    lp 24 -> LP_NORMAL     (tombstone for 23, still needed)

meson test --suite regress 246/246 passing with the GUC off.
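The tombstone-fate decision can be sketched as a membership check against the pending prune arrays (names are illustrative; the commit explains why the arrays, not the page, must be consulted):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t MockOffsetNumber;

static bool
offnum_in(const MockOffsetNumber *arr, int n, MockOffsetNumber off)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == off)
            return true;
    return false;
}

/*
 * Sketch of the prune_handle_tombstones() decision: a tombstone becomes
 * LP_UNUSED when its target tuple was recorded as now-unused or now-dead
 * by chain processing; otherwise the target is still a live SIU tuple
 * readers may walk to, and the tombstone is kept.  The arrays are
 * consulted because pending decisions are not applied to the page until
 * heap_page_prune_execute.
 */
static bool
mock_tombstone_reclaimable(MockOffsetNumber target_off,
                           const MockOffsetNumber *nowunused, int nunused,
                           const MockOffsetNumber *nowdead, int ndead)
{
    return offnum_in(nowunused, nunused, target_off) ||
           offnum_in(nowdead, ndead, target_off);
}
```

The test mirrors the smoke test above: the dead chain hop at lp 21 makes its tombstone (lp 22) reclaimable, while the live SIU tuple at lp 23 keeps its tombstone (lp 24).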

Five related changes that let hot_indexed_updates=on pass substantially
more of the regression suite.  With these, the full src/test/regress
parallel schedule drops from 15 failing tests to 6 when the GUC is
forced on; the six remaining (foreign_key, updatable_views,
for_portion_of, without_overlaps, tsearch, hot_updates) are separate
edge cases deferred to follow-up work.  With the GUC off, all 246
tests pass unchanged.

1) New IndexScanDesc field xs_hot_indexed_recheck -- a SIU-specific
   signal separate from xs_recheck (which lossy index AMs already use
   to ask for qual re-evaluation).  index_getnext_tid() clears it; the
   heap AM sets it via index_fetch_heap() when a chain walk crossed a
   HEAP_INDEXED_UPDATED hop.  Nodes can then distinguish 'lossy index
   returned a maybe-tuple' from 'SIU chain walk produced a potential
   stale duplicate'.

2) table_index_fetch_tuple_check() grows a matching
   bool *hot_indexed_recheck out-parameter so _bt_check_unique can
   notice when it arrived at a live chain member through a stale SIU
   hop.  When set we skip the match and continue scanning -- the
   canonical fresh SIU-inserted entry will surface any real conflict.
   This is conservative and can miss genuine duplicates restricted to
   SIU-affected attrs (TODO: compare keys to recover exactness).

3) CLUSTER no longer errors on xs_recheck when the scan has zero keys
   (SIU recheck is trivially satisfied for key-less scans) and
   suppresses xs_hot_indexed_recheck tuples entirely to avoid
   double-emitting the same heap tuple via stale and canonical
   entries.

4) nodeIndexscan filters xs_hot_indexed_recheck tuples with the same
   rule: run indexqualorig if present, drop otherwise.

5) nodeIndexonlyscan always drops xs_hot_indexed_recheck tuples -- the
   index tuple's values are by definition stale relative to the heap
   tuple, so any canonical result must come from the fresh SIU entry.

Counts before/after (with hot_indexed_updates=on):

   before: 15 failing
   after:   6 failing
           insert_conflict, constraints, updatable_views,
           generated_stored, collate.icu.utf8, generated_virtual,
           rowsecurity, domain, cluster, index_including -> PASS
           hot_updates, for_portion_of, foreign_key, without_overlaps,
           tsearch, updatable_views -> still failing

The still-failing set breaks down as:
   - hot_updates: expected-output differences (legitimate: MORE
     updates are HOT under SIU).  Needs alternate expected file.
   - foreign_key, tsearch, etc.: index-scan-via-FK-trigger and
     trigger-rewrite paths that interact with SIU in ways we don't
     yet handle.  Separate investigation.

meson test --suite regress 246/246 passing with hot_indexed_updates=off.

heapam_scan_bitmap_next_tuple's non-lossy path previously trusted that
any TID in the bitmap, when chain-walked, would resolve to a tuple
with the same index key as the bitmap's owning entry.  Classic HOT
guarantees this; SIU does not.  When a bitmap entry points at a chain
whose visible member has been SIU-updated, the heap tuple's current
attrs may no longer satisfy the bitmap predicate.

Plumb the existing hot_indexed_recheck signal through
heap_hot_search_buffer in the non-lossy per-block loop: if any chain
walk on the block crossed a HEAP_INDEXED_UPDATED hop, force the
block's recheck bit on.  Nothing needed for the lossy path, which
already rechecks every tuple.

Fixes the tsearch regression where a BEFORE trigger
(tsvector_update_trigger) rewrites an indexed column during UPDATE:
after SET t = null, the new SIU tuple has a = null but the stale GIN
entry '345/qwerty' still points at the chain root.  Without the
recheck the Bitmap Heap Scan returned the live tuple verbatim and
count came out 1 instead of 0.

meson test --suite regress 246/246 with GUC off.  Full src/test/regress
with hot_indexed_updates=on now 242/246 (from 243/246).

Five targeted fixes close the remaining regression-suite gaps under SIU:

1) BitmapHeapScan SIU dedup.  When a bitmap heap scan crosses a SIU hop
   during its non-lossy per-block chain-walks, multiple bitmap entries
   can chain-resolve to the same live tuple (stale old-key plus fresh
   new-key entries, and so on for successive SIU updates).  rs_vistuples[]
   would then carry duplicate offsets, so upper nodes such as MERGE would
   see the same row twice and throw TM_SelfModified ("MERGE command
   cannot affect row a second time").  Dedup inline using a linear scan
   of the already-collected offsets, but only once a SIU hop has been
   observed for this block (page_had_siu latch); preserve the original
   insertion order because MERGE's RETURNING ordering depends on it.

2) check_exclusion_or_unique_constraint found-self tolerance.  Under
   SIU the same heap tuple can be reached via multiple chain-walking
   index entries within a single DirtySnapshot scan.  The function used
   to elog(ERROR, "found self tuple multiple times ...") as a safety
   check.  Track whether *any* self-arrival in this scan carried
   xs_hot_indexed_recheck; if so, accept further duplicate self-arrivals
   silently.  A double self-arrival with zero SIU in the chain is still
   treated as the pre-SIU corruption signal.

3) RelationHasExclusionConstraint() + SIU eligibility gate.  Temporal
   primary keys (PRIMARY KEY ... WITHOUT OVERLAPS) and other exclusion
   constraints rely on "one live tuple per (key, TID)" in the
   exclusion-check scan.  SIU's stale chain entries break that, making
   FOR PORTION OF operations misbehave.  A new relcache helper walks
   the heap's index list to answer "does any index have indisexclusion
   set", and HeapUpdateHotAllowable() adds that to the set of
   SIU-ineligible conditions.  Later commits may replace the exemption
   with actual exclusion-scan awareness.

4) tsearch (BitmapHeapScan) recheck on SIU hops.  The non-lossy bitmap
   path in heapam_scan_bitmap_next_tuple now threads hot_indexed_recheck
   through its heap_hot_search_buffer call and forces *recheck = true on
   any block that saw a SIU hop.  This lets BitmapHeapScan's existing
   bitmapqualorig re-evaluation drop tuples whose current heap attrs
   don't satisfy the bitmap's predicate -- exactly the case a
   BEFORE-trigger-driven tsvector rewrite exhibits.

5) hot_updates expected output regenerated.  The test now sets
   hot_indexed_updates = on at the top so it exercises the SIU path
   deterministically; counts of HOT vs non-HOT change accordingly
   because updates that were previously forced non-HOT (indexed column
   modified) are now HOT-indexed.  Per the project rule, the updated
   expected file lands in the same commit that triggered the change.

Results:

  meson test --suite regress                      246/246 (GUC off)
  pg_regress --temp-config=hot_indexed_updates=on 246/246 (GUC on)

Phase 3.1f is complete.  Next on the plan: P3.1g (flip the GUC default
to on) and P7 (catcache stale-filter so we can remove the IsCatalogRelation
exemption).
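The rs_vistuples[] dedup from change (1) reduces to an order-preserving linear-scan append; a hedged sketch (in the real patch the scan is only engaged once a SIU hop has been observed for the block, the page_had_siu latch, whereas this sketch applies it unconditionally):

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t MockOffsetNumber;

/*
 * Append offnum to the collected visible-tuple offsets only if it is not
 * already present, preserving the original insertion order (MERGE's
 * RETURNING ordering depends on it).  Returns the new count.  Without
 * this, stale old-key and fresh new-key bitmap entries that chain-resolve
 * to the same live tuple would surface the row twice.
 */
static int
mock_append_dedup(MockOffsetNumber *vistuples, int n, MockOffsetNumber off)
{
    for (int i = 0; i < n; i++)
        if (vistuples[i] == off)
            return n;   /* duplicate chain-resolution: drop */
    vistuples[n] = off;
    return n + 1;
}
```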

All 246 regression tests now pass with Selective Index Update enabled.
Change the GUC's boot value from false to true and remove the 'work in
progress; leave disabled on production systems' warning from its long
description.

Callers that want pre-SIU behavior can still override locally via SET
hot_indexed_updates = off (PGC_USERSET).  The next phase (P7) removes
the IsCatalogRelation exemption once catcache gains a stale-SIU filter;
system catalogs continue to use classic HOT vs non-HOT until then.

meson test --suite regress 246/246 passing.

…an hits

Three independent SIU robustness improvements, kept together because they
were all motivated by the same effort to enable SIU on system catalogs
(P7, still in progress).  The IsCatalogRelation exemption is kept for
now; these pieces stand on their own for non-catalog relations.

1) heap_update's SIU space check uses PageGetFreeSpaceForMultipleTuples(2)
   and the line-pointer budget.

   The previous check only inflated newtupsize by tombsize + sizeof(ItemIdData),
   which was necessary but not sufficient: PageGetHeapFreeSpace reserves
   just one ItemId and the line-pointer ceiling wasn't checked for the
   two-item case.  On tight pages with many existing tuples this could
   pass the pre-check yet fail PageAddItemExtended for the tombstone
   inside the critical section, tripping a PANIC.  Now we consult the
   multi-tuple free-space helper and verify that nlp + 2 <=
   MaxHeapTuplesPerPage.

2) RelationGetBufferForTuple is asked for room for tuple + tombstone.

   After the initial same-page check fails and we drop the lock, the
   loop calls RelationGetBufferForTuple with heaptup->t_len.  On a
   heavily-pruned single-block relation that helper can return the
   current buffer after an opportunistic prune even though there isn't
   room for the tombstone.  When hot_mode == HEAP_HOT_MODE_INDEXED we
   now pass heaptup->t_len + tombsize so the helper only returns a
   buffer with room for both.

3) genam.c systable_{beginscan,getnext,getnext_ordered,endscan}
   carry a copy of the caller's heap-attnum scan keys on SysScanDesc
   and re-evaluate them against any tuple reached via a chain-walk
   that set xs_hot_indexed_recheck.  Previously iscan->keyData stored
   the translated index-column-attnum form, which is inappropriate for
   running against a heap tuple via HeapKeyTest.  With this, the
   catcache systable path will correctly drop SIU-stale arrivals once
   the catalog SIU exemption in HeapUpdateHotAllowable is lifted.

meson test --suite regress 246/246 (GUC off).
pg_regress --temp-config=hot_indexed_updates=on 246/246.

Replace the 256-byte stack array used to build the tombstone item with
a per-relation palloc'd buffer.  The allocation happens once, before
the critical section starts, and is sized exactly to
HotIndexedTombstoneSize(natts) for the relation under update.

Rationale:
 - No arbitrary cap.  The worst-case (1600 attrs -> 232 bytes) was
   comfortably under 256, but using a right-sized allocation removes
   the implicit upper bound if MaxHeapAttributeNumber ever grows, and
   avoids wasting stack on narrow tables.
 - Memory allocation happens before START_CRIT_SECTION so an OOM is an
   ERROR, not a PANIC, matching the pattern used for old_key_tuple and
   other heap_update preparations.
 - The buffer is freed by the caller's memory context on return; no
   explicit pfree is required and none was added.
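
A toy model of the right-sized allocation, assuming the tombstone is a
fixed header plus a one-bit-per-attribute map; HotIndexedTombstoneSize
is part of this patch and not reproduced here, so the layout constants
below are illustrative only:

```c
#include <stddef.h>
#include <assert.h>

/* Toy layout assumption: 32-byte header plus one bit per attribute,
 * chosen so the 1600-attribute worst case matches the 232 bytes
 * quoted above. */
#define TOY_TOMB_HEADER_SIZE 32

static size_t
toy_tombstone_size(int natts)
{
    return TOY_TOMB_HEADER_SIZE + (size_t) ((natts + 7) / 8);
}
```

Sizing the palloc to this exact figure, before START_CRIT_SECTION,
gives the ERROR-not-PANIC behaviour described in the rationale.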

246/246 regress passing in both hot_indexed_updates=on and =off modes.
…cope

Two small changes, both motivated by a cassert-enabled regression run
that exposed issues once SIU was attempted on system catalogs:

1) heap_page_prune_execute's LP_UNUSED assertion accepts SIU
   tombstones.

   heap_prune_record_unused() can legitimately mark a tombstone
   LP_UNUSED (Phase 3.1e's reclamation), but the USE_ASSERT_CHECKING
   block asserted the to-be-unused item was HEAP_ONLY_TUPLE.  With
   casserts on and SIU pruning active, this tripped even for the
   non-catalog workloads we already support.  Widen the assertion to
   also accept HeapTupleHeaderIsHotIndexedTombstone().
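
   In stand-in form (the flag names below are modeled, not the real
   t_infomask2 bits tested by HeapTupleHeaderIsHeapOnly and
   HeapTupleHeaderIsHotIndexedTombstone), the widened check amounts to:

   ```c
   #include <stdbool.h>
   #include <assert.h>

   /* Modeled tuple-state flags standing in for the real infomask tests. */
   #define FLAG_HEAP_ONLY      0x1
   #define FLAG_SIU_TOMBSTONE  0x2   /* hypothetical */

   /* An item may legitimately be marked LP_UNUSED by pruning if it is
    * either a heap-only tuple or an SIU tombstone. */
   static bool
   prune_unused_ok(unsigned flags)
   {
       return (flags & (FLAG_HEAP_ONLY | FLAG_SIU_TOMBSTONE)) != 0;
   }
   ```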

2) HeapUpdateHotAllowable comment updated to reflect the actual
   blockers for lifting the IsCatalogRelation exemption: VACUUM's
   vac_update_datfrozenxid does a full heap scan over pg_class
   (systable_beginscan with indexOid=Invalid), which bypasses the
   systable_* chain-walk filter in genam.c; and catcache /
   invalidation paths need a focused audit to tolerate chains with
   stale keys.  The exemption stays in place until that is
   addressed; no behavior change in this commit.

meson test --suite regress 246/246 with the default config, and
pg_regress --temp-config=hot_indexed_updates=on 246/246 too.
The GUC was introduced in Phase 3.1c as a safety gate while the
feature was developed.  With the full regression suite clean at
246/246 both ways and the behaviour well understood, keeping a
user-visible knob no longer carries its weight.  The relation-level
exemptions that remain are not user-toggleable:

  - System catalogs (IsCatalogRelation): vacuum's seqscan over
    pg_class and catcache invalidation paths need their own
    SIU-awareness pass before we lift this.  Tracked as the next
    iteration of Phase 7; the systable filter infrastructure from
    commit 0ce2828 remains in place ready to be exercised.

  - Relations with an exclusion constraint
    (RelationHasExclusionConstraint):
    check_exclusion_or_unique_constraint relies on "one live tuple
    per (key, TID)", which SIU's
    stale chain entries break; temporal PRIMARY KEY ... WITHOUT
    OVERLAPS falls into this category.

Changes:

  - guc_parameters.dat: entry removed.
  - src/include/access/heapam.h: extern declaration removed.
  - src/backend/access/heap/heapam.c: variable definition removed;
    HeapUpdateHotAllowable no longer reads the GUC.
  - src/backend/utils/misc/guc_tables.c: the extra #include that
    existed only to satisfy the GUC's extern is removed.
  - src/test/regress/sql/hot_updates.sql: 'SET hot_indexed_updates
    = on' at the top of the file is removed; the comment explains
    SIU is now always on.
  - src/test/regress/expected/hot_updates.out: regenerated to match
    (identical to the previous SIU-on expected output minus the SET).
  - nbtinsert.c: comment referencing the GUC name cleaned up.

meson test --suite regress 246/246 passing.

The tombstone fit-check hardening in 0ce2828 passed tuple_len +
tombstone_size to RelationGetBufferForTuple when hot_mode was
HEAP_HOT_MODE_INDEXED, but that helper's internal check uses
PageGetHeapFreeSpace which reserves only one ItemIdData.  A second LP
is still needed on the page -- one for the tuple and one for the
tombstone.

Under heavy pgbench load the helper could return our current buffer
after an opportunistic prune left exactly 'tuple + tombstone' bytes
free: enough for both bodies and one LP, but not two.  heap_update
then ran the critical section on the same page, and the tombstone's
PageAddItemExtended would return InvalidOffsetNumber, tripping the
defensive elog(PANIC).

Fix: add sizeof(ItemIdData) to tuple_need when hot_mode ==
HEAP_HOT_MODE_INDEXED, matching the "two new LPs" reality.
RelationGetBufferForTuple now either:
  - returns a different buffer (because the current one doesn't have
    tuple+tombstone+2LPs), which routes heap_update through the
    non-HOT path and no tombstone is emitted; or
  - returns our current buffer with enough room for everything.

Either way the subsequent PageAddItemExtended for the tombstone
succeeds.

Reproduced at SCALE=20 CLIENTS=16 DURATION=120s on siu_update
(UPDATE siu_table SET b = rand WHERE a = rand) pre-fix; passes
cleanly post-fix.  meson test --suite regress 246/246.
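
The corrected request size can be sketched as follows; the struct
definition and mode constants are stand-ins for the real bufpage.h and
heapam types, and the real computation lives inside heap_update:

```c
#include <stddef.h>
#include <assert.h>

/* Stand-in for the 4-byte line-pointer struct from bufpage.h. */
typedef struct
{
    unsigned lp_off:15, lp_flags:2, lp_len:15;
} ItemIdData;

enum { HEAP_HOT_MODE_NONE, HEAP_HOT_MODE_INDEXED };  /* stand-ins */

/* Size to request from RelationGetBufferForTuple when an SIU tombstone
 * must land on the same page: tuple body, tombstone body, and one extra
 * line pointer beyond the single one the helper reserves itself. */
static size_t
siu_tuple_need(size_t tuple_len, size_t tombsize, int hot_mode)
{
    size_t need = tuple_len;

    if (hot_mode == HEAP_HOT_MODE_INDEXED)
        need += tombsize + sizeof(ItemIdData);
    return need;
}
```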

Integer GUC, PGC_USERSET, range 0..100 inclusive, default 80.  Defined
in terms of the share of indexed attributes modified by the UPDATE
relative to the relation's full indexed-attribute set:

    n_modified_indexed_attrs * 100 > n_all_indexed_attrs * threshold
        => fall back to non-HOT (pre-SIU behaviour)
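
In code form, with names standing in for values HeapUpdateHotAllowable
already has at hand, the fallback test is:

```c
#include <stdbool.h>
#include <assert.h>

/* Sketch of the threshold test described above. */
static bool
siu_falls_back_to_non_hot(int n_modified_indexed_attrs,
                          int n_all_indexed_attrs,
                          int threshold)   /* hot_indexed_update_threshold */
{
    return n_modified_indexed_attrs * 100 >
           n_all_indexed_attrs * threshold;
}
```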

The idea is to spend the SIU tombstone only when SIU pays for itself.
When an update hits all or nearly all indexed attributes the SIU path
has to insert into every affected index anyway *and* writes the
tombstone, so the end-of-page layout is strictly worse than a non-HOT
migration to a new page.  The default of 80 picks a point where the
benchmarks already show a clear win; users wanting the prior
'always-SIU-when-eligible' behaviour can set the GUC to 100, and
hot_indexed_update_threshold = 0 disables SIU entirely (classic HOT
still applies for updates that touch no indexed attribute).

The threshold check runs inside HeapUpdateHotAllowable, right before
returning HEAP_HOT_MODE_INDEXED.  bms_num_members on the table-wide
INDEX_ATTR_BITMAP_INDEXED is an O(nbits) bit-population scan; we
already fetch that bitmap on this path, so overhead is minimal.

meson test --suite regress 246/246 passing.

Self-contained pgbench A/B driver used to generate the numbers in the
proposal email.  Not wired into meson or make check; it provisions
its own pgdata directories under $BENCH (default /scratch/siu-bench)
and expects to be kicked off manually.

Scripts:
  build.sh      -- compile 'master' (upstream/master merge-base) and
                   'tepid' into separate install prefixes.
  run.sh        -- three variants x several workloads, TPS / latency /
                   WAL / HOT% / bloat / CPU / RSS to a single CSV.
  soak.sh       -- long-running single-workload driver with periodic
                   sampling; used for steady-state autovacuum results.
  siu_update.sql, siu_mixed.sql, wide_update.sql
                -- pgbench workload scripts.

Results shape captured in README.md.  Harness is portable between
Linux and FreeBSD; see README for env vars.

Two SQL-visible interfaces for monitoring HOT-indexed (SIU) activity.

1. Running counter, same shape as tuples_hot_updated:

   pg_stat_get_tuples_siu_updated(oid) -> int8
   pg_stat_get_xact_tuples_siu_updated(oid) -> int8

   Both advance in pgstat_count_heap_update when heap_update commits an
   SIU update (use_hot_update && emit_tombstone).  Because every SIU
   update is also a HOT update, the existing tuples_hot_updated
   counter continues to include them; the new counter isolates the
   SIU share.  Exposed as pg_stat_all_tables.n_tup_siu_upd and
   pg_stat_xact_all_tables.n_tup_siu_upd.
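
   The inclusion relationship can be sketched as follows; the struct and
   field names are illustrative, not the real pgstat counters:

   ```c
   #include <stdbool.h>
   #include <assert.h>

   struct toy_table_counts
   {
       long tuples_hot_updated;
       long tuples_siu_updated;
   };

   /* Mirrors the counting rule from pgstat_count_heap_update: the SIU
    * counter only advances for updates that are also HOT, so
    * tuples_siu_updated is always a subset of tuples_hot_updated. */
   static void
   toy_count_heap_update(struct toy_table_counts *c,
                         bool hot, bool siu_tombstone)
   {
       if (hot)
       {
           c->tuples_hot_updated++;
           if (siu_tombstone)
               c->tuples_siu_updated++;
       }
   }
   ```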

2. Structural point-in-time stats, walking the relation's main fork:

   pg_relation_siu_stats(regclass)
       -> (n_tombstones int8, n_chains int8,
           avg_chain_len float8, max_chain_len int8)

   Counts live LP_NORMAL tombstone items and walks LP_REDIRECT chain
   roots to compute chain-length summary.  Useful to answer 'what is
   on disk right now', complementing the running pgstat counter.
   Requires AccessShareLock on the relation.

The three functions live at pg_proc.dat OIDs 9953/9954/9955.  Rules
regression test
expected output regenerated to match the new view columns.

meson test --suite regress 246/246 passing.