Skip to content

fix(vrt): tighten pixelmatch threshold and unify clip shape#25

Merged
chiefcll merged 2 commits into
mainfrom
fix/vrt-threshold-and-clip-shape
May 21, 2026
Merged

fix(vrt): tighten pixelmatch threshold and unify clip shape#25
chiefcll merged 2 commits into
mainfrom
fix/vrt-threshold-and-clip-shape

Conversation

@chiefcll
Copy link
Copy Markdown
Contributor

Summary

Two combined defects in the visual-regression comparator silently passed clear visual regressions (e.g. the quads-rendered test reported PASS! even though the on-screen text was missing the quad count — surfaced while reviewing PR #24).

1. pixelmatch threshold was 0.8

snapshot.ts:196 (old):

{ threshold: 0.8 } // Adjust threshold for sensitivity

threshold is normalized YIQ color distance per pixel. Library default is 0.1; the recommended useful range is 0.1–0.2. At 0.8 a pixel only counts as different when it's nearly inverted, so e.g. missing white text on a blue header (YIQ delta ≈ 0.5–0.6) reports 0 differing pixels.

Lowered to 0.1 (the library default).

2. SnapshotOptions.clip had two drifting shapes

Unified everything on {x, y, width, height} and removed the silent fallbacks. compareBuffers now derives width/height from the PNG buffers and the dimension-mismatch error message includes both actual and expected sizes.

Impact

  • No call site under examples/tests/** passes clip explicitly, so the public-API change is internal to the runner glue.
  • Previously certified snapshots will likely re-fail once the comparator and the clip path actually work together. A one-time re-cert via pnpm test:visual --ci --capture --overwrite is expected.
  • Other PRs in flight (e.g. fix(text): baseline-anchored layout, include lineGap, trim trailing letterSpacing #23) become more useful — diffs will actually surface.

Test plan

  • pnpm test --run (154/154)
  • pnpm build (renderer + visual-regression + examples) clean
  • pnpm test:visual --ci against main's certified snapshots, then --capture --overwrite to re-cert in the Docker environment.

🤖 Generated with Claude Code

chiefcll and others added 2 commits May 21, 2026 15:03
Two combined defects in the visual-regression comparator allowed clear
visual regressions to pass silently (e.g. the quads-rendered test
showing "Number of Quads Rendered:" with no number was reported PASS):

1. pixelmatch was called with `threshold: 0.8`. That value is the
   normalized YIQ color distance per pixel - the library default is
   0.1 and 0.1-0.2 is the recommended range. At 0.8 a pixel only counts
   as different when it is nearly inverted, so missing white text on a
   blue background fell well within tolerance and reported 0 differing
   pixels. Lowered to 0.1.

2. `SnapshotOptions.clip` had two drifting shapes: the canonical type
   in `examples/common/ExampleSettings.ts` used `{x, y, w, h}` while
   the comparator and the duplicate type in `visual-regression/src/
   snapshot.ts` used `{x, y, width, height}` (which is what Playwright's
   `page.screenshot({ clip })` actually requires). The example glue
   built the clip with `w/h` keys, so:
     - Playwright received `width: undefined, height: undefined` and
       likely captured the full viewport instead of the requested rect.
     - The comparator's `options.clip?.width || 1080` fallback fed
       1080/1920 into pixelmatch regardless of the real image size.

   Unified on `{x, y, width, height}` end-to-end (matches Playwright)
   and removed the silent fallbacks - compareBuffers now derives width
   and height from the PNG buffers themselves and fails loudly on
   dimension mismatch.

No call site in `examples/tests/**` passes `clip` explicitly, so the
public-facing shape change is internal to the runner glue.

After this lands the certified snapshots will need a one-time re-cert
via `pnpm test:visual --ci --capture --overwrite` because previous
captures were taken with the broken clip path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
While verifying that PR #25's threshold fix would actually catch
regressions, two follow-on defects surfaced that were the *real* reason
CI stayed green:

1. The runner's clip handler at index.ts iterated the literal list
   ['x', 'y', 'w', 'h'] and assigned Math.round(clip[key]) into
   clip.width/clip.height. That backfill papered over the broken
   w-vs-width interface mismatch for years. After the earlier commit
   on this branch unified the example glue on {x, y, width, height},
   the same remap then read clip['w']=undefined, produced NaN, and
   *overwrote* the correctly-set width/height with NaN - causing
   Playwright to reject the clip on every screenshot.

   The remap is now redundant (the canonical SnapshotOptions and the
   example glue already use width/height) and harmful (silently
   replaces good values with NaN if a w/h key isn't present), so it's
   removed. The simpler form just rounds the four real keys.

2. The exposed snapshot() callback awaited compareSnapshot() outside
   any try/catch. snapshotsTested incremented first, then any throw
   bypassed both the pass and fail counters entirely. With
   snapshotsFailed === 0 the runner exited 0 and CI reported success
   even though zero comparisons completed. The CI log for the prior
   run on this branch was the fingerprint - every test had a "Running…"
   line with no PASS!/FAILED! after it, and the summary read just
   "56 snapshots tested" with neither a passed nor failed total.

   Wrapped the capture/compare block in try/catch; any throw now
   prints a red FAILED! with the error message and increments
   snapshotsFailed so the process exit code reflects reality.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chiefcll
Copy link
Copy Markdown
Contributor Author

Update — pushed 7307dfa6 which is what actually makes this PR's premise hold.

While verifying the threshold fix would catch a regression, the prior CI run on this branch passed despite the threshold change. Two follow-on bugs explain that:

  1. visual-regression/src/index.ts had a hidden w/h → width/height clip remap that papered over the broken interface for years. After this PR's example-side fix sends {x, y, width, height} directly, the remap then read clip['w'] = undefined, ran Math.round(undefined) = NaN, and overwrote the correctly-set width/height with NaN. Playwright rejected the malformed clip on every screenshot.

  2. The exposed snapshot() callback awaited compareSnapshot() outside any try/catch. Every thrown comparison bypassed both snapshotsPassed and snapshotsFailed, so snapshotsFailed === 0 → exit code 0 → green CI. The fingerprint in the prior log was Running … lines with no PASS!/FAILED! after them and a summary that read just "56 snapshots tested" with no passed or failed totals.

Both fixed now: the remap is gone (the canonical SnapshotOptions already standardized on width/height), and the comparator block is wrapped — exceptions now print a red FAILED! with the error and increment snapshotsFailed.

@chiefcll chiefcll merged commit 0d2eb65 into main May 21, 2026
1 check failed
@chiefcll chiefcll deleted the fix/vrt-threshold-and-clip-shape branch May 21, 2026 19:38
chiefcll added a commit that referenced this pull request May 21, 2026
Pulls in #24 (renderUpdate event name) and #25 (VRT threshold + clip
shape) so this branch's CI runs against the tightened comparator. The
visual-regression diffs surfaced here should now reflect real layout
changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant