fix(vrt): tighten pixelmatch threshold and unify clip shape#25
Conversation
Two combined defects in the visual-regression comparator allowed clear
visual regressions to pass silently (e.g. the quads-rendered test
showing "Number of Quads Rendered:" with no number was reported PASS):
1. pixelmatch was called with `threshold: 0.8`. That value is the
normalized YIQ color distance per pixel - the library default is
0.1 and 0.1-0.2 is the recommended range. At 0.8 a pixel only counts
as different when it is nearly inverted, so missing white text on a
blue background fell well within tolerance and reported 0 differing
pixels. Lowered to 0.1.
2. `SnapshotOptions.clip` had two drifting shapes: the canonical type
in `examples/common/ExampleSettings.ts` used `{x, y, w, h}` while
the comparator and the duplicate type in `visual-regression/src/
snapshot.ts` used `{x, y, width, height}` (which is what Playwright's
`page.screenshot({ clip })` actually requires). The example glue
built the clip with `w/h` keys, so:
- Playwright received `width: undefined, height: undefined` and
likely captured the full viewport instead of the requested rect.
- The comparator's `options.clip?.width || 1080` fallback fed
1080/1920 into pixelmatch regardless of the real image size.
Unified on `{x, y, width, height}` end-to-end (matches Playwright)
and removed the silent fallbacks - compareBuffers now derives width
and height from the PNG buffers themselves and fails loudly on
dimension mismatch.
No call site in `examples/tests/**` passes `clip` explicitly, so the
public-facing shape change is internal to the runner glue.
After this lands the certified snapshots will need a one-time re-cert
via `pnpm test:visual --ci --capture --overwrite` because previous
captures were taken with the broken clip path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
While verifying that PR #25's threshold fix would actually catch regressions, two follow-on defects surfaced that were the *real* reason CI stayed green: 1. The runner's clip handler at index.ts iterated the literal list ['x', 'y', 'w', 'h'] and assigned Math.round(clip[key]) into clip.width/clip.height. That backfill papered over the broken w-vs-width interface mismatch for years. After the earlier commit on this branch unified the example glue on {x, y, width, height}, the same remap then read clip['w']=undefined, produced NaN, and *overwrote* the correctly-set width/height with NaN - causing Playwright to reject the clip on every screenshot. The remap is now redundant (the canonical SnapshotOptions and the example glue already use width/height) and harmful (silently replaces good values with NaN if a w/h key isn't present), so it's removed. The simpler form just rounds the four real keys. 2. The exposed snapshot() callback awaited compareSnapshot() outside any try/catch. snapshotsTested incremented first, then any throw bypassed both the pass and fail counters entirely. With snapshotsFailed === 0 the runner exited 0 and CI reported success even though zero comparisons completed. The CI log for the prior run on this branch was the fingerprint - every test had a "Running…" line with no PASS!/FAILED! after it, and the summary read just "56 snapshots tested" with neither a passed nor failed total. Wrapped the capture/compare block in try/catch; any throw now prints a red FAILED! with the error message and increments snapshotsFailed so the process exit code reflects reality. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Update — pushed 7307dfa6 which is what actually makes this PR's premise hold. While verifying the threshold fix would catch a regression, the prior CI run on this branch passed despite the threshold change. Two follow-on bugs explain that:
Both fixed now: the remap is gone (the canonical |
Summary
Two combined defects in the visual-regression comparator silently passed clear visual regressions (e.g. the quads-rendered test reported
PASS!even though the on-screen text was missing the quad count — surfaced while reviewing PR #24).1.
pixelmatchthreshold was 0.8snapshot.ts:196 (old):
thresholdis normalized YIQ color distance per pixel. Library default is 0.1; the recommended useful range is 0.1–0.2. At 0.8 a pixel only counts as different when it's nearly inverted, so e.g. missing white text on a blue header (YIQ delta ≈ 0.5–0.6) reports 0 differing pixels.Lowered to 0.1 (the library default).
2.
SnapshotOptions.cliphad two drifting shapes{x, y, w, h}.{x, y, width, height}— which is also what Playwright'spage.screenshot({ clip })actually requires.w/hkeys, so:width: undefined, height: undefined— it probably ignored the clip entirely and captured the full viewport.options.clip?.width || 1080and fed 1080/1920 intopixelmatchregardless of the real image dimensions.Unified everything on
{x, y, width, height}and removed the silent fallbacks.compareBuffersnow deriveswidth/heightfrom the PNG buffers and the dimension-mismatch error message includes both actual and expected sizes.Impact
examples/tests/**passesclipexplicitly, so the public-API change is internal to the runner glue.pnpm test:visual --ci --capture --overwriteis expected.Test plan
pnpm test --run(154/154)pnpm build(renderer + visual-regression + examples) cleanpnpm test:visual --ciagainstmain's certified snapshots, then--capture --overwriteto re-cert in the Docker environment.🤖 Generated with Claude Code