feat(ce-demo-reel): add demo reel skill with Python capture pipeline #541

Merged
tmchow merged 28 commits into main from tmchow/generalize-evidence-capture
Apr 10, 2026

@tmchow tmchow commented Apr 9, 2026

Summary

Visual evidence for PRs now works for any project type. CLI tools, libraries, desktop apps all get demos. Previously, the capture flow assumed a Rails-style dev server and agent-browser, so CLI and library PRs got nothing.

ce-demo-reel replaces the hardcoded capture block with a tiered architecture. A Python script handles the deterministic parts (project detection, tool checks, stitching, upload). The agent just decides what to capture and asks the user to approve.

Demo

3 capture tiers, 3 GIFs. Each was produced by a different tier of ce-demo-reel to show how the skill picks its approach based on project type and available tools.

Terminal recording (VHS tier): the pipeline detecting project types and recommending capture tiers in a live terminal session.

Terminal recording demo

Screenshot reel (silicon + ffmpeg tier): detect a CLI tool from its package.json, then get the recommended capture tier.

Screenshot reel demo

Browser reel (agent-browser + ffmpeg tier): headless browser screenshots across 3 pages on bun.sh, stitched into an animated GIF.

Browser reel demo

What changed

New ce-demo-reel skill with tiered capture:

| Tier | Best for | Tools | Output |
|------|----------|-------|--------|
| Browser reel | Web apps, Electron via CDP | agent-browser + ffmpeg | Animated GIF |
| Terminal recording | CLI tools with motion | vhs (charmbracelet) | Animated GIF |
| Screenshot reel | CLI discrete steps | silicon + ffmpeg | Animated GIF |
| Static screenshots | Fallback | agent-browser or silicon | PNGs |
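
The table reads as a lookup from project type and installed tools to a tier. A minimal sketch of that lookup follows. The names `TIERS` and `recommend`, the tier identifiers, the preference order, and the mapping of `library` to the screenshot reel are all assumptions for illustration; the real `recommend` subcommand may differ.

```python
# Hypothetical tier table: (tier name, project types it suits, tools it needs).
# List order encodes preference; tool needs mirror the "Tools" column.
TIERS = [
    ("browser-reel", {"web-app", "desktop-app"}, {"agent-browser", "ffmpeg"}),
    ("terminal-recording", {"cli-tool"}, {"vhs"}),
    ("screenshot-reel", {"cli-tool", "library"}, {"silicon", "ffmpeg"}),
]


def recommend(project_type: str, available: set[str]) -> str:
    """Return the first tier whose project type and tool needs are both met."""
    for name, types, needs in TIERS:
        if project_type in types and needs <= available:
            return name
    # Fallback row: static PNGs need only one of the two capture tools.
    if available & {"agent-browser", "silicon"}:
        return "static-screenshots"
    return "skip"
```

Keeping the table as data means a new tier is a one-line change, and the no-tools case falls out of the same function instead of needing a separate branch in the skill text.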

Python pipeline script (scripts/capture-demo.py) with 7 subcommands:

  • preflight: tool availability (JSON output, replaces 4 separate command -v calls)
  • detect: project type from manifests (replaces agent interpreting a reference file)
  • recommend: tier recommendation lookup (replaces agent interpreting a markdown table)
  • stitch: frame normalization + 2-pass GIF stitching
  • screenshot-reel: silicon rendering + stitch in 1 call
  • terminal-recording: VHS execution + output validation
  • upload: catbox.moe with retry (30s timeout, 10s connect timeout)
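
As an illustration of the `preflight` idea, the sketch below checks tool availability with the standard library and returns a JSON-friendly dict. The function name, the `TOOLS` tuple, and the output shape are assumptions; the actual JSON keys emitted by `capture-demo.py` are not shown in this PR.

```python
import json
import shutil

# Tools named in the tier table, plus ffprobe, which a later commit in this
# PR requires for stitched tiers. The key names are assumptions.
TOOLS = ("agent-browser", "vhs", "silicon", "ffmpeg", "ffprobe")


def preflight(which=shutil.which) -> dict[str, bool]:
    """Map each tool name to whether it resolves to an executable on PATH."""
    return {tool: which(tool) is not None for tool in TOOLS}


def preflight_json() -> str:
    """Serialize the availability map for the agent to parse."""
    return json.dumps(preflight(), indent=2)
```

`shutil.which` performs the PATH lookup in-process, so no shell builtin has to be spawned, and the `which` parameter lets tests substitute a fake lookup.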

git-commit-push-pr improved:

  • Restructured 3 prose-heavy sections into bullet logic for better agent instruction-following. Shipped together because ce-demo-reel integrates into the PR description flow.
  • New "Frame the narrative" step forces before/after/scope articulation before drafting. Also strengthens "lead with value" with mechanism-vs-outcome guidance.
  • New writing voice defaults to catch AI slop: active voice, no em dashes, plain English, varied sentence length, digits over words.

Scratch space guidance added to AGENTS.md. Rule of thumb: .context/ for workflow state other skills read, mktemp -d for throwaway artifacts like screenshots and GIFs.

feature-video removed. Its GitHub-native MP4 upload via DOM manipulation was over-engineered; GIFs hosted on catbox.moe do the same job more simply.

Key design decisions

  • Python over bash. The original bash script hit 4 bugs across review rounds (set -e footguns, negative array indexing, frame reduction logic, command builtin). Python's subprocess handling is safer and the script is testable without BATS.
  • Script handles execution, skill handles judgment. Manifest parsing, tool checks, ffmpeg commands, and upload retry are in the script. The agent decides what to capture and which pages to visit.
  • Artifacts in OS temp, not the repo. Evidence files get uploaded and discarded. mktemp -d keeps them out of the repo tree entirely.
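
For the temp-directory decision, a Python equivalent of `mktemp -d` might look like the sketch below. The helper name, the prefix, and the file name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def make_scratch_dir(prefix: str = "demo-reel-") -> Path:
    """Create a throwaway directory under the OS temp root, like `mktemp -d`."""
    return Path(tempfile.mkdtemp(prefix=prefix))


scratch = make_scratch_dir()
frame = scratch / "frame-001.png"
frame.write_bytes(b"")   # placeholder standing in for a captured screenshot
# ... stitching and upload would happen here ...
shutil.rmtree(scratch)   # discard the artifacts once the upload URL is known
```

Because the directory lives outside the repo tree, nothing needs a `.gitignore` entry and a crashed run cannot leave stray files in a commit.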

Test plan

  • bun run release:validate passes (42 skills, 51 agents)
  • bun test passes (657 tests)
  • tests/ce-demo-reel.test.ts: 21 tests covering:
    • Preflight JSON output
    • Project detection (8 scenarios: web-app, cli-tool, desktop-app, library, text-only, Electron priority, Rails, Go CLI)
    • Tier recommendation (6 scenarios: all tool combos, no-tools fallback, available-list filtering)
    • Stitch arg validation + ffmpeg integration (GIF magic bytes, multi-frame)
    • Upload error paths
  • No stale references to evidence-capture or feature-video in skill files
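
The "GIF magic bytes" check in the stitch tests can be pictured as follows. This is a sketch in Python, not the TypeScript test itself, and the helper name is hypothetical. Every GIF file starts with the 6-byte signature `GIF87a` or `GIF89a`.

```python
from pathlib import Path

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def looks_like_gif(path: Path) -> bool:
    """True if the file begins with a GIF87a or GIF89a signature."""
    with path.open("rb") as fh:
        return fh.read(6) in GIF_SIGNATURES
```

Checking the signature catches the case where ffmpeg exits cleanly but writes an empty or wrong-format file.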

Compound Engineering
Claude Code

tmchow and others added 2 commits April 9, 2026 15:27
…move feature-video

Replace the hardcoded bin/dev + agent-browser + phantom imgup block in
shipping-workflow.md with a project-type-aware evidence-capture skill that
works across web apps, CLI tools, libraries, and desktop apps.

The skill auto-detects project type, checks available tools (agent-browser,
vhs, silicon, ffmpeg), recommends a capture tier (browser reel, terminal
recording, screenshot reel, static screenshots, or skip), and uploads
evidence to catbox.moe. A bash pipeline script handles ffmpeg stitching,
frame normalization, palette generation, size optimization, and upload.

Removes feature-video -- its GitHub-native MP4 upload via agent-browser DOM
manipulation was over-engineered. GIFs uploaded to catbox render inline
everywhere without platform-specific upload hacks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ullet logic

Convert three dense prose paragraphs into structured bullet/sub-bullet
lists for better agent instruction-following and human reviewability:

- DU-3: 6-action paragraph -> numbered step list with sub-bullets
- Step 1 clean tree: interleaved prose/bullets -> labeled decision tree
- Step 7 existing PR: dense paragraph -> 5-step numbered list

Also adds evidence-capture integration to Step 6 with a user-facing
question gate for when the PR has observable behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d75fa0c4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Upload fallback now stages, commits, and pushes evidence to the branch
  so GitHub can render the relative path (previously left as local-only)
- Add capture-evidence.sh test suite covering arg validation and ffmpeg
  stitch integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d2825b13a

tmchow and others added 2 commits April 9, 2026 16:34
- Replace negative array indexing (bash 4+) with portable index
  computation for macOS /bin/bash 3.2 compatibility
- Fix ffmpeg availability check in tests: use `which` instead of
  `command` (shell builtin, not spawnable as executable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move frame existence validation before ffmpeg/ffprobe tool checks so
the test gets the correct "Frame not found" error on CI runners that
don't have ffmpeg installed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7495bef165

Adds a "Frame the narrative before sizing" pre-writing step that forces
before/after/scope articulation before drafting. Also strengthens the
"lead with value" principle with a mechanism-vs-outcome anti-example,
updates the medium sizing tier to reference the narrative frame, and
adds guidance to name new test files in test plans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e9bdeae29

tmchow and others added 4 commits April 9, 2026 19:11
…ror handling

- Frame reduction step minimum raised from 1 to 2 so 3-4 frame GIFs
  actually drop middle frames instead of re-adding all of them
- Curl upload failures now use || true to prevent set -euo pipefail
  from exiting before the retry/fallback logic can run

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
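
The frame-reduction fix can be illustrated with a small sketch. The actual reduction rule is not shown in this PR, so the function below is one plausible form under stated assumptions: keep the first frame, every `step`-th frame after it, and always the last frame, with the step clamped to at least 2.

```python
def reduce_frames(frames: list[str], step: int) -> list[str]:
    """Keep the first frame, every `step`-th frame after it, and the last."""
    step = max(step, 2)  # a step of 1 would select every frame again
    if len(frames) <= 2:
        return list(frames)  # nothing in the middle to drop
    kept = frames[::step]
    if (len(frames) - 1) % step != 0:
        kept.append(frames[-1])  # always preserve the final state
    return kept
```

With a step of 1, the slice `frames[::1]` returns every frame, which is the "re-adding all of them" behavior the commit describes. Clamping to 2 guarantees that a 3- or 4-frame input loses at least one middle frame.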
Move all deterministic logic into a Python script with 7 subcommands:
preflight, detect, recommend, stitch, screenshot-reel,
terminal-recording, upload. The agent's role shrinks to judgment calls
(what to capture, which tier, user approval) while the script handles
how.

- Remove capture-evidence.sh and project-detection.md (logic moved to
  Python detect subcommand)
- Update SKILL.md Steps 2/4/6 to call script instead of inline logic
- Update tier references to use script for stitching and recording
- Fix Codex review findings: require ffprobe for stitched tiers,
  rewrite VHS tape Output when --output overrides it
- Remove docs/evidence commit fallback — artifacts are ephemeral, use
  OS temp not repo tree
- Tests expanded from 8 to 21 covering preflight, detect (8 project
  type scenarios), recommend (6 tier mappings), stitch, and upload

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add rule of thumb: .context/ for workflow state that other skills read,
mktemp -d for throwaway artifacts (screenshots, GIFs, recordings) that
get uploaded and discarded. Keeps ephemeral files out of the repo tree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename skill directory, frontmatter, script (capture-demo.py), and all
cross-skill references. Adds "capture evidence" and "add proof to a PR"
as trigger phrases in the description so the old vocabulary still works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tmchow tmchow changed the title feat(evidence-capture): generalize evidence capture across project types feat(demo-reel): add demo-reel skill with Python capture pipeline Apr 10, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f129c2a63c

…mo-reel

- Fix Angular detection: use @angular/core instead of angular (the npm
  package key is scoped)
- Catch subprocess.TimeoutExpired on curl upload so timeouts fall
  through to retry logic instead of raising a traceback
- Rename demo-reel -> ce-demo-reel to align with ce: namespace
  convention (ce-plan, ce-review, ce-work, ce-demo-reel)
- Rename test file to ce-demo-reel.test.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tmchow tmchow changed the title feat(demo-reel): add demo-reel skill with Python capture pipeline feat(ce-demo-reel): add demo-reel skill with Python capture pipeline Apr 10, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 100e5ab31d

120s per attempt meant a worst-case 245s blocking the agent when catbox
is down. With 30s timeout + 10s connect timeout + 2s retry sleep, worst
case is ~64s. Healthy uploads complete in 2-5s regardless.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
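
A sketch of the upload path with these timeouts is below. The attempt count of 2 is an inference (two 30s attempts plus one 2s sleep is about 62s, close to the ~64s quoted), and the endpoint and form fields follow catbox.moe's public upload API as an assumption, since the PR does not quote the exact curl invocation. The function name and the injectable `run` and `sleep` parameters are hypothetical.

```python
import subprocess
import time

# Assumed endpoint; the PR names catbox.moe but not the URL it posts to.
CATBOX_API = "https://catbox.moe/user/api.php"


def upload(path, attempts=2, run=subprocess.run, sleep=time.sleep):
    """Upload `path` with curl; return the URL, or None once all attempts fail."""
    cmd = [
        "curl", "--silent", "--fail",
        "--max-time", "30",         # total time limit per attempt
        "--connect-timeout", "10",  # limit on connection setup
        "-F", "reqtype=fileupload",
        "-F", f"fileToUpload=@{path}",
        CATBOX_API,
    ]
    for attempt in range(attempts):
        try:
            # Python-side timeout is a backstop slightly above curl's own limit.
            result = run(cmd, capture_output=True, text=True, timeout=35)
        except subprocess.TimeoutExpired:
            result = None  # fall through to the retry instead of raising
        if result is not None and result.returncode == 0:
            url = result.stdout.strip()
            if url.startswith("https://"):
                return url
        if attempt < attempts - 1:
            sleep(2)  # brief pause before the next attempt
    return None
```

Returning `None` rather than raising lets the caller fall back to a non-upload path without wrapping every call in its own try block.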
@tmchow tmchow changed the title feat(ce-demo-reel): add demo-reel skill with Python capture pipeline feat(ce-demo-reel): add demo reel skill with Python capture pipeline Apr 10, 2026
tmchow and others added 3 commits April 9, 2026 20:27
- Use #0d1117 (GitHub dark) as background instead of #aaaaff (lavender)
- Add --no-round-corner to avoid corner artifacts against dark bg
- Add --no-line-number for cleaner terminal output frames
- Match ffmpeg padding color to #0d1117 so stitched frames are seamless

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
20px was too tight — window dots sat at the top edge with no breathing
room. 40px gives a comfortable margin above and below.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mp tapes

- run_cmd catches TimeoutExpired and returns a controlled failure
  instead of crashing with a traceback
- Frame normalization aborts on ffmpeg failure instead of silently
  continuing with potentially stale output
- VHS tape rewrite uses tempfile.mkstemp instead of deterministic .tmp
  suffix to avoid overwriting user files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
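
The first bullet of this commit can be sketched as follows. The PR names `run_cmd` but does not show it, so the signature and the return shape here are assumptions.

```python
import subprocess
import sys


def run_cmd(cmd, timeout):
    """Run `cmd`; return (ok, stdout, stderr) without raising on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so no cleanup here.
        return False, "", f"timed out after {timeout}s"
    except FileNotFoundError:
        return False, "", f"not found: {cmd[0]}"
    return proc.returncode == 0, proc.stdout, proc.stderr


# Example: a child that sleeps past the limit yields a controlled failure.
ok, _, err = run_cmd([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
```

Callers then branch on `ok` and can surface `err` to the agent as plain text instead of a traceback.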

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b425cdf468

tmchow and others added 4 commits April 9, 2026 20:43
Active voice, no em dashes, plain English, varied sentence length,
digits over words, no filler phrases. Technical jargon stays when it's
the clearest term. User style preferences override these defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The -- pattern is just an em dash in disguise. Updated the writing voice
rule to explicitly ban it and suggest periods, commas, colons, or
parentheses instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Palette generation and final GIF encoding ignored return codes, so a
failed ffmpeg could report success if a stale output file existed from
a previous run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
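
The stale-output failure mode in this commit can be guarded against with a pattern like the one below. It is a sketch with a hypothetical helper name, not the script's actual code: delete any leftover output first, then require both a zero exit code and a non-empty file.

```python
import subprocess
from pathlib import Path


def run_producing(cmd, output: Path) -> bool:
    """Run `cmd`, which should write `output`; succeed only if it really did."""
    output.unlink(missing_ok=True)  # a leftover file must not mask a failure
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0 and output.is_file() and output.stat().st_size > 0
```

In a 2-pass GIF encode, each pass (palette generation, then the final encode) would go through a check like this, so a failed first pass stops the pipeline before the second pass can pick up an old palette.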
- Specify argument format when invoking ce-demo-reel (pass a target
  description inferred from the diff)
- Document how to detect failure from ce-demo-reel output (check Tier,
  URL, and Embed fields)
- DU-3 now checks for existing evidence in the PR body and preserves
  it unless the user asks to refresh or remove
- Fix bare "demo-reel" reference in example to "ce-demo-reel"
- Simplify ce-demo-reel output: remove redundant Label field, clarify
  that Embed is the deliverable with heading included

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc724649e4

tmchow and others added 5 commits April 9, 2026 20:55
- detect command now uses git rev-parse --show-toplevel so it finds
  manifests from the repo root regardless of working directory
- Remix detection uses @remix-run/react (the actual npm package key)
  instead of bare "remix"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
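
The scoped-package fixes (this commit for Remix, an earlier one for Angular) come down to matching exact dependency keys in `package.json`. A minimal sketch follows. The table and function names are hypothetical, and only the two scoped keys come from the commits; the rest of the real detection table is not shown in this PR.

```python
import json

# Dependency keys that suggest a web app, mapped to a label.
WEB_FRAMEWORK_KEYS = {
    "@angular/core": "angular",
    "@remix-run/react": "remix",
}


def detect_web_framework(package_json_text: str):
    """Return a framework label if package.json declares a known key, else None."""
    manifest = json.loads(package_json_text)
    deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
    for key, label in WEB_FRAMEWORK_KEYS.items():
        if key in deps:  # exact key match, so a bare "angular" or "remix" is ignored
            return label
    return None
```

Matching on dict keys rather than searching the raw file text also avoids hits inside script names or descriptions.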
Detection now scopes to the relevant subdirectory based on the diff
target from Step 0. In monorepos, the agent passes the changed
package's root instead of the repo root. If the agent's understanding
of the change contradicts the script's classification, the agent wins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These documented the GitHub native video upload and agent-browser auth
patterns used by feature-video, which was replaced by ce-demo-reel's
simpler catbox GIF approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Knowledge track learning from the ce-demo-reel build. Bash hit 4 bug
classes across review rounds (set -e footguns, bash 3.2 compat, frame
reduction math, builtin spawning). Python subprocess model eliminates
all of them. Includes when bash is still the right choice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54a738ad05

tmchow and others added 3 commits April 9, 2026 21:13
ce-demo-reel now returns Tier, Description (1-liner of what the evidence
shows), and URL. The caller formats the markdown. Removes the Embed
field and markdown generation from upload-and-approval.md. Cleaner
boundary: ce-demo-reel captures, the caller presents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
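
On the caller side, the new boundary could look like the sketch below. The function name, the "Demo" heading, and the treatment of a `skip` tier are assumptions; the PR only states that the caller receives Tier, Description, and URL and formats the markdown itself.

```python
def format_evidence(tier: str, description: str, url: str) -> str:
    """Render the three returned fields as a markdown section for a PR body."""
    if tier == "skip" or not url:
        return ""  # nothing was captured, so the caller adds no section
    return f"## Demo\n\n{description}\n\n![{description}]({url})\n"
```

Keeping presentation in the caller means a different consumer can reuse the same three fields with its own layout.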
Short tokens like "gin", "echo", "chi" false-positive on unrelated
module names (e.g., "engine" contains "gin"). Now matches full paths
like github.com/gin-gonic/gin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
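
The false positive and its fix can be sketched as below. The gin path is quoted in the commit; the echo and chi module paths are filled in as assumptions, and the function name and regex are illustrative rather than the script's actual matching code.

```python
import re

# Full module paths for a few Go web frameworks.
GO_WEB_MODULES = (
    "github.com/gin-gonic/gin",
    "github.com/labstack/echo",
    "github.com/go-chi/chi",
)


def go_mod_has_web_framework(go_mod_text: str) -> bool:
    """True if go.mod requires one of the known modules (or a /vN variant)."""
    for module in GO_WEB_MODULES:
        # Anchor at line start and demand a version after the path, so
        # "engine" or a module merely containing "gin" cannot match.
        pattern = r"(?m)^\s*(?:require\s+)?" + re.escape(module) + r"(?:/v\d+)?\s+v"
        if re.search(pattern, go_mod_text):
            return True
    return False
```

A substring test such as `"gin" in go_mod_text` would fire on a module named `example.com/engine`; matching the full path followed by a version does not.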
Node: express, fastify, koa, hono
Go: net/http
Python: sanic, litestar
Rust: poem, tide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0964cf4d0d

net/http is stdlib and never appears in go.mod. The agent detects
stdlib web servers from source imports in the diff and overrides the
classification per the "signal, not a gate" design.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tmchow tmchow merged commit b979143 into main Apr 10, 2026
2 checks passed
@github-actions github-actions bot mentioned this pull request Apr 10, 2026