feat: detect and connect data warehouse sources #488
Add a config-driven detector registry that scans a project's dependencies and .env key names (never values) for data warehouse sources, and a new `warehouse` program that connects them: in-CLI creation for databases/API-key sources, deep-link to the app for OAuth sources. The main flow also surfaces a soft prompt when a source is detected during a normal run. The agent playbook lives in the context-mill `data-warehouse-source-setup` skill; the wizard handles detection and orchestration only.
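
To make the idea of a config-driven detector registry concrete, here is a minimal sketch. Every name in it (`Detector`, `REGISTRY`, `detectSources`, the `kind` labels, the package names and env-key patterns, and the modes assigned to each entry) is a hypothetical illustration, not the PR's actual code or PostHog's real `kind` strings. It assumes signals have already been collected into sets of dependency names and `.env` key names.

```typescript
// Illustrative sketch only: all identifiers and registry entries are assumptions.
type CreationMode = "in-cli" | "deep-link";

interface ProjectSignals {
  npm: Set<string>;
  python: Set<string>;
  ruby: Set<string>;
  envKeys: Set<string>; // key names only, never values
}

interface Detector {
  kind: string; // hypothetical source label
  mode: CreationMode;
  npm?: string[];
  python?: string[];
  ruby?: string[];
  envKeyPatterns?: RegExp[];
}

interface DetectedSource {
  kind: string;
  mode: CreationMode;
  matchedBy: string[];
}

// Adding a source is one entry in this array.
const REGISTRY: Detector[] = [
  { kind: "Postgres", mode: "in-cli", npm: ["pg"], python: ["psycopg2"], ruby: ["pg"], envKeyPatterns: [/^POSTGRES_/] },
  { kind: "Stripe", mode: "in-cli", npm: ["stripe"], python: ["stripe"], ruby: ["stripe"], envKeyPatterns: [/^STRIPE_/] },
  { kind: "Linear", mode: "deep-link", npm: ["@linear/sdk"] },
];

// Return "prefix:dep" for each wanted dependency that is present.
function matchDeps(prefix: string, wanted: string[] | undefined, present: Set<string>): string[] {
  return (wanted ?? []).filter((dep) => present.has(dep)).map((dep) => prefix + ":" + dep);
}

function detectSources(signals: ProjectSignals, registry: Detector[] = REGISTRY): DetectedSource[] {
  const detected: DetectedSource[] = [];
  for (const d of registry) {
    const matchedBy = matchDeps("npm", d.npm, signals.npm)
      .concat(matchDeps("python", d.python, signals.python))
      .concat(matchDeps("ruby", d.ruby, signals.ruby));
    signals.envKeys.forEach((key) => {
      if ((d.envKeyPatterns ?? []).some((re) => re.test(key))) matchedBy.push("env:" + key);
    });
    // One result per detector, however many signals matched it.
    if (matchedBy.length > 0) detected.push({ kind: d.kind, mode: d.mode, matchedBy });
  }
  return detected;
}
```

Keeping the registry as plain data means the matching loop never needs to change when a source is added, which is the property the description is after.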
0f0ae95 to 6255622
Pull request overview
Adds a new “data warehouse sources” capability to the wizard: it can detect common warehouse/DB/SaaS sources used by a project and then either guide setup via a new warehouse program or offer a post-run deep-link prompt in the main flow.
Changes:
- Introduces a config-driven detector registry + detection engine (`src/lib/warehouse-sources/`) that scans dependency manifests and `.env` key names to infer source kinds/modes.
- Adds a new `npx @posthog/wizard warehouse` program with a detection intro screen and skill-driven setup flow.
- Adds a post-run “Connect your data warehouse?” soft prompt in the main PostHog integration program (gated on detected sources + dismiss state).
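
The gating described in the last item (detected sources plus dismiss state) can be pictured as a small predicate. This is a sketch under assumptions: `warehouseOfferDismissed` is the session field named in the file summary, while `OfferState`, `detectedSources`, and `shouldShowWarehouseOffer` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape; only warehouseOfferDismissed is a name taken from the PR summary.
interface OfferState {
  detectedSources: { kind: string }[];
  warehouseOfferDismissed: boolean;
}

// Show the soft prompt only when something was detected and the user has not dismissed it.
function shouldShowWarehouseOffer(state: OfferState): boolean {
  return state.detectedSources.length > 0 && !state.warehouseOfferDismissed;
}
```

A pure predicate like this is easy to cover with the kind of shown/hidden tests the summary mentions for `programs.test.ts`.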
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/utils/open-url.ts | Adds a helper to open URLs in the default browser without blocking the wizard process. |
| src/ui/tui/store.ts | Adds a session setter to persist dismissal of the warehouse offer screen. |
| src/ui/tui/screens/WarehouseOfferScreen.tsx | New post-run TUI screen offering to open the PostHog “new source” setup URL. |
| src/ui/tui/screens/WarehouseIntroScreen.tsx | New intro/detection-result screen for the dedicated warehouse program. |
| src/ui/tui/screen-sequences.ts | Registers new screen IDs for warehouse intro/offer screens. |
| src/ui/tui/screen-registry.tsx | Registers the new WarehouseIntro/WarehouseOffer screens with the TUI screen factory. |
| src/ui/tui/tests/programs.test.ts | Adds predicate tests ensuring the warehouse offer is shown/hidden correctly. |
| src/lib/wizard-session.ts | Adds warehouseOfferDismissed to persisted session state defaults/types. |
| src/lib/warehouse-sources/types.ts | Defines typed detector config surface and detected-source payload shape. |
| src/lib/warehouse-sources/registry.ts | Adds initial detector entries mapping signals to PostHog source kinds + creation modes. |
| src/lib/warehouse-sources/detect.ts | Implements repo scanning and matching against the detector registry. |
| src/lib/warehouse-sources/tests/detect.test.ts | Adds unit tests for detection across npm/python/env/gemfile signals and deduping/ignore rules. |
| src/lib/programs/warehouse-source/steps.ts | Defines step sequence for the new warehouse program (detect → intro → auth → run → outro → skills). |
| src/lib/programs/warehouse-source/index.ts | Registers the new program config and builds a prompt including detected sources for the skill. |
| src/lib/programs/warehouse-source/detect.ts | Program-level adapter writing detect results/errors into frameworkContext + abort cases. |
| src/lib/programs/warehouse-source/content/index.tsx | Re-exports the generic agent-skill learn deck for the warehouse program. |
| src/lib/programs/program-registry.ts | Adds the new warehouse-source program to the registry/enum. |
| src/lib/programs/posthog-integration/steps.ts | Adds the post-outro warehouse offer step gated on detection results and dismiss state. |
| src/lib/programs/posthog-integration/detect.ts | Runs warehouse source detection during the main integration detect step. |
| README.md | Documents the new npx @posthog/wizard warehouse command and behavior. |
    try {
      spawn(cmd, args, { detached: true, stdio: 'ignore' }).unref();
    } catch {
      // Ignore: the URL is always shown as text as a fallback.
    }
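
The snippet above spawns `cmd` with `args` but does not show how they are chosen. One plausible way to pick them per platform is sketched below; `browserCommand` and `LaunchCommand` are hypothetical names, and the PR's helper may do this differently. The launchers used (`open` on macOS, `cmd /c start` on Windows, `xdg-open` elsewhere) are the conventional ones, with the caveat that URLs containing `&` may need extra escaping on Windows.

```typescript
// Hypothetical helper: maps a Node-style platform string to a launcher command.
interface LaunchCommand {
  cmd: string;
  args: string[];
}

function browserCommand(platform: string, url: string): LaunchCommand {
  switch (platform) {
    case "darwin":
      return { cmd: "open", args: [url] };
    case "win32":
      // The empty string is the window title argument that `start` expects first.
      return { cmd: "cmd", args: ["/c", "start", "", url] };
    default:
      // Assumes a desktop Linux/BSD environment with xdg-utils installed.
      return { cmd: "xdg-open", args: [url] };
  }
}
```

In a Node process the platform argument would typically be `process.platform`; passing it in as a parameter keeps the function testable without opening a browser.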
    function ingestFile(
      name: string,
      fullPath: string,
      signals: ProjectSignals,
    ): void {
      const content = safeRead(fullPath);
      if (content === null) return;

      if (name === 'package.json') {
        addNpmDeps(content, signals);
      } else if (name === 'requirements.txt') {
        parseRequirementsTxt(content).forEach((d) => signals.python.add(d));
      } else if (name === 'pyproject.toml') {
        parsePyprojectToml(content).forEach((d) => signals.python.add(d));
      } else if (name === 'Pipfile') {
        parsePipfile(content).forEach((d) => signals.python.add(d));
      } else if (name === 'Gemfile') {
        parseGemfile(content).forEach((g) => signals.ruby.add(g));
      } else if (name === '.env' || name.startsWith('.env')) {
        parseEnvKeys(content).forEach((k) => signals.envKeys.add(k));
      }
    }
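
`ingestFile` relies on `parseEnvKeys`, whose body is not shown here. A minimal sketch of what a keys-only parser could look like follows; it is an assumption about the approach, not the PR's implementation. It returns the names on the left of `=` and never stores or returns the values.

```typescript
// Sketch of a keys-only .env parser (assumed behaviour, not the PR's code).
function parseEnvKeys(content: string): string[] {
  const keys: string[] = [];
  for (const raw of content.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blank lines and comments.
    if (line === "" || line.charAt(0) === "#") continue;
    // Accept an optional `export ` prefix, then capture only the key name.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match) keys.push(match[1]);
  }
  return keys;
}
```

Because the regular expression stops at `=`, the value portion of each line is never captured, which matches the "key names, never values" guarantee in the PR description.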
    const errorView = detectError ? (
      <>
        <Box flexDirection="column" marginBottom={1}>
          <Text color="red" bold>
            {'✘'} No data warehouse source detected
          </Text>
          <Box marginTop={1} flexDirection="column">
            <DetectErrorBody error={detectError} />
          </Box>
Add detectors for Convex, Clerk, Resend, Shopify, Klaviyo, Chargebee, Paddle, Polar, Mailchimp, Customer.io, Typeform (in-CLI) and Intercom, Linear (deep-link), covering most released sources with a reliable codebase footprint. Sentry uses token auth, so move it to in-CLI.
Hey, thank you for adding this! This will help us match up with the onboarding flow even more! What I'll ask is that you ship this as its own command; maybe instead of data-warehouse, call it … I'm also gonna ask that we ship this as its own clean skills for now, because the actual task queue is a bit buggy right now, and we're looking to fix that soon. Also, excuse our mess in this repo, and props for pushing through some of our slop <3
@gewenyu99 When you say its own command, you mean something other than …
Summary
Adds the ability for the wizard to detect data warehouse sources a project already uses (Postgres, MySQL, MongoDB, Snowflake, BigQuery, Stripe, …) and help connect them to PostHog's data warehouse.
- `src/lib/warehouse-sources/`: scans dependencies (npm/python/ruby) and `.env` key names (never values) and maps signals to a PostHog source `kind` + creation mode. Adding a source is one registry entry. `kind` strings verified against the MCP `external-data-sources-wizard` tool.
- `warehouse` program (`npx @posthog/wizard warehouse`): mirrors `revenue-analytics`. In-CLI creation for databases/API-key sources; deep-link to the app's new-source flow for OAuth sources. The hybrid behaviour lives in the context-mill skill (see below), keeping source knowledge out of the runner.

Respects the codebase discipline: detection is a typed config surface; the creation procedure is a context-mill skill; field schemas come from the MCP at runtime.
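
The split between in-CLI creation and deep-linking can be pictured as a mapping from each detected source's mode to an action. The sketch below only illustrates that decision; in the PR the hybrid behaviour lives in the context-mill skill, not in wizard code. `SetupAction`, `planSetup`, and the `newSourceUrl` callback are hypothetical, and the URL builder is passed in because the real deep-link format is not given here.

```typescript
// Illustrative only: names and shapes are assumptions, not the skill's logic.
type CreationMode = "in-cli" | "deep-link";

interface DetectedSource {
  kind: string;
  mode: CreationMode;
}

type SetupAction =
  | { type: "create-in-cli"; kind: string }
  | { type: "open-deep-link"; kind: string; url: string };

// Databases and API-key sources are created in the CLI; OAuth sources get a link to the app.
function planSetup(
  sources: DetectedSource[],
  newSourceUrl: (kind: string) => string,
): SetupAction[] {
  return sources.map((s): SetupAction =>
    s.mode === "in-cli"
      ? { type: "create-in-cli", kind: s.kind }
      : { type: "open-deep-link", kind: s.kind, url: newSourceUrl(s.kind) },
  );
}
```

Modelling the result as a discriminated union forces callers to handle both branches explicitly, so an OAuth source cannot be sent down the in-CLI path by accident.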
Companion PR
The agent playbook ships as the `data-warehouse-source-setup` skill in context-mill; this program references it by `skillId`. That skill must be released before the end-to-end flow works.
Test plan
- `pnpm build` passes
- `pnpm lint`: 0 errors
- `pnpm test`: full suite green (added detector tests + `warehouse-offer` predicate tests)
- `npx @posthog/wizard warehouse` against a Postgres/Stripe project once the context-mill skill is released

🤖 Generated with Claude Code