
feat: detect and connect data warehouse sources#488

Open
Gilbert09 wants to merge 2 commits into main from feat/warehouse-source-detection


@Gilbert09
Member

Summary

Adds the ability for the wizard to detect data warehouse sources a project already uses (Postgres, MySQL, MongoDB, Snowflake, BigQuery, Stripe, …) and help connect them to PostHog's data warehouse.

  • Config-driven detector registry (src/lib/warehouse-sources/): scans dependencies (npm/python/ruby) and .env key names (never values) and maps the signals to a PostHog source kind and creation mode. Adding a source is one registry entry. The kind strings are verified against the MCP external-data-sources-wizard tool.
  • New warehouse program (npx @posthog/wizard warehouse) — mirrors revenue-analytics. In-CLI creation for databases/API-key sources; deep-link to the app's new-source flow for OAuth sources. The hybrid behaviour lives in the context-mill skill (see below), keeping source knowledge out of the runner.
  • Main-flow soft prompt — the default wizard now detects sources during its normal run and, after the outro, offers to connect them (opens the pre-filled new-source page in the browser, or skip). Hidden entirely when nothing is detected.
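The registry bullet above can be sketched in TypeScript. This is a minimal sketch, not the PR's code: the type names, the `REGISTRY` entries (kind strings and package names) and `detectSources` are hypothetical, and the real kind strings come from the MCP external-data-sources-wizard tool.

```typescript
// Hypothetical shapes for illustration; the real definitions live in
// src/lib/warehouse-sources/types.ts and may differ.
type CreationMode = 'in-cli' | 'deep-link';

interface SourceDetector {
  kind: string; // PostHog source kind (placeholder values below)
  mode: CreationMode; // create in the CLI, or deep-link to the app (OAuth)
  npm?: string[];
  python?: string[];
  ruby?: string[];
  envKeys?: string[]; // .env key names only; values are never read
}

interface ProjectSignals {
  npm: Set<string>;
  python: Set<string>;
  ruby: Set<string>;
  envKeys: Set<string>;
}

interface DetectedSource {
  kind: string;
  mode: CreationMode;
  matchedOn: string[]; // evidence, e.g. "npm:pg" or "env:STRIPE_SECRET_KEY"
}

// Illustrative entries only; kinds, modes and package names are assumptions.
const REGISTRY: SourceDetector[] = [
  { kind: 'Postgres', mode: 'in-cli', npm: ['pg'], python: ['psycopg2'] },
  { kind: 'Stripe', mode: 'in-cli', npm: ['stripe'], envKeys: ['STRIPE_SECRET_KEY'] },
  { kind: 'Hubspot', mode: 'deep-link', npm: ['@hubspot/api-client'] },
];

// Return the names in `wanted` that appear in `present`, tagged with their ecosystem.
function hits(ecosystem: string, wanted: string[] | undefined, present: Set<string>): string[] {
  return (wanted ?? []).filter((name) => present.has(name)).map((name) => `${ecosystem}:${name}`);
}

function detectSources(signals: ProjectSignals, registry: SourceDetector[] = REGISTRY): DetectedSource[] {
  const found = new Map<string, DetectedSource>();
  for (const d of registry) {
    const matchedOn = [
      ...hits('npm', d.npm, signals.npm),
      ...hits('python', d.python, signals.python),
      ...hits('ruby', d.ruby, signals.ruby),
      ...hits('env', d.envKeys, signals.envKeys),
    ];
    if (matchedOn.length === 0) continue;
    // Dedupe by kind so each source is reported once, with all its evidence.
    const existing = found.get(d.kind);
    if (existing) {
      existing.matchedOn.push(...matchedOn);
    } else {
      found.set(d.kind, { kind: d.kind, mode: d.mode, matchedOn });
    }
  }
  return Array.from(found.values());
}

const demo = detectSources({
  npm: new Set(['pg', 'react']),
  python: new Set<string>(),
  ruby: new Set<string>(),
  envKeys: new Set(['STRIPE_SECRET_KEY']),
});
console.log(demo.map((s) => `${s.kind} (${s.mode})`).join(', ')); // Postgres (in-cli), Stripe (in-cli)
```

Keeping detectors as plain data is what lets "adding a source is one registry entry" hold: the matching loop never changes when a new source is added.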

This follows the codebase's conventions: detection is a typed config surface, the creation procedure is a context-mill skill, and field schemas come from the MCP at runtime.

Companion PR

The agent playbook ships as the data-warehouse-source-setup skill in context-mill — this program references it by skillId. That skill must be released before the end-to-end flow works.

Test plan

  • pnpm build passes
  • pnpm lint — 0 errors
  • pnpm test — full suite green (added detector tests + warehouse-offer predicate tests)
  • Manual: run npx @posthog/wizard warehouse against a Postgres/Stripe project once the context-mill skill is released
  • Manual: confirm the soft prompt appears after a normal run in a repo with a detectable source, and is hidden otherwise

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings May 28, 2026 13:53
@github-actions

🧙 Wizard CI

Run the Wizard CI and test your changes against wizard-workbench example apps by replying with a GitHub comment using one of the following commands:

Test all apps:

  • /wizard-ci all

Test all apps in a directory:

  • /wizard-ci basic-integration
  • /wizard-ci misc
  • /wizard-ci revenue

Test an individual app:

  • /wizard-ci basic-integration/android
  • /wizard-ci basic-integration/angular
  • /wizard-ci basic-integration/astro
  • /wizard-ci basic-integration/django
  • /wizard-ci basic-integration/fastapi
  • /wizard-ci basic-integration/flask
  • /wizard-ci basic-integration/javascript-node
  • /wizard-ci basic-integration/javascript-web
  • /wizard-ci basic-integration/laravel
  • /wizard-ci basic-integration/next-js
  • /wizard-ci basic-integration/nuxt
  • /wizard-ci basic-integration/python
  • /wizard-ci basic-integration/rails
  • /wizard-ci basic-integration/react-native
  • /wizard-ci basic-integration/react-router
  • /wizard-ci basic-integration/sveltekit
  • /wizard-ci basic-integration/swift
  • /wizard-ci basic-integration/tanstack-router
  • /wizard-ci basic-integration/tanstack-start
  • /wizard-ci basic-integration/vue
  • /wizard-ci misc/quack-quack
  • /wizard-ci revenue/stripe

Results will be posted here when complete.

Add a config-driven detector registry that scans a project's
dependencies and .env key names (never values) for data warehouse
sources, and a new `warehouse` program that connects them: in-CLI
creation for databases/API-key sources, deep-link to the app for
OAuth sources. The main flow also surfaces a soft prompt when a
source is detected during a normal run.
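The "key names, never values" rule for .env files could look like the following. `parseEnvKeys` is named in the PR's `ingestFile`, but this body is an assumed implementation: it keeps only the left-hand side of each `KEY=value` line and never stores the value.

```typescript
// Hypothetical key-only .env parser: returns variable names and deliberately
// discards the value side of every assignment.
function parseEnvKeys(content: string): string[] {
  const keys: string[] = [];
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // blank lines and comments
    // Accept an optional `export ` prefix, since some .env files use shell syntax.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    if (match) keys.push(match[1]); // group 1 is the key; the value is never captured
  }
  return keys;
}

const sample = [
  '# local secrets',
  'DATABASE_URL=postgres://user:pass@localhost:5432/app',
  'export STRIPE_SECRET_KEY=sk_test_placeholder',
  'NOT A VALID LINE',
].join('\n');

console.log(parseEnvKeys(sample).join(', ')); // DATABASE_URL, STRIPE_SECRET_KEY
```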

The agent playbook lives in the context-mill `data-warehouse-source-setup`
skill; the wizard handles detection and orchestration only.
Gilbert09 force-pushed the feat/warehouse-source-detection branch from 0f0ae95 to 6255622 on May 28, 2026 13:56
Contributor

Copilot AI left a comment


Pull request overview

Adds a new “data warehouse sources” capability to the wizard: it can detect common warehouse/DB/SaaS sources used by a project and then either guide setup via a new warehouse program or offer a post-run deep-link prompt in the main flow.

Changes:

  • Introduces a config-driven detector registry + detection engine (src/lib/warehouse-sources/) that scans dependency manifests and .env key names to infer source kinds/modes.
  • Adds a new npx @posthog/wizard warehouse program with a detection intro screen and skill-driven setup flow.
  • Adds a post-run “Connect your data warehouse?” soft prompt in the main PostHog integration program (gated on detected sources + dismiss state).
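The offer's gating condition is small enough to state directly. A minimal sketch: only `warehouseOfferDismissed` is named in the PR; `frameworkContext.warehouseSources` and the other shapes are hypothetical.

```typescript
// Hypothetical shapes: `warehouseOfferDismissed` is the session field the PR
// adds; `frameworkContext.warehouseSources` is an assumed name.
interface DetectedSource {
  kind: string;
  mode: 'in-cli' | 'deep-link';
}

interface SessionLike {
  warehouseOfferDismissed: boolean;
  frameworkContext: { warehouseSources?: DetectedSource[] };
}

// Show the post-outro prompt only when something was detected and the user
// has not already dismissed it.
function shouldShowWarehouseOffer(session: SessionLike): boolean {
  const detected = session.frameworkContext.warehouseSources ?? [];
  return detected.length > 0 && !session.warehouseOfferDismissed;
}

const withSource: SessionLike = {
  warehouseOfferDismissed: false,
  frameworkContext: { warehouseSources: [{ kind: 'Postgres', mode: 'in-cli' }] },
};
console.log(shouldShowWarehouseOffer(withSource)); // true
console.log(shouldShowWarehouseOffer({ ...withSource, warehouseOfferDismissed: true })); // false
console.log(shouldShowWarehouseOffer({ warehouseOfferDismissed: false, frameworkContext: {} })); // false
```

A pure predicate like this is easy to cover with the kind of shown/hidden tests described for programs.test.ts.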

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/utils/open-url.ts | Adds a helper to open URLs in the default browser without blocking the wizard process. |
| src/ui/tui/store.ts | Adds a session setter to persist dismissal of the warehouse offer screen. |
| src/ui/tui/screens/WarehouseOfferScreen.tsx | New post-run TUI screen offering to open the PostHog “new source” setup URL. |
| src/ui/tui/screens/WarehouseIntroScreen.tsx | New intro/detection-result screen for the dedicated warehouse program. |
| src/ui/tui/screen-sequences.ts | Registers new screen IDs for warehouse intro/offer screens. |
| src/ui/tui/screen-registry.tsx | Registers the new WarehouseIntro/WarehouseOffer screens with the TUI screen factory. |
| src/ui/tui/tests/programs.test.ts | Adds predicate tests ensuring the warehouse offer is shown/hidden correctly. |
| src/lib/wizard-session.ts | Adds warehouseOfferDismissed to persisted session state defaults/types. |
| src/lib/warehouse-sources/types.ts | Defines typed detector config surface and detected-source payload shape. |
| src/lib/warehouse-sources/registry.ts | Adds initial detector entries mapping signals to PostHog source kinds + creation modes. |
| src/lib/warehouse-sources/detect.ts | Implements repo scanning and matching against the detector registry. |
| src/lib/warehouse-sources/tests/detect.test.ts | Adds unit tests for detection across npm/python/env/gemfile signals and deduping/ignore rules. |
| src/lib/programs/warehouse-source/steps.ts | Defines step sequence for the new warehouse program (detect → intro → auth → run → outro → skills). |
| src/lib/programs/warehouse-source/index.ts | Registers the new program config and builds a prompt including detected sources for the skill. |
| src/lib/programs/warehouse-source/detect.ts | Program-level adapter writing detect results/errors into frameworkContext + abort cases. |
| src/lib/programs/warehouse-source/content/index.tsx | Re-exports the generic agent-skill learn deck for the warehouse program. |
| src/lib/programs/program-registry.ts | Adds the new warehouse-source program to the registry/enum. |
| src/lib/programs/posthog-integration/steps.ts | Adds the post-outro warehouse offer step gated on detection results and dismiss state. |
| src/lib/programs/posthog-integration/detect.ts | Runs warehouse source detection during the main integration detect step. |
| README.md | Documents the new npx @posthog/wizard warehouse command and behavior. |


Comment thread src/utils/open-url.ts
Comment on lines +16 to +20
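```typescript
try {
  const child = spawn(cmd, args, { detached: true, stdio: 'ignore' });
  // spawn reports a missing binary (ENOENT) via an async 'error' event that
  // try/catch cannot see; without a listener Node would crash the wizard.
  child.on('error', () => {});
  child.unref();
} catch {
  // Ignore; the URL is always shown as text as a fallback.
}
```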
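The excerpt above shows only the spawn call. A fuller helper might choose the launcher per platform as below; `browserCommand` and `openUrl` are hypothetical names, and the platform mapping is a common convention rather than something this PR confirms.

```typescript
import { spawn } from 'node:child_process';

interface BrowserCommand {
  cmd: string;
  args: string[];
}

// Map a platform to the usual "open this URL" launcher. This is a common
// convention, not necessarily what src/utils/open-url.ts does.
function browserCommand(platform: string, url: string): BrowserCommand {
  if (platform === 'darwin') return { cmd: 'open', args: [url] };
  // `start` is a cmd.exe builtin; the empty string fills its window-title slot.
  // Caveat: cmd.exe treats `&` specially, so URLs with query strings may need escaping.
  if (platform === 'win32') return { cmd: 'cmd', args: ['/c', 'start', '', url] };
  return { cmd: 'xdg-open', args: [url] };
}

function openUrl(url: string): void {
  const { cmd, args } = browserCommand(process.platform, url);
  try {
    const child = spawn(cmd, args, { detached: true, stdio: 'ignore' });
    child.on('error', () => {}); // e.g. xdg-open not installed
    child.unref(); // don't keep the wizard alive waiting for the browser
  } catch {
    // Ignore; the URL is also printed as text.
  }
}
```

Splitting out the pure `browserCommand` step keeps the platform logic testable without actually launching a browser.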
Comment on lines +127 to +148
```typescript
function ingestFile(
  name: string,
  fullPath: string,
  signals: ProjectSignals,
): void {
  const content = safeRead(fullPath);
  if (content === null) return; // unreadable file: skip

  if (name === 'package.json') {
    addNpmDeps(content, signals);
  } else if (name === 'requirements.txt') {
    parseRequirementsTxt(content).forEach((d) => signals.python.add(d));
  } else if (name === 'pyproject.toml') {
    parsePyprojectToml(content).forEach((d) => signals.python.add(d));
  } else if (name === 'Pipfile') {
    parsePipfile(content).forEach((d) => signals.python.add(d));
  } else if (name === 'Gemfile') {
    parseGemfile(content).forEach((g) => signals.ruby.add(g));
  } else if (name.startsWith('.env')) {
    // Covers .env, .env.local, .env.example, and also .envrc.
    // Only key names are collected, never values.
    parseEnvKeys(content).forEach((k) => signals.envKeys.add(k));
  }
}
```
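`ingestFile` delegates to per-format parsers that the excerpt doesn't show. Two of them might look like this; the names match the excerpt, but the bodies are assumptions, not the PR's logic. Version pins, extras, and comments are stripped so only package names reach the detectors.

```typescript
// Hypothetical signal shape; only the npm and python sets are needed here.
interface ProjectSignals {
  npm: Set<string>;
  python: Set<string>;
}

// Collect package names from every dependency field of a package.json.
function addNpmDeps(content: string, signals: ProjectSignals): void {
  let pkg: unknown;
  try {
    pkg = JSON.parse(content);
  } catch {
    return; // malformed package.json: contribute no signals
  }
  if (typeof pkg !== 'object' || pkg === null) return;
  const record = pkg as Record<string, unknown>;
  const fields = ['dependencies', 'devDependencies', 'peerDependencies', 'optionalDependencies'];
  for (const field of fields) {
    const deps = record[field];
    if (typeof deps === 'object' && deps !== null) {
      Object.keys(deps).forEach((name) => signals.npm.add(name));
    }
  }
}

// Extract normalized distribution names from a requirements.txt file.
function parseRequirementsTxt(content: string): string[] {
  const names: string[] = [];
  for (const raw of content.split(/\r?\n/)) {
    const line = raw.split('#')[0].trim(); // drop comments
    if (line === '' || line.startsWith('-')) continue; // skip blanks and pip options (-r, -e, ...)
    // The name ends at the first extras bracket, version specifier, marker or space.
    const match = /^[A-Za-z0-9][A-Za-z0-9._-]*/.exec(line);
    if (match) {
      // PEP 503 normalization: lowercase, collapse runs of -, _ and . into "-".
      names.push(match[0].toLowerCase().replace(/[-_.]+/g, '-'));
    }
  }
  return names;
}
```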
Comment on lines +99 to +107
```tsx
const errorView = detectError ? (
  <>
    <Box flexDirection="column" marginBottom={1}>
      <Text color="red" bold>
        {'✘'} No data warehouse source detected
      </Text>
      <Box marginTop={1} flexDirection="column">
        <DetectErrorBody error={detectError} />
      </Box>
```

Add detectors for Convex, Clerk, Resend, Shopify, Klaviyo, Chargebee,
Paddle, Polar, Mailchimp, Customer.io, Typeform (in-CLI) and Intercom,
Linear (deep-link), covering most released sources with a reliable
codebase footprint. Sentry uses token auth, so move it to in-CLI.
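With a config-driven registry, each source in this commit is one entry. Below is a hypothetical sketch of a few of them plus a sanity check a registry test might run; the kind strings, package names, and `validateRegistry` are assumptions, not code from the PR. Only the modes (Resend and Sentry in-CLI, Intercom and Linear deep-link) come from the commit message.

```typescript
type CreationMode = 'in-cli' | 'deep-link';

interface SourceDetector {
  kind: string;
  mode: CreationMode;
  npm?: string[];
  python?: string[];
  envKeys?: string[];
}

// Illustrative entries: kind strings and package names are assumptions.
const NEW_DETECTORS: SourceDetector[] = [
  { kind: 'Resend', mode: 'in-cli', npm: ['resend'], python: ['resend'] },
  { kind: 'Sentry', mode: 'in-cli', npm: ['@sentry/node'], python: ['sentry-sdk'] }, // token auth
  { kind: 'Intercom', mode: 'deep-link', npm: ['intercom-client'] }, // OAuth
  { kind: 'Linear', mode: 'deep-link', npm: ['@linear/sdk'] }, // OAuth
];

// Catch registry mistakes early: entries with no signals, and a single signal
// claimed by two different kinds (which would make detection ambiguous).
function validateRegistry(registry: SourceDetector[]): string[] {
  const problems: string[] = [];
  const owner = new Map<string, string>(); // e.g. "npm:resend" -> "Resend"
  for (const d of registry) {
    const signals = [
      ...(d.npm ?? []).map((n) => `npm:${n}`),
      ...(d.python ?? []).map((n) => `python:${n}`),
      ...(d.envKeys ?? []).map((k) => `env:${k}`),
    ];
    if (signals.length === 0) problems.push(`${d.kind}: no signals`);
    for (const s of signals) {
      const prev = owner.get(s);
      if (prev !== undefined && prev !== d.kind) problems.push(`${s}: claimed by ${prev} and ${d.kind}`);
      owner.set(s, d.kind);
    }
  }
  return problems;
}

console.log(validateRegistry(NEW_DETECTORS).length); // 0
```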
@gewenyu99
Collaborator

Hey, thank you for adding this! This will help us match up with the onboarding flow even more!

What I'll ask is that you ship this as its own command. Maybe instead of data-warehouse, call it @posthog/wizard data-source.

I'm also gonna ask that we ship this as its own clean skill for now, because the actual task queue is a bit buggy right now, and we're looking to fix that soon.

Also excuse our mess in this repo, and props for pushing through some of our slop <3

@Gilbert09
Member Author


@gewenyu99 When you say its own command, do you mean something other than npx @posthog/wizard warehouse (aside from the name change)? I thought this was already its own command. Happy to rename it regardless. Would this still run during the usual setup wizard?
