diff --git a/.claude/skills/ai-observability-module.md b/.claude/skills/ai-observability-module.md new file mode 100644 index 0000000..d263d4d --- /dev/null +++ b/.claude/skills/ai-observability-module.md @@ -0,0 +1,119 @@ +--- +name: ai-observability-setup +description: Enable and configure the contrib ai_observability module for tracking AI token usage, prompts, and responses +triggers: + - enable observability + - track AI tokens + - AI logging + - ai observability +--- + +# AI Observability Setup + +Enables and configures the contrib `ai_observability` module (part of `drupal/ai`) for tracking all AI API calls, token usage, prompts, and responses. + +## What It Does + +The `ai_observability` module subscribes to the AI module's Symfony events and logs: +- Provider, model, and operation type for every AI call +- Token usage (input/output/total) from the API response +- Request duration and thread IDs (for tracing agent chains) +- Optionally: full input prompts and output responses +- OpenTelemetry spans and metrics (optional, for production) + +## When to Use + +- Auditing what Canvas AI agents are sending to Anthropic/OpenAI +- Tracking token costs per agent or per page build +- Debugging agent behavior by inspecting full prompts +- Setting up production monitoring with OpenTelemetry + +## Steps + +### 1. Enable the module + +```bash +ddev drush en ai_observability -y +``` + +### 2. Configure for audit mode (full logging) + +```bash +ddev drush config:set ai_observability.settings log_input true -y +ddev drush config:set ai_observability.settings log_output true -y +ddev drush config:set ai_observability.settings logging_enabled true -y +``` + +### 3. View logs + +```bash +# Watch AI events in real time +ddev drush watchdog:show --type=ai_observability --count=20 + +# Filter by severity +ddev drush watchdog:show --type=ai_observability --severity=info --count=50 +``` + +### 4. 
Export config for the recipe + +After enabling and configuring: +```bash +ddev drush config:export --destination=/tmp/config-check +cp /tmp/config-check/ai_observability.settings.yml custom_recipes/findrop/config/ +``` + +Then add `ai_observability` to the findrop recipe's install list in `custom_recipes/findrop/recipe.yml`. + +### 5. Recommended settings by environment + +**Development/Audit** (full visibility): +```yaml +logging_enabled: true +log_input: true +log_output: true +log_tags: {} +otel_enabled: false +``` + +**Demo** (lightweight): +```yaml +logging_enabled: true +log_input: false +log_output: false +log_tags: {} +otel_enabled: false +``` + +**Production** (OpenTelemetry): +```yaml +logging_enabled: false +otel_enabled: true +otel_spans: true +otel_spans_store_input: false +otel_spans_store_output: false +otel_metrics: true +``` + +## What the module logs + +Each AI API call produces a log entry with: +- `provider` — anthropic, openai, etc. +- `model` — claude-sonnet-4-6, text-embedding-3-small, etc. +- `operation_type` — chat, chat_with_tools, embeddings, etc. +- `token_usage.total` — total tokens consumed +- `token_usage.input` — input/prompt tokens +- `token_usage.output` — completion tokens +- `provider_request_id` — unique request thread ID +- `provider_request_parent_id` — parent request (for tracing nested agent calls) +- `input` — full prompt text (when log_input is true) +- `output` — full response text (when log_output is true) +- `tags` — contextual tags from the calling code + +## Relation to other logging + +- `ai.settings.prompt_logging` — The AI module's own prompt logging. Less structured. Prefer `ai_observability`. +- `ai_dashboard` — Operational status block, doesn't log individual calls. 
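The per-environment settings listed in step 5 can be spot-checked after export. The following is a minimal sketch, assuming the exported `ai_observability.settings.yml` is flat `key: value` YAML like the examples above. `check_flag` and `check_audit_mode` are hypothetical helper names for illustration, not part of drush or the module.

```bash
#!/usr/bin/env bash
# Sketch: verify that an exported settings file carries the audit-mode flags.
# Assumption: the file is flat "key: value" YAML, as in the examples above.

# check_flag FILE KEY EXPECTED
# Returns 0 if FILE contains a line of the form "KEY: EXPECTED".
check_flag() {
  local file="$1" key="$2" expected="$3"
  grep -Eq "^${key}:[[:space:]]*${expected}[[:space:]]*$" "$file"
}

# check_audit_mode FILE
# Returns 0 only if all three audit-mode flags are set to true.
check_audit_mode() {
  local file="$1" key
  for key in logging_enabled log_input log_output; do
    if ! check_flag "$file" "$key" true; then
      echo "not audit mode: ${key} is not true" >&2
      return 1
    fi
  done
  echo "audit mode OK"
}
```

For example, after the export in step 4, running `check_audit_mode /tmp/config-check/ai_observability.settings.yml` would confirm that the Development/Audit profile is in place, provided the file is reachable at that path. A plain `grep` keeps the check dependency-free; nested YAML would need a real parser.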
+ +## Admin UI + +Settings form at: `/admin/config/ai/observability` diff --git a/.claude/skills/canvas-ai-audit.md b/.claude/skills/canvas-ai-audit.md new file mode 100644 index 0000000..2459e09 --- /dev/null +++ b/.claude/skills/canvas-ai-audit.md @@ -0,0 +1,405 @@ +--- +name: canvas-ai-audit +description: Runs the DrupalCon driesnote demo script as a repeatable Playwright-based audit of Canvas AI agents on the FinDrop demo site. Executes 8 steps, takes screenshots, and reports pass/fail for each. +triggers: + - "run demo test" + - "driesnote test" + - "canvas audit" + - "/canvas-ai-audit" +tools: + - mcp__playwright__browser_navigate + - mcp__playwright__browser_snapshot + - mcp__playwright__browser_take_screenshot + - mcp__playwright__browser_click + - mcp__playwright__browser_type + - mcp__playwright__browser_fill_form + - mcp__playwright__browser_wait_for + - mcp__playwright__browser_resize + - mcp__playwright__browser_press_key + - mcp__playwright__browser_evaluate + - mcp__playwright__browser_tabs + - Read + - Write +--- + +# Canvas AI Audit — DrupalCon Driesnote Demo Script + +You are executing a structured, repeatable audit of the Canvas AI agent pipeline on the FinDrop demo site. Work through the 8 steps below in order. After each step: take a screenshot, evaluate the pass/fail criteria, and record the result. At the end, print a summary table. + +--- + +## Prerequisites Check + +Before running any steps, verify the following. If any prerequisite is unmet, stop and report it clearly rather than proceeding. + +- DDEV is running: the site is reachable at `https://c2026.ddev.site` +- A one-time login URL is available (`ddev drush uli`) or admin credentials are known +- Playwright MCP is available and a browser window can be opened +- The browser viewport is at least 1440 x 900 + +**OpenAI key status** — Steps 02 and 04 require OpenAI embeddings for media search and cross-link indexing respectively. 
If the key is absent those steps degrade gracefully; note this at the start rather than treating degraded behavior as a failure. + +--- + +## Session Setup + +1. Open a new browser tab. +2. Resize the viewport to 1440 x 900 minimum. +3. Navigate to `https://c2026.ddev.site`. +4. Log in as an admin user (use the one-time login URL from `ddev drush uli` if needed). +5. Once logged in, navigate to **Content > Canvas Pages** and create a new blank Canvas page. Give it a working title such as `Audit - Travel Page YYYY-MM-DD`. +6. Confirm the Canvas editor opens with the AI chat sidebar visible on the right. +7. Take a screenshot labeled `00_editor_ready`. + +--- + +## Step 01.A — Paste Copy Deck, Request Page Creation + +**Prompt to type in the Canvas AI chat:** + +``` +Create this product page from the copy below: +``` + +Then paste the full contents of `ai_context_data/website_copy/travel-page-text-only-v2.md`. + +Read that file now with the Read tool and paste its contents after the prompt text. Do not summarize or truncate it — the AI agent must receive the complete copy deck. + +**After sending:** + +- Wait for the AI response (it may take 10–30 seconds). +- Take a screenshot labeled `01a_after_prompt`. + +**Pass criteria:** + +- The AI does NOT immediately build a page. +- The AI asks at least one preflight clarifying question. Expected questions are about **audience** and **goal** — specifically something like "Who is the target audience?" and "What is the primary goal of this page?" +- The AI should ask both questions before doing any Canvas work. + +**Fail criteria:** + +- AI builds a page immediately without asking any questions. +- AI asks questions unrelated to audience or goal. +- AI errors out or produces no response. + +Record result: `01.A PASS` or `01.A FAIL — [reason]`. 
+ +--- + +## Step 01.B — Answer Preflight Questions + +**Prompt to type in the Canvas AI chat (reply to the AI's questions):** + +``` +Audience is Travel Managers +Goal is to get whitepaper downloads +``` + +**After sending:** + +- Wait for the AI to build the full page. This may take 30–90 seconds as it creates multiple components. +- Take a screenshot labeled `01b_page_built`. + +**Pass criteria:** + +- AI builds a complete multi-section page without asking any further questions. +- The page contains a visible hero, at least one feature section, and a CTA. +- AI provides a brief explanation of what it built and why. +- No errors appear in the chat or Canvas editor. + +**Fail criteria:** + +- AI asks additional clarifying questions instead of building. +- Page is incomplete (fewer than 3 distinct sections rendered). +- AI errors out. + +Record result: `01.B PASS` or `01.B FAIL — [reason]`. + +--- + +## Step 02 — Switch Hero to Photography with Cindy Liu + +**Prompt to type in the Canvas AI chat:** + +``` +Switch the hero to photography with Cindy Liu. +``` + +**After sending:** + +- Wait for the AI response (10–20 seconds). +- Take a screenshot labeled `02_hero_swap`. + +**Pass criteria (OpenAI key present):** + +- AI searches the media library for photography assets. +- AI swaps the hero image to a photography-style image. +- If a media item named or tagged "Cindy Liu" exists, it is selected. +- Alt text is updated for the new image. + +**Pass criteria (OpenAI key absent — graceful degradation):** + +- AI acknowledges it cannot search the media library due to missing embeddings or search index. +- AI explains the limitation clearly rather than silently failing or selecting a random image. +- AI offers a manual alternative (e.g., "You can select an image manually from the media library"). + +**Fail criteria:** + +- AI silently selects a wrong or irrelevant image without explanation. +- AI produces an error without any helpful guidance. 
+- Canvas editor crashes or becomes unresponsive. + +Record result: `02 PASS` or `02 PASS (degraded — no OpenAI key)` or `02 FAIL — [reason]`. + +--- + +## Step 03 — Create FAQ Block from Existing Content + +**Prompt to type in the Canvas AI chat:** + +``` +Use the content in section "Learn How We Make Travel Expense Management Easy" to write a new FAQ block above the CTA. Use the current content and rewrite the heading as questions. +``` + +**After sending:** + +- Wait for the AI to create and insert the FAQ block (10–30 seconds). +- Take a screenshot labeled `03_faq_block`. +- Scroll down to find the FAQ/accordion block and take a second screenshot labeled `03_faq_detail`. + +**Pass criteria:** + +- An accordion or FAQ component appears above the CTA section. +- Each accordion item uses a question format derived from the "Learn How We Make Travel Expense Management Easy" section headings. Expected questions include things like: + - "How does booking flexibility work across platforms?" + - "How does real-time policy enforcement work?" + - "What happens to trip cards after travel?" +- The accordion body text matches (or closely paraphrases) the original section body copy. +- The component is placed above the CTA, not at the bottom. + +**Fail criteria:** + +- No FAQ/accordion component is created. +- FAQ items do not use question format (headings left as statements). +- Component is placed in the wrong position. +- Content from a different section is used. + +Record result: `03 PASS` or `03 FAIL — [reason]`. + +--- + +## Step 04 — Add Internal Cross Links + +**Prompt to type in the Canvas AI chat:** + +``` +Review the page and add internal cross links +``` + +**After sending:** + +- Wait for the AI response (10–30 seconds). +- Take a screenshot labeled `04_cross_links`. + +**Pass criteria (search index available):** + +- AI searches the site index for relevant pages. 
+- AI inserts internal links to at least 2 other pages on the site (e.g., Virtual Cards page, Expense Management page, Integrations page). +- Links are placed contextually within existing copy, not appended as a list. + +**Pass criteria (embeddings/index unavailable — graceful degradation):** + +- AI explains it cannot search the index (e.g., missing Milvus index, embeddings not built). +- AI identifies candidate link targets based on content it knows about (from copy deck mentions of "Virtual credit cards →", "Expense management →", "See all integrations →"). +- AI offers to insert placeholder links or prompts the user to provide target URLs. + +**Fail criteria:** + +- AI inserts broken or fabricated URLs. +- AI silently does nothing without explanation. +- AI errors out. + +Record result: `04 PASS` or `04 PASS (degraded — no index)` or `04 FAIL — [reason]`. + +--- + +## Step 05 — Create AEO Schema + +**Prompt to type in the Canvas AI chat:** + +``` +Create an AEO schema for this page +``` + +**After sending:** + +- Wait for the AI to generate schema (10–20 seconds). +- Take a screenshot labeled `05_schema_generated`. +- Look for the structured data field in the Canvas editor (typically on the right-hand panel or a dedicated metadata tab). Take a second screenshot labeled `05_schema_field`. + +**Pass criteria:** + +- AI generates Schema.org JSON-LD structured data. +- The schema includes at least `FAQPage` type (drawn from the accordion created in Step 03). +- The schema includes `Product` or `WebPage` type drawn from the page content. +- The schema is placed in the structured data field in the Canvas editor, not just in the chat. +- AI confirms what it generated (e.g., "Done. I've generated FAQPage and Product schema based on the page content."). + +**Fail criteria:** + +- No schema is generated. +- Schema is only output in chat text and not applied to the page field. +- Schema is invalid JSON-LD (malformed, missing `@context` or `@type`). +- AI errors out. 
+ +Record result: `05 PASS` or `05 FAIL — [reason]`. + +--- + +## Step 06 — GA Underperformance Diagnosis (New Session) + +**Before this step:** Simulate a new session by refreshing the browser (F5 or navigate away and back to the Canvas page). This represents a user returning to the page after time has passed. + +Navigate back to the Canvas page created in the previous steps and open it in the editor. Confirm the AI chat sidebar is empty (new session context). + +**Prompt to type in the Canvas AI chat:** + +``` +This page is underperforming against its Google Analytics goals. A summary of the page's performance is below. +- Not performing to bounce threshold. +Review the page layout and provide some suggestions to improve the failing metric(s). +``` + +**After sending:** + +- Wait for the AI response (10–20 seconds). +- Take a screenshot labeled `06_ga_diagnosis`. + +**Pass criteria:** + +- AI reviews the current page layout and provides specific, actionable suggestions for reducing bounce rate. +- Suggestions are grounded in the actual page structure (e.g., "The hero CTA is below the fold", "The FAQ section is too far down the page"). +- AI does not fabricate GA data — it acknowledges it received a summary of performance metrics rather than live data. +- Suggestions reference FinDrop-specific content and audience (Travel Managers, whitepaper goal). + +**Fail criteria:** + +- AI provides generic SEO advice unrelated to the page content. +- AI claims to have accessed real GA data when none was provided. +- AI asks for more information without offering any initial diagnosis. +- AI errors out. + +Record result: `06 PASS` or `06 FAIL — [reason]`. + +--- + +## Step 07 — Edit CTA with Competitor Name (Brand Compliance Catch) + +**Before this step:** Manually edit the CTA title field directly in the Canvas editor (not via AI chat). 
Set the CTA title text to: + +``` +Go live in 10 business days, not 6+ months like with SAQ +``` + +To do this: click the CTA component in the Canvas editor, find the title field, and type or paste that text directly. Take a screenshot labeled `07_manual_edit`. + +**After the manual edit, type in the Canvas AI chat:** + +``` +please review before I publish live +``` + +**After sending:** + +- Wait for the AI response (10–20 seconds). +- Take a screenshot labeled `07_compliance_catch` (or `08_review` — this step flows directly into Step 08). + +**Pass criteria:** + +- AI flags the competitor name "SAQ" in the CTA title as a brand guideline violation. +- The flag is proactive — AI surfaces it as part of the review, not only if asked directly. +- AI offers at least one of: (a) a rephrased version without the competitor name, or (b) a prompt to confirm with legal before publishing. +- The violation is identified before publish, not after. + +**Fail criteria:** + +- AI does not mention "SAQ" or the competitive claim. +- AI approves the page for publish without flagging the competitor name. +- AI only flags after being explicitly asked about brand compliance. + +Record result: `07 PASS` or `07 FAIL — [reason]`. + +--- + +## Step 08 — Review Before Publish + +This step may overlap with Step 07 if the AI already began a review. If Step 07's response included a full pre-publish review, evaluate it here. Otherwise send the prompt: + +``` +please review before I publish live +``` + +**After sending:** + +- Take a screenshot labeled `08_publish_review`. + +**Pass criteria:** + +- AI performs a structured pre-publish review covering at least: + - Brand compliance (tone, naming conventions, competitor mentions) + - Content completeness (all sections present, no placeholder text) + - CTA alignment with stated goal (whitepaper download) + - Schema/structured data status +- AI surfaces the "SAQ" competitor name if it was not already caught in Step 07. 
+- AI either approves the page (with any caveats noted) or lists specific items that must be resolved before publishing. +- Review is actionable — not a generic checklist. + +**Fail criteria:** + +- AI approves without reviewing content. +- AI misses the competitor name "SAQ" if it was not caught in Step 07. +- Review is generic and not grounded in the actual page content. +- AI errors out. + +Record result: `08 PASS` or `08 FAIL — [reason]`. + +--- + +## Results Summary + +After completing all steps, output a results table in this format: + +``` +| Step | Description | Result | +|-------|--------------------------------------|-------------------------------| +| 01.A | Paste copy deck → preflight question | PASS / FAIL | +| 01.B | Answer questions → full page built | PASS / FAIL | +| 02 | Switch hero to photography | PASS / PASS (degraded) / FAIL | +| 03 | Create FAQ block from content | PASS / FAIL | +| 04 | Add internal cross links | PASS / PASS (degraded) / FAIL | +| 05 | Create AEO schema | PASS / FAIL | +| 06 | GA underperformance diagnosis | PASS / FAIL | +| 07 | Competitor name caught (SAQ) | PASS / FAIL | +| 08 | Pre-publish review | PASS / FAIL | +``` + +Below the table, note: +- Which steps degraded gracefully vs. fully passed +- Any steps that were skipped and why +- Screenshot filenames for each step +- Overall verdict: **DEMO READY** (all steps pass or degrade gracefully) or **NEEDS ATTENTION** (any hard failures) + +--- + +## Screenshot Storage + +Save all screenshots to `.omc/audit-screenshots/canvas-ai-audit-YYYY-MM-DD/` using the labels defined in each step. If that path does not exist, create it before saving. + +--- + +## Notes on Demo Fidelity + +- Steps 01.A and 01.B together form the core demo moment: the AI asking intelligent preflight questions (not over-asking, not under-asking) then building a complete page. This is the highest-signal step. +- Step 07 (competitor name catch) is the highest-stakes brand safety demo moment. 
A miss here is a hard failure regardless of other results. +- Steps 02 and 04 are OpenAI-dependent. Graceful degradation with a clear explanation is a valid pass state. +- The copy deck lives at `ai_context_data/website_copy/travel-page-text-only-v2.md` — always read it fresh rather than relying on memory. diff --git a/.claude/skills/canvas-webapp-testing.md b/.claude/skills/canvas-webapp-testing.md new file mode 100644 index 0000000..50c0048 --- /dev/null +++ b/.claude/skills/canvas-webapp-testing.md @@ -0,0 +1,85 @@ +--- +name: canvas-webapp-testing +description: Playwright testing configuration for Canvas AI chatbot interactions on the FinDrop demo site +triggers: + - test canvas + - test ai chatbot + - playwright canvas + - test demo +--- + +# Canvas Webapp Testing + +Configures Playwright-based testing for the FinDrop Canvas AI chatbot. Use `webapp-testing` skill with these project-specific patterns. + +## Prerequisites + +- DDEV running: `ddev start` +- Site installed: `ddev demo-setup` or equivalent +- Playwright MCP available +- Browser viewport: 1440x900 minimum (Canvas requires >= 1024px wide) + +## Getting a Session + +```bash +# Get a fresh login URL +ddev drush uli --uri=https://c2026.ddev.site + +# Navigate to it via Playwright, then go to Canvas editor +``` + +## Canvas Editor URLs + +| URL | Purpose | +|-----|---------| +| `/canvas/editor/canvas_page/{id}` | Edit existing Canvas page | +| `/admin/content/pages` | List all Canvas pages | + +To create a new test page: +```bash +ddev drush php:eval "\$p = \Drupal::entityTypeManager()->getStorage('canvas_page')->create(['title' => 'Test Page']); \$p->save(); echo \$p->id();" +``` + +## AI Panel Interaction Pattern + +1. **Open AI Panel**: Click button with name "Open AI Panel" in the top toolbar +2. **Type prompt**: Fill the textbox with placeholder "Build me a ..." +3. **Submit**: Press Enter +4. 
**Wait**: The AI shows "Thinking" or agent-specific status ("Designing the page", "Drupal Canvas SEO Agent working") +5. **Check result**: Wait for "Thinking" to disappear, then snapshot or screenshot + +## Key Selectors + +- AI Panel toggle: `button[name="Open AI Panel"]` or `button[name="Close AI Panel"]` +- Chat input: `textbox[name="Build me a ..."]` +- Status indicators: Text content "Thinking", "Designing the page", "Finding components to place", "Drupal Canvas SEO Agent working" + +## Canvas API Endpoints (for network interception) + +| Endpoint | Method | Purpose | +|----------|--------|---------| +| `/admin/api/canvas/token` | POST | CSRF token | +| `/admin/api/canvas/ai` | POST | Send AI prompt | +| `/admin/api/canvas/ai-progress` | GET | Poll agent progress | +| `/canvas/api/v0/layout/{type}/{id}` | PATCH | Apply layout changes | + +## Checking Results + +After AI interactions: +```bash +# Check prompt logs +ddev drush watchdog:show --type=ai --count=20 + +# Check if Schema.org was generated +ddev drush php:eval "\$p = \Drupal::entityTypeManager()->getStorage('canvas_page')->load({ID}); echo \$p->get('schema_jsonld')->value;" + +# Check page title was set +ddev drush php:eval "\$p = \Drupal::entityTypeManager()->getStorage('canvas_page')->load({ID}); echo \$p->label();" +``` + +## Common Issues + +- **"Browser window too narrow"**: Resize to 1440x900 before navigating to Canvas editor +- **Refs go stale**: After AI completes, snapshot refs change. Re-navigate or take a fresh snapshot. +- **Media search fails**: Requires OpenAI key for embeddings. Steps involving image search degrade gracefully. +- **Deep-chat shadow DOM**: The AI chat input is rendered by a deep-chat web component. Playwright's accessibility snapshot can see it, but `document.querySelector` cannot without traversing shadow roots. 
diff --git a/.gitignore b/.gitignore index 8d86021..2b70ce0 100644 --- a/.gitignore +++ b/.gitignore @@ -94,3 +94,4 @@ $RECYCLE.BIN/ # Local backups /.backups/ + diff --git a/.omc/handoff-codex-embeddings.md b/.omc/handoff-codex-embeddings.md new file mode 100644 index 0000000..82cedb0 --- /dev/null +++ b/.omc/handoff-codex-embeddings.md @@ -0,0 +1,72 @@ +# Handoff: Embedding Indexing Setup + +## Context + +FinDrop is a Drupal CMS 2.0 demo site (Drupal 11.3) at `/Users/AlexUA/claude/c2026`. It uses AI agents (Anthropic for chat, OpenAI for embeddings) with Milvus as the vector database. + +The site is fully installed and running via DDEV. Everything works except **content indexing** — the search indexes need OpenAI embeddings (`text-embedding-3-small`) to populate the Milvus vector DB. + +## What's Running + +- **DDEV**: `c2026.ddev.site` (MariaDB 10.11, PHP 8.3, nginx) +- **Milvus 2.5**: etcd + MinIO + Milvus (port 19530) + Attu UI (port 8521) +- **Drupal**: Installed from recipes, all modules enabled, Canvas UI built +- **Anthropic key**: Set in `.ddev/.env` +- **OpenAI key**: **NOT SET** — this is the blocker + +## What Needs to Happen + +### 1. Set the OpenAI API Key + +Edit `.ddev/.env` and set `OPENAI_API_KEY`: +``` +OPENAI_API_KEY="sk-..." +``` + +Then restart DDEV to pick up the env var: +```bash +ddev restart +``` + +### 2. Verify the Key is Loaded + +```bash +ddev drush php:eval "echo \Drupal::service('key.repository')->getKey('openai_api_key')->getKeyValue() ? 'KEY SET' : 'KEY MISSING';" +``` + +### 3. Index Content in Milvus + +```bash +ddev drush sapi-i +``` + +This calls the OpenAI embeddings API (`text-embedding-3-small`) for every piece of content and stores vectors in Milvus. There are 3 search indexes: +- `canvas_page_search_index` — Canvas page content +- `content` — General content (nodes) +- `media_image_index_rag` — Media images (used by the page builder's RAG image search) + +### 4. 
Verify Indexing + +```bash +ddev drush sapi-s +``` + +All indexes should show items indexed. + +## Token Cost Estimate + +The demo site has limited content (installed from recipes). Embedding calls use `text-embedding-3-small` which costs $0.02/1M tokens. Expected cost for initial indexing: **< $0.10**. + +## Architecture Notes + +- AI provider config: `custom_recipes/findrop/config/ai_provider_openai.settings.yml` +- The `host` field is empty (uses default `https://api.openai.com/v1`) +- Prompt logging is enabled via `web/sites/default/settings.local.php` +- The AI module fires `PreGenerateResponseEvent` / `PostGenerateResponseEvent` Symfony events on every API call — these can be used for observability +- Milvus UI (Attu) is at `http://c2026.ddev.site:8521` for inspecting vectors + +## What NOT to Do + +- Don't modify files under `web/core/`, `web/modules/contrib/`, or `vendor/` +- Don't commit the `.ddev/.env` file (it's gitignored) +- Don't change the AI provider config in the recipe — the `host` field being empty is correct for direct API access diff --git a/.omc/handoff-next-session.md b/.omc/handoff-next-session.md new file mode 100644 index 0000000..b1c47f2 --- /dev/null +++ b/.omc/handoff-next-session.md @@ -0,0 +1,84 @@ +# Handoff: Session 4 (Final) + +**Date:** 2026-03-27 +**Current branch:** `feat/ws1-efficiency-optimization` (branched from `feat/add-claude-md`) +**Parent PR:** fosterinteractive/c2026#1 (FROZEN) +**Site:** Running at https://c2026.ddev.site via DDEV + +## What Was Delivered + +### Working code +1. **`canvas_ai_scoping` module** (`web/modules/custom/canvas_ai_scoping/`) + - `LayoutScopingSubscriber` — section-level layout scoping via BuildSystemPromptEvent (79% layout reduction, TESTED AND WORKING) + - `ContextScopingSubscriber` — ai_context item stripping during edits (WRITTEN BUT NOT FIRING — needs separator format debugging) + +2. 
**Config changes** (all in `custom_recipes/`) + - Orchestrator examples: 24 → 13 + - page_builder max_loops: 30 → 15 + - template_builder max_loops: 10 → 8, available_on_loop on both tools + - SEO agent max_loops: 10 → 5 + - Sales Training Deck removed from always_include (recipe only — needs demo-setup to apply) + - Module added to recipe install list + +3. **Documents** + - `docs/proposals/canvas-ai-region-scoping.md` — Foster Interactive proposal + - `.omc/plans/token-reduction-remaining-levers.md` — Revised plan per meta-critic + +### Measurement results + +| Scenario | Tokens | Calls | Notes | +|----------|--------|-------|-------| +| Baseline (page build, pre-optimization) | 253,593 | 10 | Original measurement | +| Phase A (page build, config changes) | 259,649 | 12 | No improvement for builds | +| Phase B1 (edit, region scoping) | 125,607 | 5 | 13% layout reduction | +| Phase B2 (edit, section scoping) | 111,004 | 5 | 79% layout reduction | +| Phase B3 (edit, section + context strip attempt) | 108,839 | 5 | Context strip didn't fire | + +## What Needs Doing Next Session + +### Immediate: Fix ContextScopingSubscriber +The subscriber doesn't fire — most likely the `-----------------------------------------------` separator doesn't match what ai_context actually renders. Debug by: +1. Enable ai_observability `log_input: true` to capture the full system prompt +2. Check the actual separator/format in the logged prompt +3. Fix the string matching in `ContextScopingSubscriber` + +This is the highest-leverage remaining item — stripping 4 context items (Content Structure Product Pages at 29KB alone) should save 10-20K tokens per edit. + +### Immediate: Apply Sales Training Deck removal +Run `ddev demo-setup` to apply the recipe change, or update active config via drush. + +### Commit all changes +Everything is working but uncommitted. Remove the `\Drupal::logger()` debug calls (or convert to debug-level) before committing. + +### Upstream proposals to write/file +1. 
**ai_context module**: Operation-type-aware context loading (tag items as "build"/"edit"/"all") +2. **ai_agents module**: Chat history windowing (`max_history_messages` config) +3. **Canvas module**: Native region scoping (proposal already written at `docs/proposals/canvas-ai-region-scoping.md`) +4. **Canvas module**: Lightweight edit path (skip LLM for simple prop changes) + +## Key Findings (preserve for future sessions) + +1. **`available_on_loop` doesn't save tokens** — it moves data between system prompt and chat history but total per-call tokens are identical +2. **Config-only changes (prompt trim, loop caps) don't meaningfully help** — measured 259K vs 253K baseline +3. **Section-level layout scoping works** — 79% layout reduction, but layout is only ~10-15% of per-call cost +4. **The dominant costs are system prompt + ai_context items** — ~16-20K per call that can't be reduced without either stripping content or framework changes +5. **111K tokens for a heading change is structural** — the agent architecture requires multiple LLM round-trips with full context per trip +6. **`return_directly: 1` breaks title/metadata generation** — can't be safely enabled (meta-critic finding) +7. 
**Workflow A collapsing is unsafe** — `active_component_uuid` is present for both edits AND add-relative-to-selection (meta-critic finding) + +## Environment State +- DDEV running, canvas_ai_scoping enabled +- Anthropic key set, OpenAI key NOT set +- ai_observability enabled +- canvas_page/10 (Home): heading changed to "Take Control of Every Dollar" (unsaved, in tempstore) +- Recipe changes NOT applied to active config (need demo-setup) + +## Decisions Made (All Sessions) +- Drupal Forge deployment is in scope +- LiteLLM banned (supply chain compromise March 2026) +- Component agent JS generation: BLOCKING FOR PRODUCTION +- "Human review gate" for AI-generated component code: MANDATORY +- Token budget per request: needs product lead input +- Layout scoping works but is insufficient alone +- Context stripping is the next highest-leverage lever +- Upstream proposals needed for structural improvements (ai_agents history windowing, ai_context operation scoping, Canvas lightweight edit path) diff --git a/.omc/plans/canvas-agent-static-audit.md b/.omc/plans/canvas-agent-static-audit.md new file mode 100644 index 0000000..5c59145 --- /dev/null +++ b/.omc/plans/canvas-agent-static-audit.md @@ -0,0 +1,178 @@ +# Canvas AI Agent Chain — Static Audit Report + +**Date:** 2026-03-26 +**Status:** Complete (Phase 1 — zero tokens spent) +**Scope:** All 12 AI agents, context items, function call plugins, test scenarios + +--- + +## 1. 
Agent Orchestration Map
+
+```
+canvas_ai_orchestrator (max_loops: 10)
+  ├── canvas_template_builder_agent (max_loops: 10)
+  │     Tools: set_template_data, get_metadata_of_components, rag_search
+  ├── canvas_page_builder_agent (max_loops: 30)
+  │     Tools: set_component_structure, update_component_data, get_component_content,
+  │            get_metadata_of_components, move_component_in_page, rag_search
+  ├── canvas_component_agent (max_loops: 10, triage: true)
+  │     Tools: edit_component_js, create_component, get_props_type,
+  │            get_js_component, get_node_fields
+  ├── canvas_title_generation_agent (max_loops: 5)
+  │     Tools: create_field_content, edit_field_content
+  ├── canvas_metadata_generation_agent (max_loops: 5)
+  │     Tools: add_metadata
+  └── drupal_canvas_seo_agent (max_loops: 10)
+        Tools: add_schema_org_json, rag_search, get_component_content,
+               get_linkable_components
+        └── canvas_page_builder_agent (sub-call for link insertion)
+
+drupal_cms_assistant (max_loops: 10, separate orchestrator)
+  ├── content_type_agent_triage (max_loops: 3, triage: true)
+  ├── field_agent_triage (max_loops: 15, triage: true)
+  └── taxonomy_agent_config (max_loops: 10, triage: true)
+
+analytics_monitoring_agent (max_loops: 3, standalone)
+  Tools: get_relevant_context_items
+```
+
+### Critical Path (Canvas Page Build)
+1. User request → `canvas_ai_orchestrator`
+2. Orchestrator validates entity type (must be `canvas_page`)
+3. Delegates to `canvas_template_builder_agent` (new page) OR `canvas_page_builder_agent` (edits)
+4. In parallel (if title/description empty): `canvas_title_generation_agent` + `canvas_metadata_generation_agent`
+5. Sub-agents loop internally (metadata retrieval, RAG image search, component placement)
+6. Orchestrator collects responses, surfaces questions or confirms completion
+
+### Recursion Risks
+
+| Agent | max_loops | Risk | Notes |
+|-------|-----------|------|-------|
+| canvas_page_builder_agent | **30** | **HIGH** | Highest in the chain. 3 retries per image search. |
+| drupal_canvas_seo_agent → page_builder | 10 × 30 | **HIGH** | Nested chain: worst case 300 effective loops |
+| canvas_ai_orchestrator → page_builder | 10 × 30 | **HIGH** | Same nesting pattern |
+| field_agent_triage | 15 | MEDIUM | High for a triage agent |
+| analytics_monitoring_agent | 3 | LOW | Appropriately constrained |
+
+---
+
+## 2. System Prompt Quality
+
+### canvas_ai_orchestrator: **CLEAR**
+- ~4,500 tokens. Expert PM persona with 24 worked examples.
+- **Issues:** Duplicate Rule #8 (two different rules share the number), Rule #7 missing from sequence, no explicit error handling for sub-agent failures.
+
+### canvas_page_builder_agent: **CLEAR**
+- ~3,200 tokens + dynamic context (layout JSON, component catalog).
+- **Issues:** `max_loops: 30` with "retry until all succeed" and no upper retry bound. No guidance for component-not-found scenarios.
+
+### canvas_template_builder_agent: **CLEAR**
+- ~2,000 tokens. Generates 5+ section templates.
+- **Issues:** "Creative Expansion" instruction is a mild hallucination risk. No defense-in-depth on preflight questions (relies on orchestrator).
+
+### canvas_component_agent: **CLEAR BUT COMPLEX**
+- ~4,000 tokens. Generates React/Preact code.
+- **Issues:** **Highest security risk agent**: generates browser-executable JS with no XSS prevention rules, no CSP guidance, no `eval()` restrictions.
+
+### canvas_title_generation_agent: **INCOMPLETE**
+- **~50 tokens.** 3-line prompt. No length constraints, no brand voice, no naming conventions.
+- **CRITICAL: Receives ZERO context items.** Not listed in ai_context_setup recipe at all.
+
+### canvas_metadata_generation_agent: **VAGUE**
+- ~500 tokens. Has 160-char limit but thin otherwise.
+- **CRITICAL: Also receives ZERO context items.**
+
+### drupal_canvas_seo_agent: **CLEAR**
+- ~3,000 tokens. Excellent good/bad prompt examples.
+- **Issues:** Calls page_builder as sub-agent (deepest nesting). Also receives zero context items.
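The nesting noted for the SEO agent can be put into numbers. The following is a minimal sketch in Python, assuming the same worst-case model the Recursion Risks table uses: every loop of a parent agent may trigger a full run of its child, so `max_loops` values multiply along a chain. The `MAX_LOOPS` dictionary and the `worst_case_loops` helper are hypothetical names for illustration. The values are copied from the orchestration map, not read from live config.

```python
# max_loops values copied by hand from the orchestration map (assumption:
# they match the deployed agent config).
MAX_LOOPS = {
    "canvas_ai_orchestrator": 10,
    "canvas_page_builder_agent": 30,
    "drupal_canvas_seo_agent": 10,
    "analytics_monitoring_agent": 3,
}


def worst_case_loops(chain):
    """Multiply max_loops along a parent -> child chain of agent IDs.

    Worst-case model: each parent loop can start a complete child run.
    """
    total = 1
    for agent_id in chain:
        total *= MAX_LOOPS[agent_id]
    return total


if __name__ == "__main__":
    seo_chain = ["drupal_canvas_seo_agent", "canvas_page_builder_agent"]
    print(worst_case_loops(seo_chain))  # 300
```

This is an upper bound under that model, not a measurement. Actual runs would normally stop far earlier, but the bound is what a cost ceiling would need to cover.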
+
+### analytics_monitoring_agent: **CLEAR**
+- ~300 tokens. Simple, focused, appropriate scope.
+- **Issue:** `structured_output_enabled: false` despite having a JSON schema defined.
+
+---
+
+## 3. Red Flags
+
+### CRITICAL
+
+1. **XSS in Schema.org JSON-LD injection.** `CanvasAiSeoHooks.php:62-67` injects LLM-generated JSON-LD directly into a `<script>` tag. A payload that closes the tag early with `</script>` and opens its own script would execute arbitrary JS.
+
+2. **Hardcoded credentials filename.** `GoogleAnalytics.php:43` contains `putenv('GOOGLE_APPLICATION_CREDENTIALS=/var/www/html/web/sites/default/files/ai-integration-480315-c136045bcc0e.json')`. It is dead code, but it exposes the creds filename in source control.
+
+3. **Title and metadata agents have ZERO brand context.** These agents generate the most visible SEO content (search result titles/descriptions) with no brand guidelines, naming conventions, or approved vocabulary.
+
+4. **Competitor names in page builder context.** The Sales Training Deck (always injected into both page builders) contains "Rimp," "Brix," "SAQ Concur," "Navex," "Dill/Bivvy." Brand guidelines prohibit these names in external content, so keeping them in context is a known hallucination trap.
+
+### HIGH
+
+5. **Hardcoded GA date range.** `GoogleAnalytics.php:63-66` hardcodes `end_date: 2026-03-09`. Already stale (today is March 26).
+
+6. **`max_loops: 30` with unbounded retry.** Page builder prompt says "Retry... Continue until all succeed." With no retry ceiling, a persistently failing tool can burn all 30 loops.
+
+7. **Nested agent calls with no cost ceiling.** The SEO agent can call the page builder (30 loops) multiple times within its own 10-loop budget. There is no aggregate token limit.
+
+### MEDIUM
+
+8. **"Vibe coded method"** in `GetLinkableComponents.php:127`: self-documented as AI-generated without thorough review.
+9. **GoogleAnalytics.php uses static `\Drupal::` calls**: untestable, violates coding standards.
+10. **Uninitialized `$output` variable** in GoogleAnalytics.php if no GA rows returned.
+11. **Test scenarios reference wrong agent/tool IDs**: tests are currently unrunnable.
+
+---
+
+## 4. Context Injection Analysis
+
+| Agent | Context Items | Token Cost | Assessment |
+|-------|--------------|------------|------------|
+| orchestrator | 2 items (guidelines, brand) | ~1,200 | **Good**: lightweight |
+| template_builder | 8 items (full brand + content structure) | ~10,000-12,000 | **Excessive**: includes internal sales deck with competitor names |
+| page_builder | 8 items (same as template) | ~10,000-12,000 | **Same concern** |
+| title_generation | **NONE** | ~50 | **CRITICAL GAP** |
+| metadata_generation | **NONE** | ~500 | **CRITICAL GAP** |
+| seo_agent | **NONE** | ~3,000 | Moderate gap |
+| analytics_monitoring | 1 item (GA benchmarks) | ~300 | **Well configured** |
+
+### Wasted Context
+- Sales Training Deck (~2,500 tokens) in page builders: contains competitor names, discovery questions, demo flow. Mostly irrelevant to page building, and dangerous.
+
+### Missing Context
+- Title agent: needs Brand Guidelines + Key Facts at minimum
+- Metadata agent: same
+- SEO agent: could benefit from Key Facts for Schema.org property values
+
+---
+
+## 5. Test Scenario Coverage
+
+**27 tests across 7 phases.** Covers: happy path page builds, degraded input, SEO, analytics, compliance.
+
+### Missing Coverage
+- Zero tests for: entity type validation (Rule #1), component agent (code gen), title agent, metadata agent, error recovery, nested agent calls, brand compliance (competitor name leakage), parallel execution, selected component flow
+
+### Test Quality Issues
+- Agent IDs don't match config (`canvas_ai_assistant` vs `canvas_ai_orchestrator`)
+- Tool IDs don't match (`ai_agents::canvas::generate_page` vs actual tool names)
+- Tests are currently unrunnable without remapping
+
+---
+
+## 6. Recommendations (Prioritized)
+
+### Must Fix Before Demo
+1. Sanitize JSON-LD before `<script>` injection.
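One common way to make JSON safe for embedding in a `<script>` element is to escape `<`, `>`, and `&` as `\uXXXX` sequences, which JSON parsers decode back to the original characters. The sketch below shows the idea in Python for illustration only. The helper name `jsonld_for_script_tag` and the sample payload are hypothetical, and how `CanvasAiSeoHooks.php` actually builds its output is not shown in this report. In PHP, `json_encode()` with the `JSON_HEX_TAG` and `JSON_HEX_AMP` flags produces the same kind of escaping.

```python
import json


def jsonld_for_script_tag(data):
    """Serialize data as JSON that cannot close or open an HTML tag.

    Hypothetical helper: re-encodes the parsed structure instead of
    trusting raw model output, then escapes markup-significant characters.
    """
    text = json.dumps(data)  # ensure_ascii=True by default
    return (
        text.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


if __name__ == "__main__":
    # Illustrative hostile value, standing in for LLM-generated content.
    payload = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "</script><script>alert(1)</script>",
    }
    safe = jsonld_for_script_tag(payload)
    print(safe)
    # The escaped text still decodes to the original data.
    assert json.loads(safe) == payload
```

Escaping at serialization time leaves the data unchanged for JSON-LD consumers while removing the characters an HTML parser would treat as markup. It does not validate that the content is sensible Schema.org, which would be a separate check.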
+