
ENG-1602: Add PDF extraction API route #937

Open
sid597 wants to merge 6 commits into main from eng-1602-api-to-send-uploaded-pdf-to-chosen-llms-to-parse-from-fr1

Conversation

@sid597
Collaborator

@sid597 sid597 commented Apr 2, 2026

https://www.loom.com/share/b6fc57a6040c41dabf4155f61bcb2df0



Summary by CodeRabbit

Release Notes

New Features

  • Added AI-powered PDF extraction API supporting Anthropic, OpenAI, and Gemini providers
  • Supports custom system prompts and research-focused document analysis with structured JSON output
  • Includes request validation and comprehensive error handling for robust operation

Multi-provider (Anthropic, OpenAI, Gemini) endpoint for extracting
discourse graph nodes from uploaded PDFs.
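A minimal sketch of how such a multi-provider endpoint might validate a request and map the chosen provider to its API key (names and env vars are illustrative assumptions, not the PR's actual code):

```typescript
// Hypothetical sketch: validate an incoming extraction request and pick
// the env var for the chosen provider. Names are assumptions.
type ProviderId = "anthropic" | "openai" | "gemini";

interface ExtractionRequest {
  provider: ProviderId;
  model: string;
  pdfBase64: string;
  researchQuestion?: string;
}

// Assumed env-var names, one per provider.
const API_KEY_ENV: Record<ProviderId, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function validateExtractionRequest(body: unknown): ExtractionRequest {
  const b = body as Partial<ExtractionRequest>;
  if (
    !b ||
    typeof b.pdfBase64 !== "string" ||
    typeof b.model !== "string" ||
    !b.provider ||
    !(b.provider in API_KEY_ENV)
  ) {
    throw new Error("Invalid extraction request");
  }
  return b as ExtractionRequest;
}
```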
@linear

linear bot commented Apr 2, 2026

@supabase

supabase bot commented Apr 2, 2026

This pull request has been ignored for the connected project zytfjzqyijgagqxrzbmz because there are no changes detected in packages/database/supabase directory. You can change this behaviour in Project Integrations Settings ↗︎.



Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.


Widen Message.content to support multimodal content blocks and add
systemPrompt/responseMimeType to Settings. Each provider's
formatRequestBody now handles both text-only chat and PDF extraction,
eliminating the parallel PROVIDERS block in the extraction route.

OpenAI extraction switches from Responses API to Chat Completions
(now supports PDF). Gemini field casing fixed to match REST API docs.
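An illustrative sketch of the widened types this commit describes; exact names in the PR may differ. A plain string stays valid, but content can now also be a list of multimodal blocks (text or a base64-encoded PDF document):

```typescript
// Illustrative sketch of the widened Message type. Block shapes follow the
// common Anthropic/OpenAI conventions; field names here are assumptions.
type TextBlock = { type: "text"; text: string };
type DocumentBlock = {
  type: "document";
  source: { type: "base64"; media_type: "application/pdf"; data: string };
};
type ContentBlock = TextBlock | DocumentBlock;

interface Message {
  role: "system" | "user" | "assistant";
  content: string | ContentBlock[];
}

// Normalizing to blocks lets each provider's formatRequestBody handle
// text-only chat and PDF extraction through one code path.
function toContentBlocks(content: Message["content"]): ContentBlock[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}
```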

sid597 added 2 commits April 3, 2026 11:09
Add structured output enforcement via each provider's native mechanism:
Anthropic output_config, OpenAI response_format with strict mode,
Gemini responseJsonSchema. Removes prompt-based JSON instructions and
response cleanup parsing since constrained decoding guarantees valid JSON.
Per AGENTS.md: functions with more than 2 parameters use named
parameters via object destructuring.
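The per-provider mechanisms named in this commit might produce payload fragments like the following sketch. The OpenAI and Gemini field names follow their public REST docs; the Anthropic output_config shape is only asserted by the commit message, so it is omitted here:

```typescript
// Sketch of provider-specific structured-output payload fragments.
// Field names follow the public OpenAI and Gemini REST docs; treat the
// function names themselves as illustrative.
type JsonSchema = Record<string, unknown>;

// OpenAI Chat Completions: response_format with a strict json_schema.
function openAiStructuredOutput(schema: JsonSchema) {
  return {
    response_format: {
      type: "json_schema",
      json_schema: { name: "extraction_result", strict: true, schema },
    },
  };
}

// Gemini: responseMimeType plus responseJsonSchema inside generationConfig.
function geminiStructuredOutput(schema: JsonSchema) {
  return {
    generationConfig: {
      responseMimeType: "application/json",
      responseJsonSchema: schema,
    },
  };
}
```

With constrained decoding enforced server-side, the route no longer needs prompt-based "respond in JSON" instructions or cleanup parsing, which is exactly what the commit removes.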

@sid597
Collaborator Author

sid597 commented Apr 3, 2026

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Full review triggered.

Inline disables for response_format, json_schema (OpenAI), and
output_config (Anthropic) — external API contract names.
@coderabbitai
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

This PR introduces a PDF-to-AI extraction feature with a new POST API endpoint that accepts PDF data, selects LLM providers (Anthropic, OpenAI, Gemini), and extracts structured discourse graph nodes. It includes type definitions, prompt templates, response parsing logic, and updates to provider implementations to support system prompts and JSON output schemas.

Changes

  • AI Extraction Route — apps/website/app/api/ai/extract/route.ts
    New POST handler (203 lines) implementing PDF extraction with provider-specific message formatting, timeout enforcement, comprehensive error handling with distinct status codes (502/500), and response parsing via parseExtractionResponse.
  • Type Definitions & Schemas — apps/website/app/types/extraction.ts, apps/website/app/types/llm.ts
    Added ExtractionRequest, ExtractionResult, ExtractedNode schemas and types; introduced ProviderId union; extended Message.content to support ContentBlock[]; extended Settings with optional systemPrompt and outputSchema.
  • Prompts & Parsing — apps/website/app/prompts/extraction.ts, apps/website/app/utils/ai/parseExtractionResponse.ts
    Added DEFAULT_EXTRACTION_PROMPT with node-type instructions and constraints; buildUserPrompt() helper for optional research question injection; parseExtractionResponse() utility for JSON parsing and schema validation.
  • LLM Provider Updates — apps/website/app/utils/llm/providers.ts
    Modified OpenAI, Gemini, and Anthropic config logic to: prepend system messages when systemPrompt provided; support structured JSON output via outputSchema; transform message content handling for non-string blocks.
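The parsing utility described above might be sketched as follows. The real parseExtractionResponse validates against a Zod schema; this hand-rolled check and the node fields are illustrative stand-ins:

```typescript
// Illustrative sketch: parse the model's text output as JSON and verify the
// minimal shape before returning it. The PR validates with Zod instead.
interface ExtractedNode { type: string; text: string }
interface ExtractionResult { nodes: ExtractedNode[] }

function parseExtractionResponse(raw: string): ExtractionResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  const candidate = parsed as Partial<ExtractionResult>;
  if (!Array.isArray(candidate.nodes)) {
    throw new Error("Model output is missing a nodes array");
  }
  return candidate as ExtractionResult;
}
```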

Sequence Diagram

sequenceDiagram
    participant Client
    participant APIRoute as API Route<br/>(extract/route.ts)
    participant ProviderConfig as Provider Config<br/>(providers.ts)
    participant LLMProvider as LLM Provider<br/>(Anthropic/OpenAI/Gemini)
    participant ResponseParser as Parser<br/>(parseExtractionResponse)

    Client->>APIRoute: POST with PDF, provider, model
    APIRoute->>APIRoute: Validate request<br/>against schema
    APIRoute->>APIRoute: Read provider API key<br/>from env
    APIRoute->>ProviderConfig: Build messages &<br/>settings with<br/>systemPrompt/outputSchema
    ProviderConfig->>ProviderConfig: Format provider-specific<br/>request payload
    ProviderConfig-->>APIRoute: Return formatted<br/>request config
    APIRoute->>LLMProvider: POST with timeout<br/>signal
    LLMProvider-->>APIRoute: Response text<br/>(or error)
    APIRoute->>ResponseParser: Parse extracted<br/>content JSON
    ResponseParser->>ResponseParser: Validate against<br/>ExtractionResultSchema
    ResponseParser-->>APIRoute: Typed ExtractionResult
    APIRoute-->>Client: { success, data }<br/>or { success, error }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Move llm-api endpoints to vercel serverless #102 — Both PRs directly modify provider implementations in providers.ts; the retrieved PR establishes initial provider scaffolding while this PR extends it with system prompts and output schema handling.
🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)
  • Docstring Coverage ✅ Passed — No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Title Check ✅ Passed — The title directly matches the main change: adding a PDF extraction API route at apps/website/app/api/ai/extract/route.ts.
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.




Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
apps/website/app/utils/llm/providers.ts (1)

20-29: Add eslint-disable comments for API-required snake_case properties.

The response_format and json_schema properties are mandated by the OpenAI API. Consider adding inline eslint-disable comments to silence the warnings and document why.

🔧 Proposed fix
     ...(settings.outputSchema && {
+      // Required by the OpenAI API (snake_case contract names).
+      // eslint-disable-next-line @typescript-eslint/naming-convention
       response_format: {
         type: "json_schema",
+        // eslint-disable-next-line @typescript-eslint/naming-convention
         json_schema: {
           name: "extraction_result",
           strict: true,
           schema: settings.outputSchema,
         },
       },
     }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/website/app/utils/llm/providers.ts` around lines 20 - 29, The OpenAI API
requires snake_case keys `response_format` and `json_schema` inside the object
created when `settings.outputSchema` is present; add inline eslint-disable
comments (e.g., // eslint-disable-next-line
`@typescript-eslint/naming-convention`) immediately above or inline with those
properties to suppress naming-convention errors and include a short comment
noting “required by OpenAI API” to document why the rule is disabled; update the
object around `settings.outputSchema`, `response_format`, `json_schema`, and the
`name: "extraction_result"` entry accordingly.
apps/website/app/types/extraction.ts (1)

35-55: Consider generating JSON schema from Zod to prevent drift.

The EXTRACTION_RESULT_JSON_SCHEMA manually mirrors ExtractionResultSchema. If these diverge, the LLM's structured output may not match runtime validation. Libraries like zod-to-json-schema can generate one from the other.

#!/bin/bash
# Check if zod-to-json-schema is already a dependency
rg -l "zod-to-json-schema" apps/website/package.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/website/app/types/extraction.ts` around lines 35 - 55, The JSON schema
constant EXTRACTION_RESULT_JSON_SCHEMA is manually duplicated and can drift from
the Zod schema (ExtractionResultSchema); replace the hand-written
EXTRACTION_RESULT_JSON_SCHEMA with a generated schema by using
zod-to-json-schema (or equivalent) to convert ExtractionResultSchema into JSON
Schema at build/runtime, update imports so ExtractionResultSchema is the sole
source of truth, and export the generated schema where
EXTRACTION_RESULT_JSON_SCHEMA was used so consumers (LLM structured output
validation) always get the schema derived from the Zod definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/website/app/types/extraction.ts`:
- Around line 35-55: The JSON schema constant EXTRACTION_RESULT_JSON_SCHEMA is
manually duplicated and can drift from the Zod schema (ExtractionResultSchema);
replace the hand-written EXTRACTION_RESULT_JSON_SCHEMA with a generated schema
by using zod-to-json-schema (or equivalent) to convert ExtractionResultSchema
into JSON Schema at build/runtime, update imports so ExtractionResultSchema is
the sole source of truth, and export the generated schema where
EXTRACTION_RESULT_JSON_SCHEMA was used so consumers (LLM structured output
validation) always get the schema derived from the Zod definition.

In `@apps/website/app/utils/llm/providers.ts`:
- Around line 20-29: The OpenAI API requires snake_case keys `response_format`
and `json_schema` inside the object created when `settings.outputSchema` is
present; add inline eslint-disable comments (e.g., // eslint-disable-next-line
`@typescript-eslint/naming-convention`) immediately above or inline with those
properties to suppress naming-convention errors and include a short comment
noting “required by OpenAI API” to document why the rule is disabled; update the
object around `settings.outputSchema`, `response_format`, `json_schema`, and the
`name: "extraction_result"` entry accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0eb45979-82e8-48e9-a803-029f780a6309

📥 Commits

Reviewing files that changed from the base of the PR and between 700a7ab and f7c7871.

📒 Files selected for processing (6)
  • apps/website/app/api/ai/extract/route.ts
  • apps/website/app/prompts/extraction.ts
  • apps/website/app/types/extraction.ts
  • apps/website/app/types/llm.ts
  • apps/website/app/utils/ai/parseExtractionResponse.ts
  • apps/website/app/utils/llm/providers.ts

Gemini parts use { text } not { type: "text", text }. The shared
textBlock was using the Anthropic/OpenAI format which Gemini rejects.
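The fix above can be sketched as a provider-aware text block helper (the helper name is taken from the commit message; the branch logic is an assumption about how the fix might look):

```typescript
// Gemini REST parts are { text }; Anthropic/OpenAI content blocks carry an
// explicit type tag. A shared helper therefore has to branch on the provider.
type ProviderId = "anthropic" | "openai" | "gemini";

function textBlock(provider: ProviderId, text: string): Record<string, string> {
  return provider === "gemini" ? { text } : { type: "text", text };
}
```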
@sid597 sid597 changed the title from "fENG-1602: Add PDF extraction API route" to "ENG-1602: Add PDF extraction API route" on Apr 3, 2026
@sid597 sid597 requested a review from mdroidian April 3, 2026 07:13
