From 352acad4ac56b8e5e4a874e53919733500d99dbd Mon Sep 17 00:00:00 2001
From: rohan-tessl
Date: Wed, 6 May 2026 10:52:25 +0530
Subject: [PATCH] feat: improve 5 lowest-scoring skill definitions

---
 .claude/skills/agent-openai-memory/SKILL.md | 125 ++----
 .claude/skills/long-running-server/SKILL.md | 286 ++-----------
 .../.claude/skills/deploy/SKILL.md          | 397 ++----------------
 3 files changed, 115 insertions(+), 693 deletions(-)

diff --git a/.claude/skills/agent-openai-memory/SKILL.md b/.claude/skills/agent-openai-memory/SKILL.md
index 12f5d0b1..0258fede 100644
--- a/.claude/skills/agent-openai-memory/SKILL.md
+++ b/.claude/skills/agent-openai-memory/SKILL.md
@@ -1,32 +1,24 @@
 ---
 name: agent-openai-memory
-description: "Add memory capabilities to your agent. Use when: (1) User asks about 'memory', 'state', 'remember', 'conversation history', (2) Want to persist conversations or user preferences, (3) Adding checkpointing or long-term storage."
+description: "Add session-based memory to OpenAI Agents SDK agent using AsyncDatabricksSession and Lakebase. Use when: (1) User asks about 'memory', 'state', 'remember', 'conversation history', (2) Want to persist conversations or user preferences, (3) Adding session-based checkpointing."
 ---

 # Stateful Memory with OpenAI Agents SDK Sessions

-This template uses OpenAI Agents SDK [Sessions](https://openai.github.io/openai-agents-python/sessions/) with `AsyncDatabricksSession` to persist conversation history to a Databricks Lakebase instance.
+Uses `AsyncDatabricksSession` to persist conversation history to Lakebase, enabling multi-turn interactions where the agent remembers prior messages within a session.

-## How Sessions Work
-
-Sessions automatically manage conversation history for multi-turn interactions:
-
-1. **Before each run**: The session retrieves prior conversation history and prepends it to input
-2. **During the run**: New items (user messages, responses, tool calls) are generated
-3. **After each run**: All new items are automatically stored in the session
-
-This eliminates the need to manually manage conversation state between runs.
+## Prerequisites

-## Key Concepts
+1. **Dependency**: `databricks-openai[memory]` in `pyproject.toml` (already included in memory templates)
+2. **Lakebase instance**: See **lakebase-setup** skill for creating and configuring one
+3. **Environment variable**: Set `LAKEBASE_INSTANCE_NAME` in `.env`:
+   ```bash
+   LAKEBASE_INSTANCE_NAME=
+   ```

-| Concept | Description |
-|---------|-------------|
-| **Session** | Stores conversation history for a specific `session_id` |
-| **`session_id`** | Unique identifier linking requests to the same conversation |
-| **`AsyncDatabricksSession`** | Session implementation backed by Databricks Lakebase |
-| **`LAKEBASE_INSTANCE_NAME`** | Environment variable specifying the Lakebase instance |
+---

-## How This Template Uses Sessions
+## Implementation

 ### Session Creation (`agent_server/agent.py`)

@@ -43,8 +35,6 @@ result = await Runner.run(agent, messages, session=session)

 ### Session ID Extraction (`agent_server/agent.py`)

-The `session_id` is extracted from `custom_inputs` or auto-generated:
-
 ```python
 def get_session_id(request: ResponsesAgentRequest) -> str:
     if hasattr(request, "custom_inputs") and request.custom_inputs:
@@ -55,8 +45,6 @@ def get_session_id(request: ResponsesAgentRequest) -> str:

 ### Lakebase Instance Resolution (`agent_server/utils.py`)

-The `LAKEBASE_INSTANCE_NAME` env var can be either an instance name or a hostname. The `resolve_lakebase_instance_name()` function handles both cases:
-
 ```python
 _LAKEBASE_INSTANCE_NAME_RAW = os.environ.get("LAKEBASE_INSTANCE_NAME")
 LAKEBASE_INSTANCE_NAME = resolve_lakebase_instance_name(_LAKEBASE_INSTANCE_NAME_RAW)
@@ -64,78 +52,50 @@ LAKEBASE_INSTANCE_NAME = resolve_lakebase_instance_name(_LAKEBASE_INSTANCE_NAME_

 ---

-## Prerequisites
-
-1. **Dependency**: `databricks-openai[memory]` must be in `pyproject.toml` (already included)
-
-2. **Lakebase instance**: You need a Databricks Lakebase instance. See the **lakebase-setup** skill for creating and configuring one.
-
-3. **Environment variable**: Set `LAKEBASE_INSTANCE_NAME` in your `.env` file:
-   ```bash
-   LAKEBASE_INSTANCE_NAME=
-   ```
-
----
-
-## Configuration Files
+## Configuration

 ### databricks.yml (Lakebase Resource)

-Add the Lakebase database resource to your app:
-
 ```yaml
 resources:
-  apps:
-    agent_openai_advanced:
-      name: "your-app-name"
-      source_code_path: ./
-
-      resources:
-        # ... other resources (experiment, etc.) ...
-
-        # Lakebase instance for session storage
-        - name: 'database'
-          database:
-            instance_name: ''
-            database_name: 'databricks_postgres'
-            permission: 'CAN_CONNECT_AND_CREATE'
+  - name: 'database'
+    database:
+      instance_name: ''
+      database_name: 'databricks_postgres'
+      permission: 'CAN_CONNECT_AND_CREATE'
 ```

-### databricks.yml config block (Environment Variables)
-
-The `LAKEBASE_INSTANCE_NAME` env var is resolved from the database resource at deploy time. Add to your app's `config.env` in `databricks.yml`:
-
 ```yaml
-  config:
-    env:
-      - name: LAKEBASE_INSTANCE_NAME
-        value_from: "database"
+config:
+  env:
+    - name: LAKEBASE_INSTANCE_NAME
+      value_from: "database"
 ```

-### .env (Local Development)
+---

-```bash
-LAKEBASE_INSTANCE_NAME=
-```
+## Testing

----
+### Verify Lakebase Connectivity

-## Testing Sessions
+```bash
+databricks lakebase instances get --profile
+```

-### Test Multi-Turn Conversation Locally
+### Test Multi-Turn Conversation

 ```bash
 # Start the server
 uv run start-app

-# First message - starts a new session
+# First message -- starts a new session
 curl -X POST http://localhost:8000/invocations \
   -H "Content-Type: application/json" \
   -d '{"input": [{"role": "user", "content": "Hello, I live in SF!"}]}'

 # Note the session_id from custom_outputs in the response

-# Second message - continues the same session
+# Second message -- continues the same session (should remember SF)
 curl -X POST http://localhost:8000/invocations \
   -H "Content-Type: application/json" \
   -d '{
@@ -144,28 +104,19 @@ curl -X POST http://localhost:8000/invocations \
   }'
 ```

-### Test Streaming
-
-```bash
-curl -X POST http://localhost:8000/invocations \
-  -H "Content-Type: application/json" \
-  -d '{
-    "input": [{"role": "user", "content": "Hello!"}],
-    "stream": true
-  }'
-```
+If the agent responds with "SF" or "San Francisco", session memory is working.

 ---

 ## Troubleshooting

-| Issue | Cause | Solution |
-|-------|-------|----------|
-| **"LAKEBASE_INSTANCE_NAME environment variable is required"** | Missing env var | Set `LAKEBASE_INSTANCE_NAME` in `.env` |
-| **SSL connection closed unexpectedly** | Network/instance issue | Verify Lakebase instance is running: `databricks lakebase instances get ` |
-| **Agent doesn't remember previous messages** | Different session_id | Pass the same `session_id` via `custom_inputs` across requests |
-| **"Unable to resolve hostname"** | Hostname doesn't match any instance | Verify the hostname or use the instance name directly |
-| **Permission denied** | Missing Lakebase access | Add `database` resource to `databricks.yml` with `CAN_CONNECT_AND_CREATE` |
+| Issue | Solution |
+|-------|----------|
+| "LAKEBASE_INSTANCE_NAME environment variable is required" | Set `LAKEBASE_INSTANCE_NAME` in `.env` |
+| SSL connection closed unexpectedly | Verify instance is running: `databricks lakebase instances get ` |
+| Agent doesn't remember previous messages | Pass same `session_id` via `custom_inputs` across requests |
+| "Unable to resolve hostname" | Use instance name directly instead of hostname |
+| Permission denied | Add `database` resource to `databricks.yml` with `CAN_CONNECT_AND_CREATE` |

 ---

diff --git a/.claude/skills/long-running-server/SKILL.md b/.claude/skills/long-running-server/SKILL.md
index 694e7631..d9ef5608 100644
--- a/.claude/skills/long-running-server/SKILL.md
+++ b/.claude/skills/long-running-server/SKILL.md
@@ -1,79 +1,45 @@
 ---
 name: long-running-server
-description: "Enable long-running background task support with LongRunningAgentServer. Use when: (1) Agent tasks may exceed HTTP timeout (~120s), (2) User wants background/async execution, (3) User says 'long running', 'background tasks', or 'async agent'."
+description: "Upgrade to LongRunningAgentServer for background task execution surviving HTTP timeouts. Configures task queuing, status polling, and stream resumption. Use when: (1) Agent tasks may exceed HTTP timeout (~120s), (2) User wants background/async execution, (3) User says 'long running', 'background tasks', or 'async agent'."
 ---

 # Enable Long-Running Agent Server

-> **Prerequisite:** Lakebase must be configured. If not already set up, follow the **lakebase-setup** skill first.
+> **Prerequisite:** Lakebase must be configured. Follow the **lakebase-setup** skill first if not done.

-Upgrades from `AgentServer` to `LongRunningAgentServer`, enabling background task execution that survives HTTP timeouts. Long-running tasks are persisted to Lakebase PostgreSQL so clients can poll or stream results.
-
-## What It Enables
+Upgrades from `AgentServer` to `LongRunningAgentServer`, enabling background task execution persisted to Lakebase PostgreSQL.

 | Request pattern | Description |
 |---|---|
-| **Standard** | `POST /responses` — blocks until complete (queries ≤ 120s) |
-| **Background + Poll** | `POST /responses { background: true }` → `GET /responses/{id}` |
+| **Standard** | `POST /responses` -- blocks until complete (queries <= 120s) |
+| **Background + Poll** | `POST /responses { background: true }` then `GET /responses/{id}` |
 | **Background + Stream** | `POST /responses { background: true, stream: true }` with cursor-based resumption via `starting_after` |

 ---

 ## Step 1: Add Dependency

-Add `databricks-ai-bridge[agent-server]` to `pyproject.toml`:
-
 ```toml
-dependencies = [
-    # ... existing dependencies ...
-    "databricks-ai-bridge[agent-server]>=0.18.0",
-]
+dependencies = ["databricks-ai-bridge[agent-server]>=0.18.0"]
 ```

-Run `uv sync` to install.
+Verify: `uv sync && python -c "from databricks_ai_bridge.long_running import LongRunningAgentServer"`

 ---

 ## Step 2: Update `start_server.py`

-Replace the basic `AgentServer` with `LongRunningAgentServer`. Key changes:
-
-1. Import `LongRunningAgentServer` instead of `AgentServer`
-2. Subclass it to override `transform_stream_event` (replaces placeholder IDs in streamed events)
-3. Pass Lakebase connection config and timeout settings
-4. Add a lifespan hook to initialize database tables at startup
-
-### OpenAI SDK
+Replace `AgentServer` with `LongRunningAgentServer`. Key changes from the base `start_server.py`:

 ```python
-"""Agent server entry point. load_dotenv must run before agent imports (auth config)."""
-
-# ruff: noqa: E402
-import os
-from contextlib import asynccontextmanager
-from pathlib import Path
-
-from dotenv import load_dotenv
-
-load_dotenv(dotenv_path=Path(__file__).parent.parent / ".env", override=True)
-
-import logging
-
 from databricks_ai_bridge.long_running import LongRunningAgentServer
-from mlflow.genai.agent_server import setup_mlflow_git_based_version_tracking
-
 from agent_server.utils import lakebase_config, replace_fake_id
-
-import agent_server.agent  # noqa: F401
-
-logger = logging.getLogger(__name__)
-
+# LangGraph uses: from agent_server.utils import LAKEBASE_CONFIG as lakebase_config, replace_fake_id

 class AgentServer(LongRunningAgentServer):
     def transform_stream_event(self, event, response_id):
         return replace_fake_id(event, response_id)

-
 agent_server = AgentServer(
     "ResponsesAgent",
     enable_chat_proxy=True,
@@ -84,82 +50,15 @@ agent_server = AgentServer(
     task_timeout_seconds=float(os.getenv("TASK_TIMEOUT_SECONDS", "3600")),
     poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
 )
-
-log_level = os.getenv("LOG_LEVEL", "INFO")
-logging.getLogger("agent_server").setLevel(getattr(logging, log_level.upper(), logging.INFO))
-
-_original_lifespan = agent_server.app.router.lifespan_context
-
-
-@asynccontextmanager
-async def _lifespan(app):
-    # Initialize session/long-running tables at startup.
-    # If using AsyncDatabricksSession, create a throwaway session and call _ensure_tables().
-    async with _original_lifespan(app):
-        yield
-
-
-agent_server.app.router.lifespan_context = _lifespan
-
-app = agent_server.app  # noqa: F841
-setup_mlflow_git_based_version_tracking()
-
-
-def main():
-    agent_server.run(app_import_string="agent_server.start_server:app")
 ```

-### LangGraph
+Keep the existing `load_dotenv`, `setup_mlflow_git_based_version_tracking()`, and `main()` boilerplate. Add a lifespan hook to initialize Lakebase tables at startup:

 ```python
-"""Agent server entry point. load_dotenv must run before agent imports (auth config)."""
-
-# ruff: noqa: E402
-import os
-from contextlib import asynccontextmanager
-from pathlib import Path
-
-from dotenv import load_dotenv
-
-load_dotenv(dotenv_path=Path(__file__).parent.parent / ".env", override=True)
-
-import logging
-
-from databricks_ai_bridge.long_running import LongRunningAgentServer
-from mlflow.genai.agent_server import setup_mlflow_git_based_version_tracking
-
-from agent_server.utils import replace_fake_id, LAKEBASE_CONFIG
-
-import agent_server.agent  # noqa: F401
-
-logger = logging.getLogger(__name__)
-
-
-class AgentServer(LongRunningAgentServer):
-    def transform_stream_event(self, event, response_id):
-        return replace_fake_id(event, response_id)
-
-
-agent_server = AgentServer(
-    "ResponsesAgent",
-    enable_chat_proxy=True,
-    db_instance_name=LAKEBASE_CONFIG.instance_name,
-    db_autoscaling_endpoint=LAKEBASE_CONFIG.autoscaling_endpoint,
-    db_project=LAKEBASE_CONFIG.autoscaling_project,
-    db_branch=LAKEBASE_CONFIG.autoscaling_branch,
-    task_timeout_seconds=float(os.getenv("TASK_TIMEOUT_SECONDS", "3600")),
-    poll_interval_seconds=float(os.getenv("POLL_INTERVAL_SECONDS", "1.0")),
-)
-
-app = agent_server.app  # noqa: F841
-setup_mlflow_git_based_version_tracking()
-
 _original_lifespan = app.router.lifespan_context

-
 @asynccontextmanager
 async def _lifespan(app):
-    # Initialize Lakebase tables at startup (e.g. run_lakebase_setup)
     try:
         async with _original_lifespan(app):
             yield
@@ -167,177 +66,80 @@ async def _lifespan(app):
         logger.warning("Long-running DB init failed: %s. Background mode disabled.", exc)
         yield

-
 app.router.lifespan_context = _lifespan
-
-
-def main():
-    agent_server.run(app_import_string="agent_server.start_server:app")
 ```

+Verify: `uv run start-app` -- server should start without import errors.
+
 ---

 ## Step 3: Add `replace_fake_id` Utility

-Add to `utils.py` if not already present. The implementation differs by SDK:
-
-### OpenAI SDK
+Add to `utils.py`. Only the match condition differs by SDK:

 ```python
+# OpenAI SDK: match exact constant
 try:
     from agents.models.fake_id import FAKE_RESPONSES_ID
 except ImportError:
     FAKE_RESPONSES_ID = "__fake_id__"
+_match = lambda s: s == FAKE_RESPONSES_ID
+# LangGraph: match prefix
+# _match = lambda s: s.startswith("resp_placeholder_")


 def replace_fake_id(obj, real_id: str):
-    """Recursively replace FAKE_RESPONSES_ID with real_id."""
     if isinstance(obj, dict):
         return {k: replace_fake_id(v, real_id) for k, v in obj.items()}
     elif isinstance(obj, list):
         return [replace_fake_id(item, real_id) for item in obj]
-    elif isinstance(obj, str) and obj == FAKE_RESPONSES_ID:
+    elif isinstance(obj, str) and _match(obj):
         return real_id
     return obj
 ```

-### LangGraph
-
-```python
-_FAKE_ID_PREFIX = "resp_placeholder_"
-
-
-def replace_fake_id(obj, real_id: str):
-    """Recursively replace any resp_placeholder_* ID with real_id."""
-    if isinstance(obj, dict):
-        return {k: replace_fake_id(v, real_id) for k, v in obj.items()}
-    elif isinstance(obj, list):
-        return [replace_fake_id(item, real_id) for item in obj]
-    elif isinstance(obj, str) and obj.startswith(_FAKE_ID_PREFIX):
-        return real_id
-    return obj
-```
-
----
-
-## Step 4: Add Lakebase Config
-
-Add to `utils.py` if not already present. This reads Lakebase connection parameters from environment variables:
-
-```python
-import os
-from dataclasses import dataclass
-from typing import Optional
-
-
-@dataclass(frozen=True)
-class LakebaseConfig:
-    instance_name: Optional[str]
-    autoscaling_endpoint: Optional[str]
-    autoscaling_project: Optional[str]
-    autoscaling_branch: Optional[str]
-
-
-def init_lakebase_config() -> LakebaseConfig:
-    """Read lakebase env vars. Priority: endpoint > project+branch > instance_name."""
-    endpoint = os.getenv("LAKEBASE_AUTOSCALING_ENDPOINT") or None
-    raw_name = os.getenv("LAKEBASE_INSTANCE_NAME") or None
-    project = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
-    branch = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
-
-    has_autoscaling = project and branch
-    if not endpoint and not raw_name and not has_autoscaling:
-        raise ValueError(
-            "Lakebase configuration is required. Set one of:\n"
-            "  LAKEBASE_AUTOSCALING_ENDPOINT=\n"
-            "  LAKEBASE_AUTOSCALING_PROJECT + LAKEBASE_AUTOSCALING_BRANCH\n"
-            "  LAKEBASE_INSTANCE_NAME=\n"
-        )
-
-    if endpoint:
-        return LakebaseConfig(instance_name=None, autoscaling_endpoint=endpoint,
-                              autoscaling_project=None, autoscaling_branch=None)
-    elif has_autoscaling:
-        return LakebaseConfig(instance_name=None, autoscaling_endpoint=None,
-                              autoscaling_project=project, autoscaling_branch=branch)
-    else:
-        return LakebaseConfig(instance_name=raw_name, autoscaling_endpoint=None,
-                              autoscaling_project=None, autoscaling_branch=None)
-
-
-# Module-level singleton
-lakebase_config = init_lakebase_config()
-```

 ---

-## Step 5: Configure `databricks.yml`
+## Step 4: Configure Environment

-Add Lakebase resource and env vars per the **lakebase-setup** skill. The long-running server additionally uses these optional env vars:
+Add to `databricks.yml` config env (Lakebase vars per **lakebase-setup** skill, plus):

 ```yaml
-config:
-  env:
-    # ... existing env vars ...
-    - name: TASK_TIMEOUT_SECONDS
-      value: "3600"
-    - name: POLL_INTERVAL_SECONDS
-      value: "1.0"
-    - name: LOG_LEVEL
-      value: "INFO"
+- name: TASK_TIMEOUT_SECONDS
+  value: "3600"
+- name: POLL_INTERVAL_SECONDS
+  value: "1.0"
 ```

----
-
-## Step 6: Configure `.env` for Local Development
-
-Add Lakebase connection vars (see **lakebase-setup** skill for all options):
-
-```bash
-# Pick ONE mode:
-# Option 1: Autoscaling endpoint
-LAKEBASE_AUTOSCALING_ENDPOINT=
-# Option 2: Autoscaling project/branch
-LAKEBASE_AUTOSCALING_PROJECT=
-LAKEBASE_AUTOSCALING_BRANCH=
-# Option 3: Provisioned instance
-LAKEBASE_INSTANCE_NAME=
-
-# Optional tuning
-TASK_TIMEOUT_SECONDS=3600
-POLL_INTERVAL_SECONDS=1.0
-LOG_LEVEL=INFO
-```
+Add to `.env`: `TASK_TIMEOUT_SECONDS=3600` and `POLL_INTERVAL_SECONDS=1.0`

 ---

-## Step 7: Deploy and Grant Permissions
+## Step 5: Deploy, Grant Permissions, and Verify

-Follow the **lakebase-setup** skill Steps 5-7 to deploy, grant SP permissions, and run the app.
+Follow **lakebase-setup** skill Steps 5-7 to deploy, grant SP permissions, and run the app.

----
+**Verify background mode:**
+```bash
+# Submit a background task
+curl -X POST http://localhost:8000/invocations \
+  -H "Content-Type: application/json" \
+  -d '{"input": [{"role": "user", "content": "Hello!"}], "background": true}'

-## Constructor Reference
+# Poll for result (use the response ID from above)
+curl http://localhost:8000/responses/
+```

-| Parameter | Type | Default | Description |
-|---|---|---|---|
-| `name` | `str` | required | Server name (e.g. `"ResponsesAgent"`) |
-| `enable_chat_proxy` | `bool` | `False` | Enable chat UI proxy endpoint |
-| `db_instance_name` | `str \| None` | `None` | Provisioned Lakebase instance name |
-| `db_autoscaling_endpoint` | `str \| None` | `None` | Autoscaling endpoint hostname |
-| `db_project` | `str \| None` | `None` | Autoscaling project name |
-| `db_branch` | `str \| None` | `None` | Autoscaling branch name |
-| `task_timeout_seconds` | `float` | `3600` | Max background task time before timeout |
-| `poll_interval_seconds` | `float` | `1.0` | Stream event poll interval |
+If polling returns the agent's response, long-running mode is working.

 ---

 ## Troubleshooting

-| Issue | Cause | Solution |
-|---|---|---|
-| `ImportError: cannot import LongRunningAgentServer` | Missing dependency | Add `databricks-ai-bridge[agent-server]>=0.18.0` and `uv sync` |
-| `background=true` returns but no result | Lakebase not configured | Set Lakebase env vars in `.env` / `databricks.yml` |
-| Task times out | Long agent execution | Increase `TASK_TIMEOUT_SECONDS` |
-| Stream events have placeholder IDs | Missing `transform_stream_event` | Ensure `AgentServer` subclass overrides it |
-| DB initialization failed warning | Lakebase connection error | Check env vars and permissions (see **lakebase-setup** skill) |
+| Issue | Solution |
+|---|---|
+| `ImportError: cannot import LongRunningAgentServer` | Add `databricks-ai-bridge[agent-server]>=0.18.0` and `uv sync` |
+| `background=true` returns but no result | Lakebase not configured -- set env vars in `.env` / `databricks.yml` |
+| Task times out | Increase `TASK_TIMEOUT_SECONDS` |
+| Stream events have placeholder IDs | Ensure `AgentServer` subclass overrides `transform_stream_event` |
+| DB initialization failed warning | Check Lakebase env vars and permissions (see **lakebase-setup** skill) |
diff --git a/agent-langchain-ts/.claude/skills/deploy/SKILL.md b/agent-langchain-ts/.claude/skills/deploy/SKILL.md
index afbfb0e3..37c01068 100644
--- a/agent-langchain-ts/.claude/skills/deploy/SKILL.md
+++ b/agent-langchain-ts/.claude/skills/deploy/SKILL.md
@@ -1,61 +1,27 @@
 ---
 name: deploy
-description: "Deploy TypeScript LangChain agent to Databricks. Use when: (1) User wants to deploy, (2) User says 'deploy', 'push to databricks', 'production', (3) After making changes that need deployment."
+description: "Build, validate, and deploy TypeScript LangChain agent to Databricks Apps. Use when: (1) User wants to deploy, (2) User says 'deploy', 'push to databricks', 'production', (3) After making changes that need deployment."
 ---

 # Deploy to Databricks

-## Quick Deploy
-
-```bash
-# Validate configuration
-databricks bundle validate -t dev
-
-# Deploy to dev environment
-databricks bundle deploy -t dev
-
-# Start the app
-databricks bundle run agent_langchain_ts
-```
-
 ## Deployment Targets

-### Development (dev)
-```bash
-databricks bundle deploy -t dev
-```
-
-**Characteristics:**
-- Default target
-- User-scoped naming: `db-agent-langchain-ts-`
-- Development mode permissions
-- Auto-created resources
+| Target | Command | Naming | Notes |
+|--------|---------|--------|-------|
+| **dev** (default) | `databricks bundle deploy -t dev` | `db-agent-langchain-ts-` | User-scoped, auto-created resources |
+| **prod** | `databricks bundle deploy -t prod` | `db-agent-langchain-ts-prod` | Stricter permissions, fixed naming |

-### Production (prod)
-```bash
-databricks bundle deploy -t prod
-```
-
-**Characteristics:**
-- Production mode
-- Stricter permissions
-- Fixed naming: `db-agent-langchain-ts-prod`
-- Requires explicit configuration
+---

-## Step-by-Step Deployment
+## Deploy Workflow

-### 1. Prepare Code
+### 1. Build and Test Locally

-Ensure code is committed and tested:
 ```bash
-# Test locally first
-npm run dev
-
-# Run tests
-npm test
-
-# Verify build works
-npm run build
+npm run dev    # Test locally first
+npm test       # Run tests
+npm run build  # Verify build
 ```

 ### 2. Validate Bundle
@@ -64,370 +30,73 @@ npm run build
 databricks bundle validate -t dev
 ```

-This checks:
-- `databricks.yml` syntax
-- `app.yaml` configuration
-- Resource references
-- Variable interpolation
+Checks `databricks.yml` syntax, `app.yaml` config, resource references, and variable interpolation.

-### 3. Deploy Bundle
+### 3. Deploy

 ```bash
 databricks bundle deploy -t dev
 ```

-This will:
-- Create MLflow experiment if needed
-- Upload source code
-- Configure app environment
-- Grant resource permissions
-- Create app instance
-
 ### 4. Start App

 ```bash
 databricks bundle run agent_langchain_ts
 ```

-Or manually:
-```bash
-databricks apps start db-agent-langchain-ts-
-```
-
-### 5. Verify Deployment
+### 5. Verify

 ```bash
-# Check app status
+# Check status
 databricks apps get db-agent-langchain-ts-

-# View logs
+# Follow logs
 databricks apps logs db-agent-langchain-ts- --follow

 # Test health endpoint
 curl https:///apps/db-agent-langchain-ts-/health
-```
-
-## Managing Existing Apps
-
-### Bind Existing App
-
-If app already exists:
-```bash
-# Get app details
-databricks apps get db-agent-langchain-ts-
-
-# Bind to bundle
-databricks bundle deploy -t dev --force-bind
-```
-
-### Delete and Recreate
-
-```bash
-# Delete existing app
-databricks apps delete db-agent-langchain-ts-
-
-# Deploy fresh
-databricks bundle deploy -t dev
-```
-
-## Configuration Files
-
-### databricks.yml
-
-Main bundle configuration:
-
-```yaml
-bundle:
-  name: agent-langchain-ts
-
-variables:
-  serving_endpoint_name:
-    default: "databricks-claude-sonnet-4-5"
-
-resources:
-  experiments:
-    agent_experiment:
-      name: /Users/${workspace.current_user.userName}/agent-langchain-ts
-
-  apps:
-    agent_langchain_ts:
-      name: db-agent-langchain-ts-${var.resource_name_suffix}
-      source_code_path: ./
-      resources:
-        - name: serving-endpoint
-          serving_endpoint:
-            name: ${var.serving_endpoint_name}
-            permission: CAN_QUERY
-```
-
-### app.yaml
-
-Runtime configuration:
-
-```yaml
-command:
-  - npm
-  - start
-
-env:
-  - name: DATABRICKS_MODEL
-    value: "databricks-claude-sonnet-4-5"
-  - name: MLFLOW_TRACKING_URI
-    value: "databricks"
-  - name: MLFLOW_EXPERIMENT_ID
-    value_from: "experiment"
-
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: ${var.serving_endpoint_name}
-      permission: CAN_QUERY
-```
-
-## Viewing Deployed App
-
-### Get App URL
-
-```bash
-databricks apps get db-agent-langchain-ts- --output json | jq -r .url
-```
-
-### Access App
-
-Navigate to:
-```
-https:///apps/db-agent-langchain-ts-
-```
-
-### Test Deployed App
-
-```bash
-# Health check
-curl https:///apps/db-agent-langchain-ts-/health
-
-# Chat request
+# Test chat
 curl -X POST https:///apps/db-agent-langchain-ts-/api/chat \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer " \
-  -d '{
-    "messages": [
-      {"role": "user", "content": "Hello!"}
-    ]
-  }'
-```
-
-## Monitoring
-
-### View Logs
-
-```bash
-# Follow logs in real-time
-databricks apps logs db-agent-langchain-ts- --follow
-
-# Get last 100 lines
-databricks apps logs db-agent-langchain-ts- --tail 100
-
-# Filter logs
-databricks apps logs db-agent-langchain-ts- | grep ERROR
-```
-
-### View MLflow Traces
-
-See [MLflow Tracing Guide](../_shared/MLFLOW.md) for viewing traces in your workspace.
-
-### App Metrics
-
-```bash
-# Get app details
-databricks apps get db-agent-langchain-ts- --output json
-
-# Check app state
-databricks apps get db-agent-langchain-ts- --output json | jq -r .state
-```
-
-## Updating Deployed App
-
-### Update Code
-
-```bash
-# Make changes to code
-# Then redeploy
-databricks bundle deploy -t dev
-
-# Restart app
-databricks apps restart db-agent-langchain-ts-
+  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
 ```

-### Update Configuration
+---

-Edit `app.yaml` or `databricks.yml`, then:
+## Update or Redeploy

 ```bash
+# After code/config changes:
 databricks bundle deploy -t dev
 databricks apps restart db-agent-langchain-ts-
 ```

-## Adding Resources
-
-### Add Serving Endpoint Permission
-
-Edit `app.yaml`:
-
-```yaml
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: "your-endpoint-name"
-      permission: CAN_QUERY
-```
-
-Then redeploy:
-```bash
-databricks bundle deploy -t dev
-```
-
-### Add Unity Catalog Function
-
-Edit `databricks.yml`:
-
-```yaml
-resources:
-  - name: uc-function
-    function:
-      name: "catalog.schema.function_name"
-      permission: EXECUTE
-```
-
-Update `app.yaml` to pass function config:
-
-```yaml
-env:
-  - name: UC_FUNCTION_CATALOG
-    value: "catalog"
-  - name: UC_FUNCTION_SCHEMA
-    value: "schema"
-  - name: UC_FUNCTION_NAME
-    value: "function_name"
-```
-
-Redeploy:
-```bash
-databricks bundle deploy -t dev
-```
-
-## Troubleshooting
-
-### "App with same name already exists"
+### Bind Existing App

-Either bind existing app:
 ```bash
 databricks bundle deploy -t dev --force-bind
 ```

-Or delete and recreate:
+### Delete and Recreate
+
 ```bash
 databricks apps delete db-agent-langchain-ts-
 databricks bundle deploy -t dev
 ```

-### "Permission denied on serving endpoint"
-
-Ensure endpoint is listed in `app.yaml` resources:
-```yaml
-resources:
-  - name: serving-endpoint
-    serving_endpoint:
-      name: "databricks-claude-sonnet-4-5"
-      permission: CAN_QUERY
-```
-
-### "Experiment not found"
-
-Create experiment:
-```bash
-databricks experiments create \
-  --experiment-name "/Users/$(databricks current-user me --output json | jq -r .userName)/agent-langchain-ts"
-```
-
-Or update `databricks.yml` to auto-create:
-```yaml
-resources:
-  experiments:
-    agent_experiment:
-      name: /Users/${workspace.current_user.userName}/agent-langchain-ts
-```
-
-### "App failed to start"
-
-Check logs:
-```bash
-databricks apps logs db-agent-langchain-ts-
-```
-
-Common issues:
-- Missing dependencies in `package.json`
-- Incorrect `npm start` command in `app.yaml`
-- Missing environment variables
-- Build errors
-
-### "Cannot reach app URL"
-
-Verify:
-1. App is running: `databricks apps get | jq -r .state`
-2. URL is correct: `databricks apps get | jq -r .url`
-3. Authentication token is valid
-
-## CI/CD Integration
-
-### GitHub Actions Example
-
-```yaml
-name: Deploy to Databricks
-
-on:
-  push:
-    branches: [main]
-
-jobs:
-  deploy:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v3
-
-      - name: Set up Node.js
-        uses: actions/setup-node@v3
-        with:
-          node-version: '18'
-
-      - name: Install dependencies
-        run: npm install
-
-      - name: Run tests
-        run: npm test
-
-      - name: Install Databricks CLI
-        run: |
-          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
-
-      - name: Deploy to Databricks
-        env:
-          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
-          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
-        run: |
-          databricks bundle deploy -t prod
-          databricks bundle run agent_langchain_ts
-```
+---

-## Best Practices
+## Troubleshooting

-1. **Test Locally First**: Always test with `npm run dev` before deploying
-2. **Use Dev Environment**: Test deployments in dev before prod
-3. **Monitor Logs**: Check logs after deployment
-4. **Version Control**: Commit changes before deploying
-5. **Resource Permissions**: Verify all required resources are granted in `app.yaml`
-6. **MLflow Traces**: Monitor traces to debug issues
-7. **Incremental Updates**: Make small changes and test frequently
+| Issue | Solution |
+|-------|----------|
+| "App with same name already exists" | `databricks bundle deploy -t dev --force-bind` or delete first |
+| "Permission denied on serving endpoint" | Add endpoint to `app.yaml` resources with `permission: CAN_QUERY` |
+| "Experiment not found" | Add experiment to `databricks.yml` resources (auto-creates on deploy) |
+| "App failed to start" | Check logs: `databricks apps logs `. Common: missing deps, bad start command, missing env vars |
+| "Cannot reach app URL" | Verify app state: `databricks apps get \| jq -r .state` and URL: `jq -r .url` |

 ## Related Skills