# Add a budget-aware escalation guard #331
Merged
# Morning Report

## Mission

Ship a portable advisor-style escalation guard in the public SDK without breaking the zero-dependency, local-first boundary.
## What shipped

- added `BudgetAwareEscalation`, `EscalationSignal`, and `EscalationRequired`
- supported four v1 escalation triggers: token count, confidence, tool-call depth, and a custom rule
- split the feature into a new stdlib-only `sdk/agentguard/escalation.py` core module so `guards.py` stayed under the repo's line-limit rule
- added a local Llama-to-Claude worked example plus a guide page
- updated README, examples docs, roadmap, changelog, and generated PyPI README
## Why it matters

- the SDK had hard-stop guards, but no portable way to keep a cheap model on by default and escalate only the hard turns
- this gives users an advisor-style pattern without locking the repo to Anthropic's server-side tool or any hosted control plane
- the feature stays local-first: the guard decides, the app routes
## Validation

- `python scripts/sdk_preflight.py` passed
- `python -m pytest sdk/tests/test_pypi_readme_sync.py -v` passed
- `python scripts/sdk_release_guard.py` passed
- full SDK suite passed: `687 passed`
- coverage passed at `92.75%`
- lint, structural checks, release guard, preflight, and bandit passed
- local example run produced both console proof and a trace artifact under `proof/budget-aware-escalation/`
## Notes

- I did not repeat the queue note's exact "more than doubled benchmark score" line in repo docs because I could not verify that exact wording from the primary Anthropic doc available in this environment
- the feature is intentionally explicit rather than "magic routing" inside provider patchers
# PR Draft

## Title

Add a budget-aware escalation guard for advisor-style model routing
## Summary

- add `BudgetAwareEscalation`, `EscalationSignal`, and `EscalationRequired` so apps can keep a cheaper default model and escalate only hard turns to a stronger model
- support four v1 signals: token count, confidence, tool-call depth, and a custom rule
- keep the SDK boundary intact: AgentGuard decides when to escalate; the app still owns the actual provider call
## Scope

- core guard implementation in `sdk/agentguard/escalation.py`
- public exports and guard-module compatibility re-exports
- tests for signal matching, next-call arming, exports, DX, smoke, and example execution
- one guide and one local-only example
- README / examples / changelog / roadmap / generated PyPI README sync
- `PR_DRAFT.md` and `MORNING_REPORT.md`
- proof artifacts under `proof/budget-aware-escalation/`
## Non-goals

- no dashboard work
- no provider-specific routing adapter
- no hidden network behavior
- no new runtime dependencies
- no attempt to auto-switch OpenAI or Anthropic patchers under the hood
## Proof

- `python scripts/sdk_preflight.py`
- `python -m pytest sdk/tests/test_pypi_readme_sync.py -v`
- `python -m ruff check sdk/agentguard/guards.py sdk/agentguard/escalation.py sdk/agentguard/__init__.py sdk/tests/test_guards.py sdk/tests/test_exports.py sdk/tests/test_dx.py sdk/tests/test_smoke.py sdk/tests/test_example_starters.py sdk/tests/test_architecture.py examples/budget_aware_escalation.py scripts/generate_pypi_readme.py`
- `python -m pytest sdk/tests -v --cov=agentguard --cov-report=term-missing --cov-fail-under=80`
- `python scripts/sdk_release_guard.py`
- `python -m bandit -r sdk/agentguard -s B101,B110,B112,B311 -q`
- `python scripts/generate_pypi_readme.py --write`
- `PYTHONPATH=sdk python examples/budget_aware_escalation.py`
## Saved artifacts

- `proof/budget-aware-escalation/example-output.txt`
- `proof/budget-aware-escalation/budget_aware_escalation_traces.jsonl`
- `proof/budget-aware-escalation/source-notes.md`
- `proof/budget-aware-escalation/blog-draft.md`
# Budget-Aware Escalation

`BudgetAwareEscalation` gives AgentGuard users a portable version of the "cheap executor, strong advisor" pattern without baking provider-specific routing into the SDK.

The shape is simple:

- keep a cheaper default model for routine work
- escalate only when the current turn looks hard
- wire the returned model choice into your own client call

This stays consistent with AgentGuard's core boundary: the SDK decides when a run is risky or expensive; your app still owns the actual model invocation.
## Why this exists

Anthropic's official advisor-tool docs describe the same high-level pattern: pair a faster, lower-cost executor with a stronger advisor for hard long-horizon tasks, especially coding and agentic workflows. The queue item that triggered this work called out the durable pattern, not a vendor-specific API, and that is the right way to bring it into AgentGuard.

Source:

- [Anthropic advisor tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/advisor-tool)
## What the guard does

`BudgetAwareEscalation` supports four trigger types in v1:

1. token count over a threshold
2. normalized confidence below a threshold
3. tool-call depth over a threshold
4. a custom rule that inspects the turn context

When a configured signal trips, the guard can:

- raise `EscalationRequired` so you switch to the stronger model explicitly, or
- arm the next call through the normal `auto_check(...)` path and let `select_model()` choose the stronger model on the next turn
## Minimal example

```python
from agentguard import BudgetAwareEscalation, EscalationRequired, EscalationSignal

guard = BudgetAwareEscalation(
    primary_model="ollama/llama3.1:8b",
    escalate_model="claude-opus-4-6",
    escalate_on=(
        EscalationSignal.TOKEN_COUNT(threshold=2000),
        EscalationSignal.CONFIDENCE_BELOW(threshold=0.45),
    ),
)

model = guard.select_model()

# ... run the cheaper model first ...
guard.auto_check(
    "llm.result",
    {
        "model": model,
        "usage": {"total_tokens": 2430},
        "confidence": 0.39,
    },
)

try:
    guard.check()
    next_model = guard.primary_model
except EscalationRequired as exc:
    next_model = exc.target_model
    print(exc.reason)
```
## Worked example: local Llama to Claude

This repo includes a local-only example that simulates a local Llama turn, arms the escalation guard from the result, and then routes the next call to Claude:

```bash
PYTHONPATH=sdk python examples/budget_aware_escalation.py
```

Expected output:

```text
Turn 1 model: ollama/llama3.1:8b
Turn 2 model: claude-opus-4-6
Escalation reason: token_count 2430 exceeded 2000
Wrote budget_aware_escalation_traces.jsonl
```

The example also writes a local trace file so the escalation path is inspectable.
## Signal semantics

### Token count

Use when a turn is getting too expensive or too context-heavy for the cheaper model:

```python
EscalationSignal.TOKEN_COUNT(threshold=2000)
```

The guard looks for:

- `token_count`
- `total_tokens`
- `usage.total_tokens`
### Confidence below

Use when your runtime already exposes a normalized confidence score:

```python
EscalationSignal.CONFIDENCE_BELOW(threshold=0.45)
```

The SDK does not invent confidence for you. If your provider exposes logprobs or another score, normalize it in your app and pass `confidence=...` or put `confidence` in the traced event payload.
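For providers that expose per-token logprobs, one common app-side heuristic is to average the logprobs and exponentiate back to a probability. This is a sketch of your own normalization code, not an AgentGuard API:

```python
import math

# App-side sketch: turn per-token logprobs into a rough 0-1 confidence
# score before handing it to the guard. One possible heuristic only.
def confidence_from_logprobs(logprobs: list[float]) -> float:
    if not logprobs:
        return 1.0  # no signal available: do not force an escalation
    mean_logprob = sum(logprobs) / len(logprobs)
    # exp(mean logprob) is the geometric mean token probability
    return math.exp(mean_logprob)

print(round(confidence_from_logprobs([-0.9, -1.1, -1.0]), 3))  # 0.368
```

Whatever heuristic you pick, keep it monotonic and in `[0, 1]` so the `CONFIDENCE_BELOW` threshold stays meaningful across turns.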
### Tool-call depth

Use when a turn is spiraling into too many tool hops:

```python
EscalationSignal.TOOL_CALL_DEPTH(threshold=3)
```

The guard looks for:

- `tool_call_depth`
- `depth`
- `tool_calls` list length
### Custom rule

Use when your own heuristics are stronger than any single built-in signal:

```python
EscalationSignal.CUSTOM(
    lambda ctx: (ctx.get("token_count") or 0) > 1500 and (ctx.get("confidence") or 1.0) < 0.5,
    name="hard_turn",
)
```

The custom rule receives a context dict with:

- `event_name`
- `event_data`
- `primary_model`
- `escalate_model`
- `current_model`
- `token_count`
- `confidence`
- `tool_call_depth`
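Because the rule is just a callable over that dict, you can unit-test it in isolation before wiring it into the guard. The context values below are made up for illustration:

```python
# The custom rule is a plain predicate over the context dict, so it can
# be exercised standalone. All values here are illustrative.
def hard_turn(ctx: dict) -> bool:
    return (ctx.get("token_count") or 0) > 1500 and (ctx.get("confidence") or 1.0) < 0.5

ctx = {
    "event_name": "llm.result",
    "event_data": {},
    "primary_model": "ollama/llama3.1:8b",
    "escalate_model": "claude-opus-4-6",
    "current_model": "ollama/llama3.1:8b",
    "token_count": 2430,
    "confidence": 0.39,
    "tool_call_depth": 1,
}

print(hard_turn(ctx))                        # True: long and low-confidence
print(hard_turn({**ctx, "confidence": 0.9})) # False: confident enough
```

Note the `or` defaults: a missing `token_count` is treated as 0 and a missing `confidence` as 1.0, so absent signals never trigger the rule on their own.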
## Design boundary

This is intentionally not "transparent provider switching" inside AgentGuard's patchers.

Why:

- AgentGuard guards are runtime enforcement primitives, not provider adapters
- the SDK must stay zero-dependency
- routing policy is portable only if your app remains in control of the actual client call

So the contract is:

- AgentGuard tells you when escalation is needed
- your application decides how to invoke the stronger model

That keeps the feature compatible with local Ollama, raw HTTP clients, OpenAI-compatible endpoints, Anthropic, and future provider adapters without locking the SDK to one stack.