Blog: Controlling AI Spend w/ AppNet+agentgateway #5698
therealmitchconnors wants to merge 14 commits into Azure:master
Note to self: need to update parameters to point to AppNet control plane, not OSS Istio...
Pull request overview
This PR adds a new Docusaurus blog post describing a platform-layer pattern to control shared AI quota/spend by combining Azure Kubernetes Application Network (AppNet) identity (mTLS) with agentgateway token-based rate limiting.
Changes:
- Adds a new blog post under website/blog/2026-04-09-appnet-agentgateway/.
- Documents an architecture and example manifests for per-application token rate limiting.
- Includes an example validation flow showing success (200) and throttling (429).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    domain: token-budgets
    ```

    Finally, let's configure our Rate Limiting Server to deny traffic after 100 tokens per application per minute (in reality, we'd need a much bigger budget, but this low budget lets us easily demo exceeding the rate limiter).
The narrative says '100 tokens per application per minute', but the example ConfigMap enforces requests_per_unit: 100 (requests/minute). Either update the text to say 'requests' (and adjust other 'token' references accordingly), or update the example to reflect actual token-based limiting if the rate limit server supports it. As written, this is internally inconsistent and could mislead readers trying to reproduce token-based control.
This is not actually inconsistent. The requests_per_unit: 100 is part of the Envoy ratelimit service ConfigMap schema.
The magic happens in the AgentgatewayPolicy, where we set unit: Tokens. It tells agentgateway to report the LLM token count as the hit count to the ratelimit service. So when a request consumes 28 tokens, agentgateway tells the ratelimit service "this request used 28 hits" instead of the default 1. The ratelimit service is unaware it's counting tokens — it just does counter math against the 100 budget in Redis.
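A minimal Python sketch of the counter math described above, offered as an illustration only. It is not the Envoy ratelimit service or agentgateway code: the `FixedWindowCounter` class, the in-memory dictionary standing in for Redis, the `client_ns=team-a` key, and the rule that a request is denied once the running total exceeds the budget are all assumptions made for the example.

```python
from collections import defaultdict


class FixedWindowCounter:
    """Toy stand-in for a rate limit service: one counter per key per window."""

    def __init__(self, limit_per_window: int, window_seconds: int = 60):
        self.limit = limit_per_window
        self.window = window_seconds
        # (key, window index) -> hits recorded so far; a dict replaces Redis here.
        self.counts = defaultdict(int)

    def allow(self, key: str, hits: int, now: float) -> bool:
        """Add `hits` to the key's counter; return True while within budget."""
        bucket = (key, int(now // self.window))
        self.counts[bucket] += hits
        return self.counts[bucket] <= self.limit


# Budget of 100 per minute, mirroring requests_per_unit: 100 with unit: minute.
limiter = FixedWindowCounter(limit_per_window=100)

# Each request reports its token usage (28) as the hit count instead of 1.
results = [limiter.allow("client_ns=team-a", hits=28, now=0.0) for _ in range(4)]

# A request in the next minute lands in a fresh window.
after_reset = limiter.allow("client_ns=team-a", hits=28, now=60.0)
```

The counter never learns that the hits are tokens. It only compares a running total against 100, which is why the same `requests_per_unit` field can express a token budget when the caller weights each request by its token count.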
colinmixonn left a comment
some suggestions on readability, flow and making the value prop evident in the intro paragraph
Co-authored-by: colinmixonn <109253437+colinmixonn@users.noreply.github.com>
    - key: client_ns
    rate_limit:
    unit: minute
    requests_per_unit: 100
    kind: ConfigMap
There’s a mismatch between the narrative (“deny traffic after 100 tokens per application per minute”) and the sample rate limit config, which uses requests_per_unit: 100 (requests) rather than tokens. Either adjust the prose/examples to consistently talk about requests, or update the configuration examples to reflect token-based limiting if that’s what the implementation supports.
colinmixonn left a comment
a few nits on abbreviation, but approved
    socials:
    linkedin: zhewei-hu
    github: zheweihu
There’s trailing whitespace on the otherwise blank line after the zhewei-hu author entry. This can cause noisy diffs and may trip YAML/style linters; make the line truly empty or remove it.
    tags: [application-network, ai]
    ---

    ## Control AI spend with per-application token rate limiting using Application Network and agentgateway
The blog title (from the frontmatter) is displayed on the page by default, so there's no need to repeat it here.
This is the blog equivalent of the Azure booth demo at KubeCon EU 26. It highlights capabilities of the newly launched AppNet and a "better together" story with agentgateway. Ideally timed to coincide with the release of agentgateway 1.1 around April 8.