Switch desktop backend from Gemini 3 Pro to Flash #6312
atlas-agent-omi[bot] wants to merge 1 commit into main
Saves ~$11.4K/month. Same conversation processing tasks (structure, action items, memories, knowledge graph); Flash handles them equally well at ~92% lower cost.
Co-authored-by: Aarav Garg <aaravgarg@users.noreply.github.com>
Greptile Summary
This PR swaps the default Gemini model in the Rust backend's
Key points:
Confidence Score: 4/5
Safe to merge with low technical risk; the primary concern is the unvalidated quality impact on structured extraction tasks, not the correctness of the code change itself.
The code change is a trivially correct one-liner with no logic errors. The main risk is output quality degradation on structured tasks (memories, action items, KG extraction), since Flash is less capable than Pro and no quality benchmarks are provided. The PR description's validation argument (the rate limiter already uses Flash) doesn't fully apply to this code path, so there is no prior evidence of Flash's performance on these specific Rust backend tasks.
No files require special attention beyond
Important Files Changed
Sequence Diagram
sequenceDiagram
participant SwiftClient as Swift Desktop Client
participant RustProxy as Rust Backend (proxy.rs)
participant RateLimiter as GeminiRateLimiter (rate_limit.rs)
participant RustLLM as Rust LlmClient (client.rs)
participant GeminiAPI as Gemini API
Note over SwiftClient,GeminiAPI: Path 1 — Swift client requests (NOT changed by this PR)
SwiftClient->>RustProxy: POST /v1/proxy/gemini/models/gemini-pro-latest:generateContent
RustProxy->>RateLimiter: check_and_record(uid)
RateLimiter-->>RustProxy: Allow / DegradeToFlash / Reject
Note over RustProxy: maybe_rewrite_model_path()<br/>gemini-pro-latest → gemini-3-flash-preview<br/>(only when DegradeToFlash)
RustProxy->>GeminiAPI: Forward request (possibly rewritten model)
GeminiAPI-->>RustProxy: Response
RustProxy-->>SwiftClient: Response
Note over RustLLM,GeminiAPI: Path 2 — Rust backend direct calls (CHANGED by this PR)
RustLLM->>GeminiAPI: POST .../models/gemini-3-flash-preview:generateContent
Note over RustLLM: Direct call — bypasses proxy and rate limiter<br/>Used for: structure extraction, memories,<br/>action items, KG extraction
GeminiAPI-->>RustLLM: JSON response
Reviews (1): Last reviewed commit: "Switch desktop conversation processing f..."
  client: Client::new(),
  api_key,
- model: "gemini-3-pro-preview".to_string(),
+ model: "gemini-3-flash-preview".to_string(),
Rate-limiter "Flash already proven" justification doesn't apply here
The PR description states Flash is already validated in this pipeline because rate_limit.rs:200 downgrades Pro → Flash. However, that rewrite only matches models/gemini-pro-latest: (used by the Swift client going through /v1/proxy/gemini/*), not models/gemini-3-pro-preview: (used by this LlmClient, which calls the Gemini API directly and never goes through the proxy).
These are two independent request paths:
- Proxy path (proxy.rs): Swift client → gemini-pro-latest → rate limiter may rewrite to gemini-3-flash-preview
- Direct path (client.rs): Rust backend → gemini-3-pro-preview → Gemini API (no proxy, no rate limiter)
So gemini-3-flash-preview has not previously run the Rust backend's structured extraction tasks (memories, action items, KG extraction) via the rate limiter degradation path. Flash is still a reasonable choice for these tasks, but the validation argument in the description is for a different flow. Worth keeping in mind when evaluating output quality after rollout.
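To make the mismatch concrete, here is a minimal sketch of a rewrite step that only matches the proxy's model segment. It is not the code in proxy.rs or rate_limit.rs: the `RateDecision` enum, the `rewrite_model_path` function, and both constants are hypothetical names chosen for illustration, and only the model strings and the three decision states come from this PR.

```rust
/// Hypothetical mirror of the three rate-limiter outcomes named in the diagram.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RateDecision {
    Allow,
    DegradeToFlash,
    Reject,
}

// Model path segments taken from the review comment above.
const PRO_SEGMENT: &str = "models/gemini-pro-latest:";
const FLASH_SEGMENT: &str = "models/gemini-3-flash-preview:";

/// Illustrative stand-in for the proxy's rewrite step (an assumption, not the real
/// implementation). Returns `None` when the request is rejected; otherwise returns
/// the path to forward, rewritten only if it contains the Pro segment.
pub fn rewrite_model_path(path: &str, decision: RateDecision) -> Option<String> {
    match decision {
        RateDecision::Reject => None,
        RateDecision::DegradeToFlash if path.contains(PRO_SEGMENT) => {
            Some(path.replacen(PRO_SEGMENT, FLASH_SEGMENT, 1))
        }
        // Allow, or a degrade decision on a path that does not match the segment.
        _ => Some(path.to_string()),
    }
}
```

Under these assumptions, a path containing `models/gemini-3-pro-preview:` passes through unchanged even on `DegradeToFlash`, and the direct path never reaches this function at all, so the degradation logic says nothing about how Flash handles the backend's extraction tasks.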
One-line change: gemini-3-pro-preview → gemini-3-flash-preview in the desktop Rust backend LLM client.
Why: Gemini 3 Pro is the #1 GCP cost at $12.4K/month for 144 desktop users ($86/user). Flash handles the same conversation processing tasks (structure, action items, memories, KG extraction) at ~92% lower cost.
Savings: ~$11.4K/month
Risk: Minimal. Flash is highly capable for structured extraction tasks. The rate limiter already downgrades some Pro requests to Flash (rate_limit.rs:200), so Flash is already proven in this pipeline.
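As a quick sanity check on the figures quoted in this description, the arithmetic can be sketched as below. The function names are hypothetical, and the inputs are the rounded numbers stated above ($12.4K/month, 144 users, ~$11.4K/month savings), not billing data.

```rust
/// Monthly cost divided evenly across users.
pub fn cost_per_user(monthly_cost_usd: f64, users: u32) -> f64 {
    monthly_cost_usd / f64::from(users)
}

/// Fraction of the current monthly cost removed by the stated savings.
pub fn savings_fraction(monthly_cost_usd: f64, monthly_savings_usd: f64) -> f64 {
    monthly_savings_usd / monthly_cost_usd
}

/// Monthly cost left over after the stated savings.
pub fn remaining_cost(monthly_cost_usd: f64, monthly_savings_usd: f64) -> f64 {
    monthly_cost_usd - monthly_savings_usd
}
```

With the stated inputs, `cost_per_user(12_400.0, 144)` comes to roughly $86, `savings_fraction(12_400.0, 11_400.0)` to roughly 0.92, and `remaining_cost(12_400.0, 11_400.0)` to about $1K/month, so the three numbers in the description are consistent with each other.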