
Switch desktop backend from Gemini 3 Pro to Flash #6312

Open
atlas-agent-omi[bot] wants to merge 1 commit into main from atlas/gemini-pro-to-flash

Conversation

@atlas-agent-omi

One-line change: gemini-3-pro-preview → gemini-3-flash-preview in the desktop Rust backend LLM client.

Why: Gemini 3 Pro is the #1 GCP cost at $12.4K/month for 144 desktop users ($86/user). Flash handles the same conversation processing tasks (structure, action items, memories, KG extraction) at ~92% lower cost.

Savings: ~$11.4K/month

Risk: Minimal — Flash is highly capable for structured extraction tasks. The rate limiter already downgrades some Pro requests to Flash (rate_limit.rs:200), so Flash is already proven in this pipeline.
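The cost figures above can be sanity-checked with a quick sketch (all numbers come from this description; nothing here is independently verified):

```rust
// Consistency check of the cost figures quoted in the PR description.
const MONTHLY_COST_PRO: f64 = 12_400.0; // $12.4K/month on Gemini 3 Pro
const DESKTOP_USERS: f64 = 144.0;
const FLASH_DISCOUNT: f64 = 0.92; // Flash quoted at ~92% lower cost

/// Per-user monthly cost: 12_400 / 144 ≈ $86/user.
fn per_user_cost() -> f64 {
    MONTHLY_COST_PRO / DESKTOP_USERS
}

/// Estimated monthly savings: 12_400 * 0.92 ≈ $11.4K.
fn monthly_savings() -> f64 {
    MONTHLY_COST_PRO * FLASH_DISCOUNT
}
```

The three headline numbers ($86/user, ~92% discount, ~$11.4K savings) are mutually consistent.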


Co-authored-by: Aarav Garg <aaravgarg@users.noreply.github.com>
@atlas-agent-omi

@greptileai

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR swaps the default Gemini model in the Rust backend's LlmClient from gemini-3-pro-preview to gemini-3-flash-preview — a one-line change aimed at reducing GCP LLM costs by ~92% for the desktop's server-side conversation processing pipeline (structure extraction, memories, action items, and knowledge graph extraction).

Key points:

  • The change is minimal and correct — the model string is used directly in the Gemini API URL at client.rs:247, so the swap is immediately effective.
  • The LlmClient calls the Gemini API directly (not through the proxy route at /v1/proxy/gemini/*), so this change is entirely independent of the rate-limiter degradation logic in rate_limit.rs.
  • The PR description's claim that "Flash is already proven in this pipeline via the rate limiter" is not quite accurate: the rate limiter's maybe_rewrite_model_path only matches gemini-pro-latest (the Swift client's model over the proxy path), not gemini-3-pro-preview (the Rust backend's direct API calls). Flash has not previously handled the Rust backend structured-extraction tasks through that degradation path — though it is still a reasonable model choice for these tasks.
  • No tests, benchmarks, or quality comparisons for the structured extraction tasks are included; post-deploy monitoring of extraction quality would be prudent.
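For readers unfamiliar with the code path, the constructor-plus-URL pattern described in the first bullet might look roughly like this (a sketch only; the struct fields and method names are assumptions, not the actual client.rs source):

```rust
// Hypothetical sketch of the LlmClient described above; field and
// method names are assumptions, not the actual client.rs source.
pub struct LlmClient {
    api_key: String,
    model: String,
}

impl LlmClient {
    pub fn new(api_key: String) -> Self {
        Self {
            api_key,
            // The PR's one-line change lives here:
            // "gemini-3-pro-preview" -> "gemini-3-flash-preview"
            model: "gemini-3-flash-preview".to_string(),
        }
    }

    // The model string is interpolated directly into the request URL,
    // so changing the constructor default is immediately effective.
    pub fn generate_content_url(&self) -> String {
        format!(
            "https://generativelanguage.googleapis.com/v1beta/models/{}:generateContent?key={}",
            self.model, self.api_key
        )
    }
}
```

Because the model name is never overridden elsewhere on this path, swapping the constructor default changes every direct Gemini call the Rust backend makes.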

Confidence Score: 4/5

Safe to merge with low technical risk; primary concern is unvalidated quality impact on structured extraction tasks, not correctness of the code change itself.

The code change is a trivially correct one-liner with no logic errors. The main risk is output quality degradation on structured tasks (memories, action items, KG extraction) since Flash is less capable than Pro, and no quality benchmarks are provided. The PR description's validation argument (rate limiter already uses Flash) doesn't fully apply to this code path, so there is no prior evidence of Flash's performance for these specific Rust backend tasks.

No files require special attention beyond desktop/Backend-Rust/src/llm/client.rs — the change is isolated to the model name in the LlmClient constructor.

Important Files Changed

Filename Overview
desktop/Backend-Rust/src/llm/client.rs Single-line model swap from gemini-3-pro-preview to gemini-3-flash-preview in the LlmClient constructor; no logic changes. Note that this code path calls the Gemini API directly and is independent of the proxy rate-limiter degradation path.

Sequence Diagram

sequenceDiagram
    participant SwiftClient as Swift Desktop Client
    participant RustProxy as Rust Backend (proxy.rs)
    participant RateLimiter as GeminiRateLimiter (rate_limit.rs)
    participant RustLLM as Rust LlmClient (client.rs)
    participant GeminiAPI as Gemini API

    Note over SwiftClient,GeminiAPI: Path 1 — Swift client requests (NOT changed by this PR)
    SwiftClient->>RustProxy: POST /v1/proxy/gemini/models/gemini-pro-latest:generateContent
    RustProxy->>RateLimiter: check_and_record(uid)
    RateLimiter-->>RustProxy: Allow / DegradeToFlash / Reject
    Note over RustProxy: maybe_rewrite_model_path()<br/>gemini-pro-latest → gemini-3-flash-preview<br/>(only when DegradeToFlash)
    RustProxy->>GeminiAPI: Forward request (possibly rewritten model)
    GeminiAPI-->>RustProxy: Response
    RustProxy-->>SwiftClient: Response

    Note over RustLLM,GeminiAPI: Path 2 — Rust backend direct calls (CHANGED by this PR)
    RustLLM->>GeminiAPI: POST .../models/gemini-3-flash-preview:generateContent
    Note over RustLLM: Direct call — bypasses proxy and rate limiter<br/>Used for: structure extraction, memories,<br/>action items, KG extraction
    GeminiAPI-->>RustLLM: JSON response


  client: Client::new(),
  api_key,
- model: "gemini-3-pro-preview".to_string(),
+ model: "gemini-3-flash-preview".to_string(),


P2 Rate-limiter "Flash already proven" justification doesn't apply here

The PR description states Flash is already validated in this pipeline because rate_limit.rs:200 downgrades Pro → Flash. However, that rewrite only matches models/gemini-pro-latest: (used by the Swift client going through /v1/proxy/gemini/*), not models/gemini-3-pro-preview: (used by this LlmClient, which calls the Gemini API directly and never goes through the proxy).

These are two independent request paths:

  • Proxy path (proxy.rs): Swift client → gemini-pro-latest → rate limiter may rewrite to gemini-3-flash-preview
  • Direct path (client.rs): Rust backend → gemini-3-pro-preview → Gemini API (no proxy, no rate limiter)

So gemini-3-flash-preview has not previously run the Rust backend's structured extraction tasks (memories, action items, KG extraction) via the rate limiter degradation path. Flash is still a reasonable choice for these tasks, but the validation argument in the description is for a different flow. Worth keeping in mind when evaluating output quality after rollout.
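The mismatch described above can be made concrete with a small sketch (the function name maybe_rewrite_model_path comes from the review; its body and the RateDecision enum here are assumptions about rate_limit.rs, not its actual source):

```rust
// Sketch of the proxy-path degradation rewrite; enum and body are
// assumptions, not the actual rate_limit.rs source.
pub enum RateDecision {
    Allow,
    DegradeToFlash,
    Reject,
}

pub fn maybe_rewrite_model_path(path: &str, decision: &RateDecision) -> String {
    match decision {
        // Only the Swift client's proxy model segment is matched.
        // The Rust backend's direct calls use "gemini-3-pro-preview"
        // and never pass through this function at all.
        RateDecision::DegradeToFlash => path.replace(
            "models/gemini-pro-latest:",
            "models/gemini-3-flash-preview:",
        ),
        _ => path.to_string(),
    }
}
```

Even if a gemini-3-pro-preview request were somehow routed through this rewrite, the string match would not fire, which is the core of the review's point.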

