Replace the LLM OCR summarizer with direct OCR-hygiene filtering by FuJacob · Pull Request #499 · FuJacob/cotabby

FuJacob · 2026-06-01T07:49:33Z

Summary

Drops the LLM visual-context summarizer in favor of direct OCR hygiene. ScreenshotContextGenerator now filters raw OCR through the new pure OCRTextHygiene (low-confidence, replacement-char, symbol-density, digit-substitution, and word-char-ratio drops, plus field-text stripping and bounding) and injects the cleaned text, instead of running a second on-device generation to summarize it.

Why: the summarizer cost an extra llama generation on every visual-context refresh (latency) and was a hallucination layer between the screen and the prompt (a likely source of off-context ghost text). A base model conditions fine on cleaned raw context, so the summary is unnecessary. Field-text stripping also stops the screen context from echoing the user's own typed text back into the prompt.

Removes LlamaVisualContextSummarizer, VisualContextSummaryPromptRenderer, and their test; drops the summarizer injection in CotabbyAppEnvironment. Adds OCRTextHygiene + thorough tests.

Validation

xcodebuild build-for-testing ... -derivedDataPath build/DerivedData   # ** BUILD SUCCEEDED **
swiftlint lint --config .swiftlint.yml --quiet                        # changed files, exit 0

OCRTextHygiene has unit coverage for each filter (notably the digit-substitution matrix: qu81ity/h3llo drop; utf8, v2, 3D, RTX5070, 20-core pass), symbol-density, field-text stripping, and clean bounding. App-hosted tests run in CI (local Team ID signing blocks the xctest bundle, as before).

Risk / rollout notes

Visual context is unchanged in when it runs (off the keystroke path, off by default, secure-field gated); only what it produces changes: cleaned raw OCR rather than a model summary.
Deferred follow-ups (intentionally small, to keep this PR out of the runtime actor):
- LlamaRuntimeManager.summarize / LlamaRuntimeCore.summarize are now unused (only the deleted summarizer called them); left in place for a separate trivial cleanup.
- The VisualContextStatus.summarizingText case is no longer emitted but kept so the debug-overlay's exhaustive switches don't churn.
- ScreenTextExtractor still discards per-line OCR confidence, so the confidence filter is currently a no-op; surfacing confidence (and optionally .fast recognition) is a follow-up.

🤖 Generated with Claude Code

Greptile Summary

Replaces the LlamaVisualContextSummarizer + VisualContextSummaryPromptRenderer stack with a pure-Swift OCRTextHygiene pass that filters noisy OCR lines (low-confidence, replacement characters, symbol density, digit substitution, word-char ratio, field-text echoes) before injecting them into the autocomplete prompt. This eliminates the extra on-device LLM generation per visual-context refresh and removes a hallucination layer between the screen and the prompt.

OCRTextHygiene is a stateless, I/O-free enum with six independently testable filter stages and thorough unit coverage; all thresholds are tunable.
ScreenshotContextGenerator is simplified to a linear hygiene → normalize → bound pipeline; the VisualContextSummarizing protocol, resolvedContextText, and the summarizer injection in CotabbyAppEnvironment are all removed.

Confidence Score: 4/5

Safe to merge; the change is a clean removal of the on-device summarizer path and a well-tested replacement filter. No data loss or behavioral regression risks on the existing feature boundary.

The OCR hygiene filters are well-reasoned and the test suite covers each one individually and end-to-end. The digit-substitution heuristic (lowercase-before + letter-after) quietly drops lines containing numeronyms like k8s, e2e, i18n, and a11y — real developer terminology that is indistinguishable from OCR misreads by this rule. The PR acknowledges gpt-4o-mini as a known false positive but does not document or test the shorter numeronym family, which is more likely to appear in on-screen text for the target developer audience. This is a quality regression for those specific contexts rather than a hard breakage.

Cotabby/Support/OCRTextHygiene.swift — specifically the tokenHasDigitSubstitution predicate and its interaction with numeronyms.

Important Files Changed

Filename	Overview
Cotabby/Support/OCRTextHygiene.swift	New pure-Swift OCR hygiene module with 6 filters; well-structured and tested, but the digit-substitution filter (filter #4) silently drops lines containing numeronyms like k8s, e2e, i18n, a11y — cases not covered by tests or the PR description
Cotabby/Services/Visual/ScreenshotContextGenerator.swift	Summarizer path cleanly removed; replaced with OCRTextHygiene.clean pipeline; hardcoded confidence of 1.0 is an acknowledged no-op; file-level comment still references "optional local summary" which no longer exists
Cotabby/App/Core/CotabbyAppEnvironment.swift	Cleanly drops the summarizer injection; ScreenshotContextGenerator now takes no summarizer argument
CotabbyTests/OCRTextHygieneTests.swift	Thorough per-filter unit tests with a preserve/drop matrix for digit-substitution; correctly documents the gpt-4o-mini limitation but does not cover numeronym false-positive cases
Cotabby.xcodeproj/project.pbxproj	Correctly removes deleted files and adds new OCRTextHygiene files to both app and test targets

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[FocusedInputSnapshot] --> B[captureScreenshot]
    B --> C[ScreenTextExtractor.extractText]
    C -->|ScreenTextExtractionError.noRecognizedText| D[Use window title fallback]
    C -->|extractedText| E[OCRTextHygiene.clean]
    E --> E1[Filter 1: dropLowConfidence]
    E1 --> E2[Filter 2: dropReplacementCharacter]
    E2 --> E3[Filter 3: dropHighSymbolDensity]
    E3 --> E4[Filter 4: dropDigitSubstitution]
    E4 --> E5[Filter 5: dropLowWordCharacterRatio]
    E5 --> E6[Filter 6: strip field-text echoes]
    E6 --> F[normalizeRecognizedText / sanitizeOCR]
    F --> G[boundedSummaryText / sanitize]
    G --> H{hasMeaningfulSignal?}
    H -->|No| I[throw .unavailable]
    H -->|Yes| J[VisualContextExcerpt]
    D --> K[normalizeRecognizedText]
    K --> L{hasMeaningfulSignal?}
    L -->|No| I
    L -->|Yes| J

Comments Outside Diff (1)

Cotabby/Services/Visual/ScreenshotContextGenerator.swift, line 9-10 (link)

The file-level pipeline comment still references "optional local summary" which was removed by this PR. The summarizer step no longer exists.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "Replace the LLM OCR summarizer with dire..." | Re-trigger Greptile}

Greptile also left 1 inline comment on this PR.

ScreenshotContextGenerator now filters raw OCR through the pure OCRTextHygiene (confidence, replacement-char, symbol-density, digit-substitution, word-char-ratio drops, plus field-text stripping and bounding) and injects the cleaned text, instead of running a second llama generation to summarize it. Removes the per-refresh generation (latency) and a hallucination layer; a base model conditions fine on cleaned raw context. Deletes LlamaVisualContextSummarizer, VisualContextSummaryPromptRenderer, and their test; drops the summarizer injection in CotabbyAppEnvironment. Adds OCRTextHygiene + tests. Follow-ups left small: runtime summarize() is now unused (kept out of this PR), the .summarizingText status is no longer emitted, and ScreenTextExtractor still discards per-line confidence (confidence filter is currently a no-op).

greptile-apps · 2026-06-01T08:00:20Z

+    /// The "lowercase before" + "letter after" pairing is deliberately narrow so genuine tokens
+    /// survive:
+    /// - trailing digits (`utf8`, `v2`): no letter after the digit.
+    /// - leading digits (`3D`, `5070`): no letter before the digit at all.
+    /// - hyphenated counts (`20-core`): the digits have no lowercase letter before them.
+    /// - ALL-CAPS identifiers (`RTX5070`, `N1X`): the letters before the digit are uppercase, so
+    ///   the "lowercase before" condition is never met (uppercase model/product codes are real).


The digit-substitution rule (lowercase-before + letter-after) also matches common developer numeronyms: k8s (k → lowercase before 8 → s after), e2e, i18n, a11y, w3c, b2b. Entire lines containing these tokens will be silently dropped from autocomplete context. A developer working on Kubernetes infra or an i18n codebase would find that all relevant screen context is stripped. The PR description acknowledges this family of false-positives for gpt-4o-mini but these short numeronyms are a more common case not covered by the tests.

Suggested change

/// The "lowercase before" + "letter after" pairing is deliberately narrow so genuine tokens

/// survive:

/// - trailing digits (`utf8`, `v2`): no letter after the digit.

/// - leading digits (`3D`, `5070`): no letter before the digit at all.

/// - hyphenated counts (`20-core`): the digits have no lowercase letter before them.

/// - ALL-CAPS identifiers (`RTX5070`, `N1X`): the letters before the digit are uppercase, so

/// the "lowercase before" condition is never met (uppercase model/product codes are real).

/// The "lowercase before" + "letter after" pairing is deliberately narrow so genuine tokens

/// survive:

/// - trailing digits (`utf8`, `v2`): no letter after the digit.

/// - leading digits (`3D`, `5070`): no letter before the digit at all.

/// - hyphenated counts (`20-core`): the digits have no lowercase letter before them.

/// - ALL-CAPS identifiers (`RTX5070`, `N1X`): the letters before the digit are uppercase, so

/// the "lowercase before" condition is never met (uppercase model/product codes are real).

///

/// Known false positives (same pattern as OCR misreads): numeronyms such as `k8s`, `e2e`,

/// `i18n`, `a11y`, `w3c`, and model names like `gpt-4o-mini`. Lines containing these will be

/// dropped. This is an accepted trade-off; adding an allow-list is a potential follow-up.

…500)

Follow-up to #499 (which removed the OCR summarizer): deletes the now-unused runtime code it relied on - LlamaRuntimeManager.summarize, LlamaRuntimeCore.summarize, and the orphaned LlamaGenerationOptions.summary factory. No remaining callers; build-for-testing green.

FuJacob merged commit ecf6b0f into main Jun 1, 2026
3 of 4 checks passed

FuJacob mentioned this pull request Jun 1, 2026

Fix CotabbyTests build broken by #499 (summarizer test scaffolding) #500

Merged

greptile-apps Bot reviewed Jun 1, 2026

View reviewed changes

FuJacob added a commit that referenced this pull request Jun 1, 2026

Fix CotabbyTests build broken by #499 (summarizer test scaffolding) (#…

78124f6

…500)

FuJacob mentioned this pull request Jun 1, 2026

Remove the dead visual-context summarize path #501

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace the LLM OCR summarizer with direct OCR-hygiene filtering#499

Replace the LLM OCR summarizer with direct OCR-hygiene filtering#499
FuJacob merged 1 commit into
mainfrom
experimental/m2-ocr-hygiene

FuJacob commented Jun 1, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

Uh oh!

greptile-apps Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

FuJacob commented Jun 1, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Risk / rollout notes

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

Uh oh!

greptile-apps Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

FuJacob commented Jun 1, 2026 •

edited by greptile-apps Bot

Loading