Replace the LLM OCR summarizer with direct OCR-hygiene filtering #499
Merged
ScreenshotContextGenerator now filters raw OCR through the pure OCRTextHygiene filter (dropping lines for low confidence, replacement characters, symbol density, digit substitution, and low word-character ratio, plus field-text stripping and bounding) and injects the cleaned text, instead of running a second llama generation to summarize it. This removes the per-refresh generation (latency) and a hallucination layer; a base model conditions fine on cleaned raw context. Deletes LlamaVisualContextSummarizer, VisualContextSummaryPromptRenderer, and their test; drops the summarizer injection in CotabbyAppEnvironment. Adds OCRTextHygiene + tests. Small follow-ups are left out of this PR: the runtime summarize() is now unused, the .summarizingText status is no longer emitted, and ScreenTextExtractor still discards per-line confidence (so the confidence filter is currently a no-op).
Comment on lines +117 to +123
/// The "lowercase before" + "letter after" pairing is deliberately narrow so genuine tokens
/// survive:
/// - trailing digits (`utf8`, `v2`): no letter after the digit.
/// - leading digits (`3D`, `5070`): no letter before the digit at all.
/// - hyphenated counts (`20-core`): the digits have no lowercase letter before them.
/// - ALL-CAPS identifiers (`RTX5070`, `N1X`): the letters before the digit are uppercase, so
/// the "lowercase before" condition is never met (uppercase model/product codes are real).
Contributor
The digit-substitution rule (lowercase-before + letter-after) also matches common developer numeronyms:
k8s (k → lowercase before 8 → s after), e2e, i18n, a11y, w3c, b2b. Entire lines containing these tokens will be silently dropped from autocomplete context. A developer working on Kubernetes infra or an i18n codebase would find that all relevant screen context is stripped. The PR description acknowledges this family of false positives for gpt-4o-mini, but these short numeronyms are a more common case and are not covered by the tests.
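The rule the reviewer describes fits in a few lines of Swift. This is a reconstruction from the doc comment, not the PR's actual `tokenHasDigitSubstitution` body. It assumes a token is flagged when a run of digits is immediately followed by a letter and some lowercase letter occurs earlier in the same token; reading "lowercase before" as "anywhere earlier in the token" (rather than immediately before the digit) is an assumption chosen so that `gpt-4o-mini` matches, as the PR reports. Under that reading it reproduces the PR's matrix (`qu81ity`, `h3llo` flagged; `utf8`, `v2`, `3D`, `RTX5070`, `20-core` kept) and also flags `k8s` and `i18n`. The line-level helper `lineHasDigitSubstitution` is hypothetical.

```swift
/// Hypothetical reconstruction: flag a token when a run of digits is
/// immediately followed by a letter and a lowercase letter appears earlier
/// in the same token (assumed reading of "lowercase before").
func tokenHasDigitSubstitution(_ token: String) -> Bool {
    let chars = Array(token)
    var sawLowercase = false
    var i = 0
    while i < chars.count {
        if chars[i].isNumber {
            // Consume the whole digit run, e.g. "81" in "qu81ity".
            var end = i
            while end < chars.count && chars[end].isNumber {
                end += 1
            }
            if sawLowercase && end < chars.count && chars[end].isLetter {
                return true
            }
            i = end
        } else {
            if chars[i].isLowercase {
                sawLowercase = true
            }
            i += 1
        }
    }
    return false
}

/// Hypothetical line-level helper: one suspicious token drops the whole line.
func lineHasDigitSubstitution(_ line: String) -> Bool {
    line.split(whereSeparator: { $0.isWhitespace })
        .contains { tokenHasDigitSubstitution(String($0)) }
}

for token in ["qu81ity", "h3llo", "utf8", "v2", "3D", "RTX5070", "20-core", "N1X", "k8s", "i18n", "gpt-4o-mini"] {
    print(token, tokenHasDigitSubstitution(token) ? "dropped" : "kept")
}
```

Because the check works on whole tokens and the filter drops whole lines, a single `k8s` in a sentence is enough to remove that sentence from the context, which is the behavior the review comment is pointing at.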
Suggested change
/// The "lowercase before" + "letter after" pairing is deliberately narrow so genuine tokens
/// survive:
/// - trailing digits (`utf8`, `v2`): no letter after the digit.
/// - leading digits (`3D`, `5070`): no letter before the digit at all.
/// - hyphenated counts (`20-core`): the digits have no lowercase letter before them.
/// - ALL-CAPS identifiers (`RTX5070`, `N1X`): the letters before the digit are uppercase, so
/// the "lowercase before" condition is never met (uppercase model/product codes are real).
///
/// Known false positives (same pattern as OCR misreads): numeronyms such as `k8s`, `e2e`,
/// `i18n`, `a11y`, `w3c`, and model names like `gpt-4o-mini`. Lines containing these will be
/// dropped. This is an accepted trade-off; adding an allow-list is a potential follow-up.
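The suggested comment names an allow-list as a possible follow-up. A minimal sketch of one, assuming exact-match lookup after lowercasing and trimming surrounding punctuation; `knownGoodTokens`, `isSuspectToken`, and the injected `isSubstitution` predicate are hypothetical names, and the entries are only the tokens named in this thread.

```swift
import Foundation

/// Hypothetical allow-list seeded with the tokens named in this review thread.
let knownGoodTokens: Set<String> = ["k8s", "e2e", "i18n", "a11y", "w3c", "b2b", "gpt-4o-mini"]

/// Returns true when `token` should be treated as an OCR misread.
/// `isSubstitution` stands in for the real digit-substitution predicate.
func isSuspectToken(_ token: String, isSubstitution: (String) -> Bool) -> Bool {
    // Compare case-insensitively and ignore wrapping punctuation like "(k8s),".
    let normalized = token.lowercased().trimmingCharacters(in: .punctuationCharacters)
    if knownGoodTokens.contains(normalized) {
        return false
    }
    return isSubstitution(token)
}

// Usage with a deliberately crude stand-in predicate: "any digit is suspicious".
let anyDigit: (String) -> Bool = { token in token.contains(where: { ch in ch.isNumber }) }
print(isSuspectToken("(K8S),", isSubstitution: anyDigit))
print(isSuspectToken("h3llo", isSubstitution: anyDigit))
```

Injecting the predicate keeps the allow-list independent of how the substitution rule itself is tuned, so the two can be adjusted and tested separately.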
FuJacob added 2 commits that referenced this pull request on Jun 1, 2026
Follow-up to #499 (which removed the OCR summarizer): deletes the now-unused runtime code it relied on (LlamaRuntimeManager.summarize, LlamaRuntimeCore.summarize, and the orphaned LlamaGenerationOptions.summary factory). No remaining callers; build-for-testing is green.
Summary
Drops the LLM visual-context summarizer in favor of direct OCR hygiene.
`ScreenshotContextGenerator` now filters raw OCR through the new pure `OCRTextHygiene` (low-confidence, replacement-char, symbol-density, digit-substitution, and word-char-ratio drops, plus field-text stripping and bounding) and injects the cleaned text, instead of running a second on-device generation to summarize it.

Why: the summarizer cost an extra llama generation on every visual-context refresh (latency) and was a hallucination layer between the screen and the prompt (a likely source of off-context ghost text). A base model conditions fine on cleaned raw context, so the summary is unnecessary. Field-text stripping also stops the screen context from echoing the user's own typed text back into the prompt.
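Field-text stripping can be as simple as dropping OCR lines that already appear in the focused field's text. The sketch below assumes that definition: a case-insensitive, whitespace-collapsed substring match, with a hypothetical minimum length so very short lines are not dropped by accident. The PR's actual matching rule isn't shown here, and `stripFieldTextEchoes` and `normalizedForComparison` are hypothetical names.

```swift
import Foundation

/// Lowercase and collapse runs of whitespace so OCR spacing quirks don't matter.
func normalizedForComparison(_ text: String) -> String {
    text.lowercased()
        .split(whereSeparator: { $0.isWhitespace })
        .joined(separator: " ")
}

/// Hypothetical field-text stripping: drop OCR lines that merely echo what
/// the user has already typed into the focused field.
func stripFieldTextEchoes(lines: [String], fieldText: String, minimumLength: Int = 4) -> [String] {
    let field = normalizedForComparison(fieldText)
    guard !field.isEmpty else { return lines }
    return lines.filter { line in
        let candidate = normalizedForComparison(line)
        // Keep very short lines: something like "ok" occurs inside many
        // unrelated field texts ("look", "book") and would be dropped spuriously.
        guard candidate.count >= minimumLength else { return true }
        return !field.contains(candidate)
    }
}

let ocrLines = ["Subject: quarterly report", "Dear team, the numbers", "Send"]
print(stripFieldTextEchoes(lines: ocrLines, fieldText: "Dear   team, the numbers look good"))
```

Matching on normalized substrings tolerates OCR spacing differences but will not catch lines that OCR garbled; the other hygiene stages handle those.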
Removes `LlamaVisualContextSummarizer`, `VisualContextSummaryPromptRenderer`, and their test; drops the summarizer injection in `CotabbyAppEnvironment`. Adds `OCRTextHygiene` + thorough tests.

Validation
`OCRTextHygiene` has unit coverage for each filter (notably the digit-substitution matrix: `qu81ity`/`h3llo` drop; `utf8`, `v2`, `3D`, `RTX5070`, `20-core` pass), symbol density, field-text stripping, and `clean` bounding. App-hosted tests run in CI (local Team ID signing blocks the xctest bundle, as before).

Risk / rollout notes
- `LlamaRuntimeManager.summarize` / `LlamaRuntimeCore.summarize` are now unused (only the deleted summarizer called them); they are left in place for a separate trivial cleanup.
- The `VisualContextStatus.summarizingText` case is no longer emitted but is kept so the debug overlay's exhaustive switches don't churn.
- `ScreenTextExtractor` still discards per-line OCR `confidence`, so the confidence filter is currently a no-op; surfacing confidence (and optionally `.fast` recognition) is a follow-up.

🤖 Generated with Claude Code
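Apple's Vision framework reports a per-candidate `confidence` (a `VNConfidence` between 0 and 1) on each `VNRecognizedText`, and `VNRecognizeTextRequest` accepts a `.fast` recognition level. A sketch of what the confidence stage could look like once the extractor keeps that value; the `RecognizedLine` type, the 0.5 default threshold, and this `dropLowConfidence` signature are assumptions (only the stage name matches the Greptile flowchart). Treating a missing confidence as "keep" reproduces today's no-op behavior.

```swift
/// Hypothetical line type for when the extractor keeps Vision's confidence.
struct RecognizedLine {
    let text: String
    /// nil models today's behavior, where ScreenTextExtractor drops confidence.
    let confidence: Float?
}

/// Drops lines whose confidence is below `threshold` (0.5 is an assumed default).
/// Lines without a confidence are kept, which is why the stage is a no-op today.
func dropLowConfidence(_ lines: [RecognizedLine], threshold: Float = 0.5) -> [RecognizedLine] {
    lines.filter { line in
        guard let confidence = line.confidence else { return true }
        return confidence >= threshold
    }
}

let surfaced = [
    RecognizedLine(text: "Build succeeded", confidence: 0.95),
    RecognizedLine(text: "~#|:", confidence: 0.2),
]
print(dropLowConfidence(surfaced).map { $0.text })
```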
Greptile Summary
Replaces the `LlamaVisualContextSummarizer` + `VisualContextSummaryPromptRenderer` stack with a pure-Swift `OCRTextHygiene` pass that filters noisy OCR lines (low-confidence, replacement characters, symbol density, digit substitution, word-char ratio, field-text echoes) before injecting them into the autocomplete prompt. This eliminates the extra on-device LLM generation per visual-context refresh and removes a hallucination layer between the screen and the prompt.

- `OCRTextHygiene` is a stateless, I/O-free enum with six independently testable filter stages and thorough unit coverage; all thresholds are tunable.
- `ScreenshotContextGenerator` is simplified to a linear hygiene → normalize → bound pipeline; the `VisualContextSummarizing` protocol, `resolvedContextText`, and the summarizer injection in `CotabbyAppEnvironment` are all removed.

Confidence Score: 4/5
Safe to merge; the change is a clean removal of the on-device summarizer path and a well-tested replacement filter. No data loss or behavioral regression risks on the existing feature boundary.
The OCR hygiene filters are well-reasoned and the test suite covers each one individually and end-to-end. The digit-substitution heuristic (lowercase-before + letter-after) quietly drops lines containing numeronyms like k8s, e2e, i18n, and a11y — real developer terminology that is indistinguishable from OCR misreads by this rule. The PR acknowledges gpt-4o-mini as a known false positive but does not document or test the shorter numeronym family, which is more likely to appear in on-screen text for the target developer audience. This is a quality regression for those specific contexts rather than a hard breakage.
Cotabby/Support/OCRTextHygiene.swift: specifically the tokenHasDigitSubstitution predicate and its interaction with numeronyms.
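Besides the digit-substitution predicate, two of the other stages are simple per-line checks with a tunable threshold. A minimal sketch, assuming "symbol density" means the share of non-whitespace characters that are neither letters nor digits; the `HygieneThresholds` type, its 0.4 default, and the helper names are hypothetical, not the PR's API.

```swift
/// Hypothetical tunables; the PR says thresholds are tunable, but this
/// name and default are illustrative.
struct HygieneThresholds {
    var maxSymbolDensity: Double = 0.4
}

/// U+FFFD (REPLACEMENT CHARACTER) typically marks text that failed to decode.
func hasReplacementCharacter(_ line: String) -> Bool {
    line.unicodeScalars.contains { $0.value == 0xFFFD }
}

/// Share of non-whitespace characters that are neither letters nor digits.
func symbolDensity(_ line: String) -> Double {
    let visible = line.filter { !$0.isWhitespace }
    guard !visible.isEmpty else { return 0 }
    let symbols = visible.filter { !$0.isLetter && !$0.isNumber }.count
    return Double(symbols) / Double(visible.count)
}

/// Combined check for these two stages only.
func shouldDropLine(_ line: String, thresholds: HygieneThresholds = HygieneThresholds()) -> Bool {
    hasReplacementCharacter(line) || symbolDensity(line) > thresholds.maxSymbolDensity
}

print(shouldDropLine("Build succeeded"), shouldDropLine("=|=|="))
```

Keeping each check a pure function of one line is what makes the stages independently testable, as the summary notes.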
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[FocusedInputSnapshot] --> B[captureScreenshot]
    B --> C[ScreenTextExtractor.extractText]
    C -->|ScreenTextExtractionError.noRecognizedText| D[Use window title fallback]
    C -->|extractedText| E[OCRTextHygiene.clean]
    E --> E1[Filter 1: dropLowConfidence]
    E1 --> E2[Filter 2: dropReplacementCharacter]
    E2 --> E3[Filter 3: dropHighSymbolDensity]
    E3 --> E4[Filter 4: dropDigitSubstitution]
    E4 --> E5[Filter 5: dropLowWordCharacterRatio]
    E5 --> E6[Filter 6: strip field-text echoes]
    E6 --> F[normalizeRecognizedText / sanitizeOCR]
    F --> G[boundedSummaryText / sanitize]
    G --> H{hasMeaningfulSignal?}
    H -->|No| I[throw .unavailable]
    H -->|Yes| J[VisualContextExcerpt]
    D --> K[normalizeRecognizedText]
    K --> L{hasMeaningfulSignal?}
    L -->|No| I
    L -->|Yes| J

Comments Outside Diff (1)
Cotabby/Services/Visual/ScreenshotContextGenerator.swift, lines 9-10
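The control flow in the flowchart can be approximated by a short function: OCR text goes through hygiene, normalization, and bounding; when extraction finds no text, the window title is normalized instead; either way the result must pass a meaningful-signal check or the call throws. The stages are injected as closures so the sketch runs without the real types. `ContextPipeline`, `VisualContextError`, the character budget, modeling `noRecognizedText` as `nil`, and the "at least one letter" signal test are all assumptions.

```swift
/// Hypothetical error mirroring the flowchart's `throw .unavailable`.
enum VisualContextError: Error {
    case unavailable
}

/// Sketch of the flowchart's control flow with each stage injected as a closure.
struct ContextPipeline {
    var clean: (String) -> String       // stands in for OCRTextHygiene.clean
    var normalize: (String) -> String   // stands in for normalizeRecognizedText
    var maxCharacters: Int              // assumed bounding budget

    /// `ocrText == nil` models ScreenTextExtractionError.noRecognizedText.
    func excerpt(ocrText: String?, windowTitle: String) throws -> String {
        let text: String
        if let ocrText = ocrText {
            // OCR path: hygiene, then normalization, then bounding.
            text = String(normalize(clean(ocrText)).prefix(maxCharacters))
        } else {
            // Fallback path: the window title is only normalized.
            text = normalize(windowTitle)
        }
        // Assumed signal test: at least one letter survived.
        guard text.contains(where: { $0.isLetter }) else {
            throw VisualContextError.unavailable
        }
        return text
    }
}

let pipeline = ContextPipeline(
    clean: { $0 },
    normalize: { $0.split(separator: " ").joined(separator: " ") },
    maxCharacters: 5
)
print((try? pipeline.excerpt(ocrText: "  hello   world ", windowTitle: "Xcode")) ?? "unavailable")
```

Running hygiene before bounding means the character budget is spent only on lines that survived filtering, rather than truncating first and then discarding part of what was kept.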
Reviews (1): Last reviewed commit: "Replace the LLM OCR summarizer with dire..."