feat(text-metrics): split text_score pair #647
davidberenstein1957 wants to merge 1 commit into feat/vlm-pr-3b-oneig-alignment.
Adds text_score and oneig_text_score metrics together with shared OCR text utilities and benchmark wiring for Long Text Bench and OneIG Text Rendering.

Made-with: Cursor
Branch updated from 2627d78 to c51653e.
Branch updated from 3cdc2bb to 946b068.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR.
Reviewed by Cursor Bugbot for commit 946b068.
```python
    """
    out = text or ""
    for keyword in _OCR_HALLUCINATION_KEYWORDS:
        out = out.replace(keyword, "").replace(f"\n{keyword}", "").replace(f"{keyword}\n", "")
```
Hallucination keyword replacement order creates dead code
Low Severity
In clean_oneig_ocr_hallucinations, the chained .replace() calls have the wrong order. The first call out.replace(keyword, "") removes all occurrences of keyword, so the subsequent .replace(f"\n{keyword}", "") and .replace(f"{keyword}\n", "") can never find a match — they are dead code. The intent was to cleanly remove adjacent newlines together with the keyword, but the current order leaves orphan newlines behind instead. The practical impact is mitigated by downstream whitespace normalization in preprocess_string_oneig, but the logic is inverted from its clear intent.
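A minimal sketch of the reordering the comment points toward: strip the newline-adjacent forms first, then the bare keyword. The keyword list `NOISE_KEYWORDS`, the function names `clean_current_order` / `clean_fixed_order`, and `normalize_whitespace` (a hypothetical stand-in for the whitespace collapsing in preprocess_string_oneig) are all illustrative assumptions, not code from the PR.

```python
# Hypothetical placeholder; the real _OCR_HALLUCINATION_KEYWORDS list is not shown in the diff.
NOISE_KEYWORDS = ["NOISE"]


def clean_current_order(text):
    """Mirror of the quoted loop: the bare keyword is removed first."""
    out = text or ""
    for keyword in NOISE_KEYWORDS:
        # The first replace removes every occurrence, so the next two never match.
        out = out.replace(keyword, "").replace(f"\n{keyword}", "").replace(f"{keyword}\n", "")
    return out


def clean_fixed_order(text):
    """Sketch: remove newline-adjacent forms first so the newline goes with the keyword."""
    out = text or ""
    for keyword in NOISE_KEYWORDS:
        out = out.replace(f"\n{keyword}", "")  # keyword preceded by a newline
        out = out.replace(f"{keyword}\n", "")  # keyword followed by a newline (e.g. at text start)
        out = out.replace(keyword, "")         # any remaining inline occurrence
    return out


def normalize_whitespace(text):
    """Hypothetical stand-in for the downstream whitespace normalization."""
    return " ".join(text.split())


sample = "Hello\nNOISE\nWorld"
print(repr(clean_current_order(sample)))  # 'Hello\n\nWorld' (orphan newline left behind)
print(repr(clean_fixed_order(sample)))    # 'Hello\nWorld'
print(normalize_whitespace(clean_current_order(sample)))  # Hello World
```

Removing the leading-newline form first means a keyword on its own line between two lines of text leaves exactly one separator, while the final bare replace still catches inline occurrences. As the comment notes, collapsing whitespace afterwards makes both orders agree on inputs like this one, which is why the severity is low.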


Summary
Splits the text_score pair into a dedicated stacked PR, adds OCR-based text metrics and shared utilities, and wires up Long Text Bench and OneIG Text Rendering.

Stack Position
- feat/vlm-pr-3b-oneig-alignment
- feat/vlm-pr-3d-oneig-reasoning
- feat/vlm-pr-5-e2e-tests
- feat/metrics-vlm-support

Files
- src/pruna/evaluation/metrics/metric_text_score.py
- src/pruna/evaluation/metrics/metric_text_score_utils.py
- src/pruna/evaluation/benchmarks.py

Test Plan
Review Focus
Review Flow (Order)
Review the stack in this exact order:
This PR in the flow (5/10)