feat(text-metrics): split text_score pair by davidberenstein1957 · Pull Request #647 · PrunaAI/pruna

davidberenstein1957 · 2026-04-28T13:04:01Z

Summary

Splits the text_score pair into a dedicated stacked PR, adds OCR-based text metrics and shared utilities, and wires Long Text Bench + OneIG Text Rendering.

Stack Position

Base: PR feat(text-metrics): split oneig_alignment #646 (feat/vlm-pr-3b-oneig-alignment)
Next: PR feat(text-metrics): split oneig_reasoning #648 (feat/vlm-pr-3d-oneig-reasoning)
Final integration: PR feat(e2e-tests): stacked e2e after split metrics #641 (feat/vlm-pr-5-e2e-tests)
Canonical umbrella reference: PR feat(evaluation): add VLMMetrics #545 (feat/metrics-vlm-support)

Files

src/pruna/evaluation/metrics/metric_text_score.py
src/pruna/evaluation/metrics/metric_text_score_utils.py
src/pruna/evaluation/benchmarks.py

Test Plan

uv run pytest tests/evaluation/test_text_metrics.py -k text_score

Review Focus

OCR scoring behavior
Long Text Bench and OneIG Text Rendering wiring

Review Flow (Order)

Review the stack in this exact order:

feat(vendor): add LLM2Vec embedding model #637 vendor
feat(infrastructure): add VLM base classes and utilities #638 infrastructure
feat(text-metrics): split qa_accuracy #645 qa_accuracy
feat(text-metrics): split oneig_alignment #646 oneig_alignment
feat(text-metrics): split text_score pair #647 text_score pair
feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
feat(vision-metrics): split vqa #649 vqa
feat(vision-metrics): split vie_score #650 vie_score
feat(vision-metrics): split img_edit_score #651 img_edit_score
feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR in the flow (5/10)

Review after PR feat(text-metrics): split oneig_alignment #646.
Next PR to review: feat(text-metrics): split oneig_reasoning #648.
Confirm this PR's tests and scope before continuing.

Adds text_score and oneig_text_score metrics together with shared OCR text utilities and benchmark wiring for Long Text Bench and OneIG Text Rendering. Made-with: Cursor

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 946b068.

cursor · 2026-05-08T09:08:23Z

+    """
+    out = text or ""
+    for keyword in _OCR_HALLUCINATION_KEYWORDS:
+        out = out.replace(keyword, "").replace(f"\n{keyword}", "").replace(f"{keyword}\n", "")


Hallucination keyword replacement order creates dead code

Low Severity

In clean_oneig_ocr_hallucinations, the chained .replace() calls are in the wrong order. The first call, out.replace(keyword, ""), removes every occurrence of keyword, so the subsequent .replace(f"\n{keyword}", "") and .replace(f"{keyword}\n", "") have nothing left to match and are effectively dead code. The intent was evidently to remove the keyword together with its adjacent newline, but the current order leaves orphan newlines behind. Downstream whitespace normalization in preprocess_string_oneig mitigates the practical impact, but the logic is inverted relative to that intent.
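A minimal sketch of the ordering issue, not the PR's actual code: the keyword tuple is a placeholder (the real contents of _OCR_HALLUCINATION_KEYWORDS are not shown here), and the function names clean_current and clean_reordered are hypothetical. It compares the chain as written in the diff with one possible reordering that tries the newline-adjacent patterns first.

```python
# Placeholder keyword for illustration only; the real list lives in the PR.
_OCR_HALLUCINATION_KEYWORDS = ("PLACEHOLDER_KEYWORD",)


def clean_current(text):
    """Chain order as in the diff: the bare keyword is removed first."""
    out = text or ""
    for keyword in _OCR_HALLUCINATION_KEYWORDS:
        # After the first replace, the two newline variants find nothing.
        out = out.replace(keyword, "").replace(f"\n{keyword}", "").replace(f"{keyword}\n", "")
    return out


def clean_reordered(text):
    """One possible reordering: newline-adjacent patterns first, bare keyword last."""
    out = text or ""
    for keyword in _OCR_HALLUCINATION_KEYWORDS:
        out = out.replace(f"\n{keyword}", "").replace(f"{keyword}\n", "").replace(keyword, "")
    return out


sample = "Hello\nPLACEHOLDER_KEYWORD\nWorld"
print(repr(clean_current(sample)))    # orphan newline remains
print(repr(clean_reordered(sample)))  # keyword removed along with one adjacent newline
```

With the reordered chain each of the three calls can match, so none of them is dead code. Whether one or both adjacent newlines should be consumed is a design decision for the author.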


This was referenced Apr 28, 2026

feat(text-metrics): add text-based VLM judge metrics #639

Closed

feat(vision-metrics): add vision-based VLM judge metrics #640

Closed

feat(text-metrics): split text_score pair into dedicated branch

946b068


davidberenstein1957 force-pushed the feat/vlm-pr-3b-oneig-alignment branch from 2627d78 to c51653e on May 8, 2026 09:01

davidberenstein1957 force-pushed the feat/vlm-pr-3c-text-score-pair branch from 3cdc2bb to 946b068 on May 8, 2026 09:01

cursor Bot reviewed May 8, 2026

davidberenstein1957 wants to merge 1 commit into feat/vlm-pr-3b-oneig-alignment from feat/vlm-pr-3c-text-score-pair
