feat(vision-metrics): split vqa #649

Open
davidberenstein1957 wants to merge 1 commit into feat/vlm-pr-3d-oneig-reasoning from feat/vlm-pr-4a-vqa

davidberenstein1957 (Member) commented Apr 28, 2026

Summary

Splits vqa into its own stacked PR, adds VQAMetric, and wires up the GenAI Bench benchmark entry with focused VQA test coverage.

Stack Position

Files

  • src/pruna/evaluation/metrics/metric_vqa.py
  • src/pruna/evaluation/benchmarks.py
  • tests/evaluation/test_vision_metrics.py

Test Plan

uv run pytest tests/evaluation/test_vision_metrics.py -k vqa

Review Focus

  • Yes-probability scoring
  • GenAI Bench mapping
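
For the yes-probability item, a minimal sketch of the general idea may help reviewers: the score is the probability a model assigns to a "Yes" answer token, read from the next-token logits. This is not taken from metric_vqa.py. The function name `yes_probability`, the token ids, and the toy logits are all hypothetical, and whether the metric normalizes over the full vocabulary or only over the Yes/No pair is an assumption to check against the diff.

```python
import math


def yes_probability(logits, yes_id, no_id=None):
    """Return P("Yes") from a list of next-token logits.

    Hypothetical helper for illustration only.
    - no_id is None: softmax over the whole (toy) vocabulary.
    - no_id given:   softmax restricted to the Yes/No pair.
    """
    if no_id is not None:
        # Two-way softmax; subtract the max for numerical stability.
        m = max(logits[yes_id], logits[no_id])
        e_yes = math.exp(logits[yes_id] - m)
        e_no = math.exp(logits[no_id] - m)
        return e_yes / (e_yes + e_no)

    # Full-vocabulary softmax, again shifted by the max logit.
    m = max(logits)
    total = sum(math.exp(x - m) for x in logits)
    return math.exp(logits[yes_id] - m) / total


# Toy 4-token vocabulary; index 0 stands in for "Yes", index 1 for "No".
toy_logits = [2.0, 0.5, -1.0, 0.0]
YES_ID, NO_ID = 0, 1

full_vocab_score = yes_probability(toy_logits, YES_ID)
yes_no_score = yes_probability(toy_logits, YES_ID, NO_ID)
```

Restricting the softmax to the Yes/No pair always gives a score at least as high as the full-vocabulary version, because probability mass on unrelated tokens is discarded. The two variants are therefore not interchangeable when comparing against published GenAI Bench numbers.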

Review Flow (Order)

Review the stack in this exact order:

  1. feat(vendor): add LLM2Vec embedding model #637 vendor
  2. feat(infrastructure): add VLM base classes and utilities #638 infrastructure
  3. feat(text-metrics): split qa_accuracy #645 qa_accuracy
  4. feat(text-metrics): split oneig_alignment #646 oneig_alignment
  5. feat(text-metrics): split text_score pair #647 text_score pair
  6. feat(text-metrics): split oneig_reasoning #648 oneig_reasoning
  7. feat(vision-metrics): split vqa #649 vqa
  8. feat(vision-metrics): split vie_score #650 vie_score
  9. feat(vision-metrics): split img_edit_score #651 img_edit_score
  10. feat(e2e-tests): stacked e2e after split metrics #641 e2e tests

This PR in the flow (7/10)

Introduces VQAMetric, together with the GenAI Bench benchmark wiring and focused VQA unit tests, as the first vision-metric PR in the stack.

Made-with: Cursor

davidberenstein1957 force-pushed the feat/vlm-pr-3d-oneig-reasoning branch from e586366 to c853dd2 on May 8, 2026 09:01