feat(vendor): add LLM2Vec embedding model #637
davidberenstein1957 wants to merge 2 commits into main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c933c4d.
self.pooling_mode = pooling_mode
self.skip_instruction = skip_instruction
self.max_length = max_length
self.doc_max_length = 512
Constructor ignores doc_max_length parameter, hardcodes 512
High Severity
self.doc_max_length is hardcoded to 512 instead of using the doc_max_length parameter. Any custom value passed to __init__ (including values loaded from llm2vec_config.json via from_pretrained) is silently discarded. This affects _convert_to_str, which uses self.doc_max_length for document length truncation, and save, which persists the (always-512) value back to config.
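A minimal sketch of the fix described above: store the caller-provided parameter instead of the literal. The class name and attribute layout follow the diff; the other parameters and their defaults are illustrative, not taken from the real constructor signature.

```python
class LLM2Vec:
    def __init__(self, pooling_mode="mean", skip_instruction=True,
                 max_length=512, doc_max_length=400):
        self.pooling_mode = pooling_mode
        self.skip_instruction = skip_instruction
        self.max_length = max_length
        # Was hardcoded to 512, silently discarding the parameter
        # (including values loaded from llm2vec_config.json).
        # Storing the argument lets _convert_to_str truncate at the
        # configured length and lets save() persist the real value.
        self.doc_max_length = doc_max_length
```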
all_embeddings = torch.cat(all_embeddings, dim=0)
all_embeddings = all_embeddings[np.argsort(length_sorted_idx)]
all_embeddings = all_embeddings.to(torch.float32)
return all_embeddings
convert_to_numpy parameter is accepted but never applied
Medium Severity
The convert_to_numpy parameter in encode() is documented as returning a NumPy array when True, but the return path always calls torch.cat, indexes with np.argsort, and calls .to(torch.float32) — unconditionally returning a torch.Tensor. The _encode helper also accepts convert_to_numpy but never uses it. A caller passing convert_to_tensor=False, convert_to_numpy=True will silently receive a tensor instead of the expected numpy.ndarray.
Additional Locations (1)
- Add LLM2Vec from OneIG vendor source
- Includes Llama encoder and bidirectional models
- Self-contained, no dependencies on Pruna internals
- Licensed under Apache 2.0
Patch two upstream llm2vec behavior bugs found in review so downstream VLM metrics use the caller-provided doc_max_length and can return NumPy arrays when requested. Document Pruna's vendor deviations in NOTICE for traceability.
Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed from f89b047 to fb6d967.
Summary

Vendors OneIG's LLM2Vec implementation used by downstream OneIG-based VLM metrics.

This PR includes two minimal vendor compatibility fixes:
- respect the caller-provided doc_max_length
- return a NumPy array when convert_to_numpy=True is passed to encode()

Stack Position

- main
- feat/vlm-pr-2-infrastructure
- feat/vlm-pr-5-e2e-tests
- feat/metrics-vlm-support

Files

- src/pruna/evaluation/metrics/vendor/NOTICE.oneig_llm2vec
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/llm2vec.py
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/modeling_llama_encoder.py
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/models/bidirectional_llama.py

Test Plan

python -c "from pruna.evaluation.metrics.vendor.oneig_llm2vec.llm2vec import LLM2Vec; print('OK')"

Review Focus
Review Flow (Order)
Review the stack in this exact order:
This PR in the flow (1/10)