feat(vendor): add LLM2Vec embedding model #637
davidberenstein1957 wants to merge 2 commits into main
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c933c4d.
self.pooling_mode = pooling_mode
self.skip_instruction = skip_instruction
self.max_length = max_length
self.doc_max_length = 512
Constructor ignores doc_max_length parameter, hardcodes 512
High Severity
self.doc_max_length is hardcoded to 512 instead of using the doc_max_length parameter. Any custom value passed to __init__ (including values loaded from llm2vec_config.json via from_pretrained) is silently discarded. This affects _convert_to_str, which uses self.doc_max_length for document length truncation, and save, which persists the (always-512) value back to config.
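A minimal sketch of the fix described above: store the caller-provided parameter instead of the literal. The class name and attribute layout follow the diff; the other parameters and their defaults are illustrative, not taken from the real constructor signature.

```python
class LLM2Vec:
    def __init__(self, pooling_mode="mean", skip_instruction=True,
                 max_length=512, doc_max_length=400):
        self.pooling_mode = pooling_mode
        self.skip_instruction = skip_instruction
        self.max_length = max_length
        # Was hardcoded to 512, silently discarding the parameter
        # (including values loaded from llm2vec_config.json).
        # Storing the argument lets _convert_to_str truncate at the
        # configured length and lets save() persist the real value.
        self.doc_max_length = doc_max_length
```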
all_embeddings = torch.cat(all_embeddings, dim=0)
all_embeddings = all_embeddings[np.argsort(length_sorted_idx)]
all_embeddings = all_embeddings.to(torch.float32)
return all_embeddings
convert_to_numpy parameter is accepted but never applied
Medium Severity
The convert_to_numpy parameter in encode() is documented as returning a NumPy array when True, but the return path always calls torch.cat, indexes with np.argsort, and calls .to(torch.float32) — unconditionally returning a torch.Tensor. The _encode helper also accepts convert_to_numpy but never uses it. A caller passing convert_to_tensor=False, convert_to_numpy=True will silently receive a tensor instead of the expected numpy.ndarray.
Additional Locations (1)
- Add LLM2Vec from OneIG vendor source
- Includes Llama encoder and bidirectional models
- Self-contained, no dependencies on Pruna internals
- Licensed under Apache 2.0
Patch two upstream llm2vec behavior bugs found in review so downstream VLM metrics use the caller-provided doc_max_length and can return NumPy arrays when requested. Document Pruna's vendor deviations in NOTICE for traceability.
Co-authored-by: Cursor <cursoragent@cursor.com>
Force-pushed from f89b047 to fb6d967.
Summary

Vendors OneIG's LLM2Vec implementation used by downstream OneIG-based VLM metrics.

This PR includes two minimal vendor compatibility fixes:
- respect the caller-provided doc_max_length
- return a NumPy array when convert_to_numpy=True is passed to encode()

Stack Position

- main
- feat/vlm-pr-2-infrastructure
- feat/vlm-pr-5-e2e-tests
- feat/metrics-vlm-support

Files

- src/pruna/evaluation/metrics/vendor/NOTICE.oneig_llm2vec
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/llm2vec.py
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/modeling_llama_encoder.py
- src/pruna/evaluation/metrics/vendor/oneig_llm2vec/models/bidirectional_llama.py

Test Plan

python -c "from pruna.evaluation.metrics.vendor.oneig_llm2vec.llm2vec import LLM2Vec; print('OK')"

Review Focus
Review Flow (Order)
Review the stack in this exact order:
This PR in the flow (1/10)