feat: implement loglikelihood and loglikelihood_rolling for LiteLLMClient (closes #1093) by ALI-AL-MARJANI · Pull Request #1244 · huggingface/lighteval

ALI-AL-MARJANI · 2026-05-21T19:47:02Z

Summary

Implements loglikelihood() and loglikelihood_rolling() for LiteLLMClient,
enabling deterministic MCQ benchmarks (MMLU, ARC, HellaSwag) and perplexity
evaluation over any LiteLLM-supported provider.

Previously both methods raised NotImplementedError.
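For the rolling (perplexity) variant, one plausible approach sends the whole document as the prompt and sums every echoed token's logprob. This is a minimal sketch under that assumption, not the PR's code; `rolling_loglikelihood` and `perplexity` are hypothetical names. The completions API returns no logprob for the first echoed token, so this sketch skips it. The PR does not say whether it handles that token differently. The metric here is token-level perplexity, and lighteval's perplexity metrics may normalize differently.

```python
import math
from typing import Optional, Sequence, Tuple


def rolling_loglikelihood(
    text: str,
    token_logprobs: Sequence[Optional[float]],
    text_offset: Sequence[int],
) -> Tuple[float, int]:
    """Sum echoed-token logprobs over `text`; return (total, scored_token_count)."""
    total, n = 0.0, 0
    for lp, off in zip(token_logprobs, text_offset):
        if off >= len(text):
            break  # past the echoed prompt: this is the max_tokens=1 generated token
        if lp is None:
            continue  # the first echoed token has no conditional logprob
        total += lp
        n += 1
    return total, n


def perplexity(total_logprob: float, n_tokens: int) -> float:
    """Token-level perplexity: exp of the mean negative log-likelihood."""
    return math.exp(-total_logprob / n_tokens)
```
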

How it works

Uses litellm.atext_completion with echo=True, logprobs=1, max_tokens=1, temperature=0.0 (the /v1/completions endpoint). A two-layer Token Alignment
Engine isolates the continuation log-probabilities from the echoed prompt:

Layer 1: character-offset alignment using the text_offset field (exact; returned by OpenAI)
Layer 2: tiktoken token-count fallback for other providers
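The two alignment layers can be sketched as a pure function over the `logprobs` fields that the legacy completions endpoint returns (`tokens`, `token_logprobs`, `top_logprobs`, `text_offset`). This is a minimal sketch, not the PR's implementation. `continuation_logprob` is a hypothetical name, and `count_tokens` stands in for a tiktoken-based counter such as `len(enc.encode(text))`. With `echo=True` and `max_tokens=1`, the response also contains one generated token after the echoed prompt, and the alignment must exclude it. The sketch also performs the argmax (greedy) check.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def continuation_logprob(
    context: str,
    continuation: str,
    tokens: Sequence[str],
    token_logprobs: Sequence[Optional[float]],
    top_logprobs: Sequence[Optional[Dict[str, float]]],
    text_offset: Optional[Sequence[int]] = None,
    count_tokens: Optional[Callable[[str], int]] = None,
) -> Tuple[float, bool]:
    """Sum the continuation's logprobs and report whether it was the greedy choice."""
    ctx_end = len(context)
    prompt_end = ctx_end + len(continuation)

    if text_offset is not None:
        # Layer 1 (exact): a token belongs to the continuation if it starts in
        # [ctx_end, prompt_end); the generated token (offset >= prompt_end) is dropped.
        idx: List[int] = [i for i, off in enumerate(text_offset) if ctx_end <= off < prompt_end]
    elif count_tokens is not None:
        # Layer 2 (approximate): assume the echoed prompt fills the first
        # count_tokens(prompt) positions and the continuation is its tail.
        n_prompt = count_tokens(context + continuation)
        n_cont = count_tokens(continuation)
        idx = list(range(n_prompt - n_cont, n_prompt))
    else:
        raise ValueError("need either text_offset or a count_tokens function")

    if not idx:
        raise ValueError("no tokens aligned to the continuation")

    total = 0.0
    is_greedy = True
    for i in idx:
        lp = token_logprobs[i]
        if lp is None:  # only the very first echoed token lacks a logprob
            raise ValueError("continuation has no preceding context to condition on")
        total += lp
        # top_logprobs may contain the top-1 token plus the echoed token itself;
        # the continuation is greedy only if every token is the most likely one.
        top = top_logprobs[i]
        if not top or max(top, key=top.get) != tokens[i]:
            is_greedy = False
    return total, is_greedy
```

With a real OpenAI-style response, the four sequences would come from the choice's `logprobs` object. Layer 2 is only as accurate as the assumption that tiktoken splits the prompt the way the provider's tokenizer does.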

Concurrency is managed with asyncio.Semaphore + asyncio.gather, matching
the approach used by other async-capable backends.
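The semaphore-plus-gather pattern, combined with the retry/backoff behaviour that the tests cover, might look roughly like this. It is a minimal sketch: `run_bounded` and `call_with_retry` are hypothetical helpers, and in the real client `fn` would wrap the `litellm.atext_completion` call. The sketch retries on any exception for brevity.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def call_with_retry(
    fn: Callable[[T], Awaitable[R]], item: T, retries: int = 3, base_delay: float = 0.5
) -> R:
    """Await fn(item), retrying with exponential backoff (base_delay * 2**attempt)."""
    for attempt in range(retries + 1):
        try:
            return await fn(item)
        except Exception:
            # A real client would retry only transient errors (rate limits, timeouts).
            if attempt == retries:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable when retries >= 0")


async def run_bounded(
    fn: Callable[[T], Awaitable[R]],
    items: Sequence[T],
    max_concurrency: int = 8,
    retries: int = 3,
    base_delay: float = 0.5,
) -> List[R]:
    """Run fn over items with at most max_concurrency calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(item: T) -> R:
        async with sem:  # waits while max_concurrency workers hold the semaphore
            return await call_with_retry(fn, item, retries, base_delay)

    # gather returns results in input order, regardless of completion order.
    return list(await asyncio.gather(*(worker(x) for x in items)))
```

Because `gather` preserves order, each result can be matched back to its request by index without extra bookkeeping.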

Provider requirement

The /v1/completions endpoint with echo support is required:

| Provider | Supported |
|---|---|
| OpenAI `gpt-3.5-turbo-instruct` | ✅ |
| Any OpenAI-compatible local server (vLLM, llama.cpp) | ✅ |
| OpenAI chat-only models (`gpt-4o`, `gpt-4-turbo`) | ❌ |
| Anthropic, Gemini, Cohere | ❌ |

A warning is emitted at runtime if the model is registered as mode=chat.
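A provider guard of this kind could be as small as the following sketch. It assumes the model's registry entry is available as a mapping with a `mode` field, as in LiteLLM's model cost map. `warn_if_chat_model` is a hypothetical name, and the sketch does not show how the PR actually looks up the entry.

```python
import warnings
from typing import Any, Mapping, Optional


def warn_if_chat_model(model: str, model_info: Optional[Mapping[str, Any]]) -> bool:
    """Warn (and return True) if the registry marks `model` as chat-only."""
    if model_info is None:
        return False  # unknown model, e.g. a local server: let the request proceed
    if model_info.get("mode") == "chat":
        warnings.warn(
            f"{model} is registered as mode=chat; loglikelihood needs a "
            "/v1/completions endpoint with echo support and will likely fail.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

The sketch warns rather than raising, so that models missing from the registry, or registered with the wrong mode, can still be tried.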

Changes

litellm_model.py: 9 new methods covering loglikelihood, loglikelihood_rolling, the async pipeline, the token alignment engine, the argmax check, the provider guard and the length guard
model_input.py: new to_litellm_text_completion_dict() method; also fixes a bug where presence_penalty was silently dropped from to_litellm_dict()
inference_providers_model.py: informative NotImplementedError messages explaining why the HF Inference Providers backend cannot support these methods
use-litellm-as-backend.mdx: full documentation of both evaluation modes
installation.mdx, models.mdx: updated to reflect the new capabilities
tests.yaml: adds --extra litellm to CI
litellm_completion_model.yaml: example config for MCQ/perplexity benchmarks
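The two model_input.py changes can be illustrated with a hypothetical parameter class. This is a sketch only: `SketchGenerationParams`, its fields and the `max_new_tokens` to `max_tokens` mapping are assumptions, not lighteval's actual `model_input.py`. It shows a kwargs builder that forwards `presence_penalty` (the field the PR says was dropped) and a scoring builder that pins the four request values described above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchGenerationParams:
    """Hypothetical stand-in for a generation-parameters object."""

    temperature: Optional[float] = None
    top_p: Optional[float] = None
    max_new_tokens: Optional[int] = None
    frequency_penalty: Optional[float] = None
    presence_penalty: Optional[float] = None
    seed: Optional[int] = None

    def to_chat_kwargs(self) -> Dict[str, Any]:
        args = {
            "temperature": self.temperature,
            "top_p": self.top_p,
            "max_tokens": self.max_new_tokens,
            "frequency_penalty": self.frequency_penalty,
            "presence_penalty": self.presence_penalty,  # must be forwarded, not dropped
            "seed": self.seed,
        }
        # Only send parameters the user actually set.
        return {k: v for k, v in args.items() if v is not None}

    def to_scoring_kwargs(self) -> Dict[str, Any]:
        # Scoring reads logprobs of the echoed prompt, so the request values are
        # pinned; the real method may forward additional fields.
        return {"echo": True, "logprobs": 1, "max_tokens": 1, "temperature": 0.0}
```

Filtering out `None` values keeps provider defaults intact. A regression test then only needs to check that a set `presence_penalty` survives the conversion.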

Tests

80 new unit tests, all passing, no new regressions:

Token alignment engine (6 tests)
Argmax checker (7 tests)
Async API caller with retry/backoff (7 tests)
Single-doc loglikelihood processor (7 tests)
End-to-end loglikelihood integration (4 tests)
Provider guard (12 tests)
Rolling loglikelihood (6 tests + 2 integration)
Length guard (7 tests)
to_litellm_text_completion_dict (9 tests)
to_litellm_dict presence_penalty regression (4 tests)
greedy_until split iteration regression (2 tests)

Ali322O added 2 commits May 21, 2026 21:00

implement loglikelihood for LiteLLM backend

033d906

regression coverage

ab0d2db

ALI-AL-MARJANI wants to merge 2 commits into huggingface:main from ALI-AL-MARJANI:feature/litellm-loglikelihood-support
