LCORE-1723: Cross-Encoder Reranking for enhanced RAG#1566
Anxhela21 wants to merge 13 commits into lightspeed-core:main
Conversation
Walkthrough

Adds a configurable cross-encoder reranker for RAG: a new RerankerConfiguration, an AppConfig.reranker accessor, a sentence-transformers runtime dependency, a BYOK rerank boost constant, lazy cross-encoder loading, and async reranking integrated into build_rag_context, plus related unit and integration test updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Builder as build_rag_context
    participant BYOK as _fetch_byok_chunks
    participant Solr as _fetch_solr_chunks
    participant Reranker as CrossEncoderReranker
    participant Booster as BYOKBooster
    participant Selector as SelectTopK
    participant Context as BuildRAGContext
    Client->>Builder: query + configuration
    Builder->>BYOK: fetch candidates (increased pool)
    BYOK-->>Builder: byok_chunks
    Builder->>Solr: fetch candidates (increased pool)
    Solr-->>Builder: solr_chunks
    Builder->>Builder: merge & dedupe candidates
    alt configuration.reranker.enabled
        Builder->>Reranker: score (query, chunk.content) pairs
        Reranker-->>Builder: normalized CE scores
        Builder->>Builder: combine CE (30%) + original (70%)
        Builder->>Booster: apply BYOK_RAG_RERANK_BOOST to non-OKP BYOK chunks
        Booster-->>Builder: boosted_scores
        Builder->>Selector: sort & select top K
    else
        Builder->>Selector: sort by original chunk.score & select top K
    end
    Selector-->>Context: final_chunks
    Context-->>Client: RAGContext (with referenced_documents derived from final_chunks)
```
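The scoring branch in the diagram (combine normalized cross-encoder scores at 30% with original retrieval scores at 70%, boost BYOK chunks, select top K) can be sketched as below. The `Chunk` shape, the `BYOK_BOOST` value, and the weight constants are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the rerank-and-select flow from the walkthrough.
from dataclasses import dataclass

BYOK_BOOST = 1.2   # assumed value of BYOK_RAG_RERANK_BOOST
CE_WEIGHT = 0.3    # cross-encoder contribution (30%)
ORIG_WEIGHT = 0.7  # original retrieval-score contribution (70%)


@dataclass
class Chunk:
    content: str
    score: float        # original retrieval score
    is_byok: bool = False


def combine_and_select(chunks: list[Chunk], ce_scores: list[float], top_k: int) -> list[Chunk]:
    """Blend CE and retrieval scores, boost BYOK chunks, keep the top K."""
    scored = []
    for chunk, ce in zip(chunks, ce_scores):
        combined = CE_WEIGHT * ce + ORIG_WEIGHT * chunk.score
        if chunk.is_byok:
            combined *= BYOK_BOOST
        scored.append((combined, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]
```

A high CE score can promote a chunk past one with a better retrieval score, which is exactly the behavior the `alt configuration.reranker.enabled` branch adds.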
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ 5 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed: e6b6d66 to 880a566
Force-pushed: 0131555 to 82caaf6
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 75-79: Remove the duplicate dependency entry for
"sentence-transformers>=2.0.0" in pyproject.toml (the repeated listing with the
same comment), leaving only a single declaration; also check the llslibdev group
requirement (">=5.0.0") and confirm whether the runtime floor should be raised
to ">=5.0.0" to match required features, updating the single remaining
"sentence-transformers" entry accordingly if the code needs 5.0.0+.
In `@src/constants.py`:
- Around line 195-196: Add a Final[float] type annotation to the new constant
BYOK_RAG_RERANK_BOOST so it matches the other constants' immutability checks;
update the declaration for BYOK_RAG_RERANK_BOOST to use Final[float] and ensure
Final is imported from typing at the top of the file if it isn't already.
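The suggested declaration can be sketched as below; the 1.2 value is an assumption for illustration, since the review does not state the constant's actual value.

```python
# Sketch of the suggested constants.py change: annotate the boost
# constant with Final[float] so type checkers flag reassignment.
from typing import Final

BYOK_RAG_RERANK_BOOST: Final[float] = 1.2  # value assumed for the sketch
```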
In `@src/models/config.py`:
- Around line 1828-1866: The current declaration _explicitly_configured: bool =
PrivateAttr(default=False) makes _explicitly_configured always present, and the
value-comparison in mark_as_explicitly_configured() misses cases where a user
explicitly provided a field equal to its default; change _explicitly_configured
to a tri-state (e.g. PrivateAttr(default=None)) and rewrite
mark_as_explicitly_configured to use the Pydantic-provided
self.model_fields_set: compute the set of reranker field names you care about
(enabled, model, model_id, provider_id, top_k_multiplier, byok_boost,
okp_boost), check if any of those names are in self.model_fields_set and if so
set self._explicitly_configured = True (leave it None/False otherwise), and
update the consuming code (the hasattr(...) check elsewhere) to test for
_explicitly_configured is not None / True instead of hasattr so the auto-enable
branch can run correctly.
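The tri-state fix described above can be sketched without Pydantic as follows. In Pydantic v2, `model_fields_set` holds the names of fields the caller actually passed, even when a passed value equals the field default; the stand-in class below records the same information by hand purely to illustrate the logic, and `should_auto_enable` is a hypothetical helper, not the real consuming code.

```python
# Dependency-free sketch of the suggested tri-state fix. The real fix
# belongs in a @model_validator(mode="after") on RerankerConfiguration.
class RerankerConfigSketch:
    def __init__(self, **kwargs: object) -> None:
        self.enabled = kwargs.get("enabled", True)
        self.model = kwargs.get("model", "cross-encoder/ms-marco-MiniLM-L6-v2")
        # Pydantic v2 would populate this automatically as model_fields_set.
        self.model_fields_set = frozenset(kwargs)
        # Tri-state flag: None = never configured, True = user-provided.
        self._explicitly_configured: bool | None = (
            True if self.model_fields_set else None
        )


def should_auto_enable(config: RerankerConfigSketch, has_byok: bool, has_okp: bool) -> bool:
    """Auto-enable only when the user never touched the reranker settings."""
    return has_byok and has_okp and config._explicitly_configured is None
```

Note how explicit input equal to the default (e.g. `enabled=True`) still marks the config as configured, which value-comparison against defaults cannot detect.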
- Around line 1815-1826: The three RerankerConfiguration fields
(top_k_multiplier, byok_boost, okp_boost) are declared but not used; update the
reranking code to read these values from the configuration instead of using
hard-coded constants: replace the hard-coded 2x in the pool_size calculation in
vector_search.py with configuration.reranker.top_k_multiplier when computing
pool_size (or its equivalent variable), change the boost applied in
_apply_byok_rerank_boost to use configuration.reranker.byok_boost instead of
constants.BYOK_RAG_RERANK_BOOST, and ensure OKP chunks receive
configuration.reranker.okp_boost (apply the multiplier where OKP chunks
currently pass through unmodified). Keep existing defaults from the
RerankerConfiguration when reading these fields so behavior is unchanged if the
user omits them.
In `@src/utils/vector_search.py`:
- Around line 28-56: The module-level cache _cross_encoder_models and loader
_get_cross_encoder are not concurrency-safe and can instantiate
CrossEncoder(model_name) multiple times on cold start; fix by serializing loads
for a given model_name (single-flight) — e.g., add a module-level dict of
asyncio.Lock objects (or a per-name asyncio.Event/Future) and acquire the lock
for model_name inside _get_cross_encoder (or convert the loader to a synchronous
function, apply functools.cache, and call it via asyncio.to_thread) so only one
task constructs CrossEncoder(model_name) and others await the result, then store
the loaded model (or None on failure) into _cross_encoder_models before
releasing the lock; ensure any exception handling still sets the cache to None
to avoid repeated retries.
- Around line 722-728: The function build_rag_context currently takes an unused
parameter moderation_decision (annotated with # pylint:
disable=unused-argument); either remove that parameter from the function
signature and delete the pylint disable, and update its three callers (the calls
in the functions at src/app/endpoints/query.py,
src/app/endpoints/streaming_query.py, and src/app/endpoints/responses.py) to
stop passing moderation_decision, or if you intend to use it later, keep the
parameter but replace the pylint disable with a short TODO comment in
build_rag_context explaining the planned future use and why moderation_decision
is intentionally ignored; reference the function name build_rag_context and the
three caller sites when making the change so the signature and call sites remain
consistent.
- Around line 59-134: The async function _rerank_chunks_with_cross_encoder
currently calls the blocking sentence_transformers.CrossEncoder.predict
synchronously (model.predict), which will block the asyncio event loop; change
the call to run in a thread via asyncio.to_thread (e.g., scores = await
asyncio.to_thread(model.predict, pairs)) so inference executes off the event
loop, preserving async responsiveness, and keep the subsequent score processing
logic unchanged; ensure to import asyncio if not already present.
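The offloading suggested above can be sketched as below; `FakeCrossEncoder` is a stand-in for `sentence_transformers.CrossEncoder`, whose `predict()` is a synchronous, CPU-bound call.

```python
# Sketch: run blocking predict() in a worker thread via asyncio.to_thread
# so the event loop stays responsive during inference.
import asyncio
import time


class FakeCrossEncoder:
    """Stand-in for sentence_transformers.CrossEncoder."""

    def predict(self, pairs: list[tuple[str, str]]) -> list[float]:
        time.sleep(0.01)  # simulate blocking, CPU-bound scoring
        return [float(len(q) + len(d)) for q, d in pairs]


async def rerank(model: FakeCrossEncoder, query: str, docs: list[str]) -> list[tuple[float, str]]:
    pairs = [(query, d) for d in docs]
    # The synchronous inference runs in a worker thread; other tasks on
    # the event loop keep making progress while scores are computed.
    scores = await asyncio.to_thread(model.predict, pairs)
    return sorted(zip(scores, docs), reverse=True)
```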
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 89ed94e6-8009-41d5-8e4c-3f2892881c37
📒 Files selected for processing (5)
pyproject.toml, src/configuration.py, src/constants.py, src/models/config.py, src/utils/vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: bandit
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: integration_tests (3.12)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/constants.py, src/configuration.py, src/models/config.py, src/utils/vector_search.py
**/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Central constants.py for shared constants with descriptive comments
Files:
src/constants.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend ConfigurationBase for config, BaseModel for data models
Files:
src/constants.py, src/configuration.py, src/models/config.py, src/utils/vector_search.py
src/**/config*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models
Files:
src/configuration.py, src/models/config.py
🧠 Learnings (6)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py, src/utils/vector_search.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/*.py : Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Applied to files:
src/utils/vector_search.py
🔇 Additional comments (3)

src/utils/vector_search.py (1)

722-746: pool_size / top_k ignore the configured top_k_multiplier.
pool_size = 2 * constants.BYOK_RAG_MAX_CHUNKS hard-codes the 2× factor, but RerankerConfiguration.top_k_multiplier (default 2.0) is intended to govern this exact value. Likewise, _apply_byok_rerank_boost(...) reads the multiplier from constants.BYOK_RAG_RERANK_BOOST, ignoring reranker.byok_boost and reranker.okp_boost. See the comment on src/models/config.py RerankerConfiguration for the full picture and a suggested wiring diff.

src/configuration.py (1)

469-475: LGTM. The new reranker property follows the same pattern as the surrounding rag/okp properties. Since reranker is a non-optional field with a default_factory in Configuration, the access is safe once _configuration is loaded.

src/models/config.py (1)

1822-1823: Remove unused placeholder fields or wire them up.
model_id and provider_id in RerankerConfiguration are not consumed by src/utils/vector_search.py (only model and enabled are read). If these fields are reserved for future Llama Stack-backed reranker provider support, they should either be wired up or removed to avoid leaking unimplemented schema to users. The Pydantic v2 protected namespace concern does not apply here — the project uses Pydantic v2.10.6+, where the default protected_namespaces setting only flags direct conflicts with methods like model_validate and model_dump, not fields with the model_ prefix.

> Likely an incorrect or invalid review comment.
Force-pushed: cce6b82 to d664680
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/vector_search.py (1)
642-685: ⚠️ Potential issue | 🟠 Major — The OKP fetch path never actually expands the rerank candidate pool.

_fetch_solr_rag() accepts max_chunks, but the value never reaches vector_io.query() because _build_query_params(solr) still uses the default k. build_rag_context() also calls _fetch_solr_rag() without max_chunks=pool_size. As a result, OKP stays capped at the default retrieval size, so the reranker never sees the larger Solr pool described in this PR.

♻️ Suggested fix

```diff
- params = _build_query_params(solr)
+ params = _build_query_params(solr, k=limit)
```

```diff
- solr_chunks_task = _fetch_solr_rag(client, query, solr)
+ solr_chunks_task = _fetch_solr_rag(
+     client, query, solr, max_chunks=pool_size
+ )
```

Also applies to: 748-755
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 642 - 685, The Solr fetch never expands the rerank candidate pool because _fetch_solr_rag ignores its max_chunks arg and _build_query_params still uses the default k; also build_rag_context doesn't pass pool_size into _fetch_solr_rag. Fix by threading max_chunks through: update _fetch_solr_rag to pass max_chunks (or computed limit) into the params returned by _build_query_params (ensure _build_query_params accepts a k/limit argument and sets params['k'] accordingly), call client.vector_io.query with those params, and modify build_rag_context to call _fetch_solr_rag(..., max_chunks=pool_size) so the Solr query and vector_io.query receive the larger pool_size for reranking (refer to functions _fetch_solr_rag, _build_query_params, build_rag_context, vector_io.query, and variables max_chunks/pool_size).
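The threading of max_chunks described above can be sketched as below; the function names follow the review comment, but the dict-based params and signatures are simplified stand-ins for the real module.

```python
# Hypothetical sketch: _build_query_params gains a k argument so the
# expanded rerank pool actually reaches the vector_io query.
def build_query_params(solr_config: dict, k: int) -> dict:
    """Build query params, honouring the requested candidate pool size."""
    params = dict(solr_config)
    params["k"] = k  # previously hard-wired to the default retrieval size
    return params


def fetch_solr_chunks(solr_config: dict, max_chunks: int) -> dict:
    # build_rag_context would call this with max_chunks=pool_size.
    return build_query_params(solr_config, k=max_chunks)
```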
♻️ Duplicate comments (3)
src/utils/vector_search.py (2)
79-90: ⚠️ Potential issue | 🔴 Critical — Offload CrossEncoder.predict() off the event loop.

_rerank_chunks_with_cross_encoder() is async, but model.predict() is a synchronous inference call. Awaiting this path under load will stall the event loop for the full rerank duration and block unrelated requests.

♻️ Minimal fix

```diff
- model = _get_cross_encoder(model_name)
+ model = await _get_cross_encoder(model_name)
@@
- scores = model.predict(pairs)
+ scores = await asyncio.to_thread(model.predict, pairs)
```

According to the official sentence-transformers documentation, is CrossEncoder.predict(...) a synchronous/blocking call, and is asyncio.to_thread(...) the appropriate way to invoke it from Python async code?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 79 - 90, _rerank_chunks_with_cross_encoder is async but calls the synchronous CrossEncoder.predict (via model.predict) which blocks the event loop; change the call to run the blocking inference off the loop using asyncio.to_thread (or loop.run_in_executor) so prediction runs in a worker thread and returns awaitably. Locate the call to model.predict in _rerank_chunks_with_cross_encoder and replace the direct call with an awaited asyncio.to_thread(lambda: model.predict(pairs)) (or equivalent) so the CrossEncoder.predict invocation is executed off the event loop.
34-56: ⚠️ Potential issue | 🟠 Major — Serialize cold-start loads for each cross-encoder model.

This cache is still racy: concurrent requests can all miss _cross_encoder_models and instantiate the same model before the first load stores it, which is an expensive CPU/RAM spike for large rerankers. Wrap model construction in a per-model single-flight lock so only one task loads each model name.

♻️ Suggested single-flight pattern

```diff
 _cross_encoder_models: dict[str, Any] = {}  # pylint: disable=invalid-name
+_cross_encoder_load_locks: dict[str, asyncio.Lock] = {}  # pylint: disable=invalid-name

-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking.
@@
-    if model_name not in _cross_encoder_models:
-        try:
-            from sentence_transformers import (  # pylint: disable=import-outside-toplevel
-                CrossEncoder,
-            )
-
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
-            logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
-        except Exception as e:  # pylint: disable=broad-exception-caught
-            logger.warning(
-                "Could not load cross-encoder for reranking (%s): %s", model_name, e
-            )
-            _cross_encoder_models[model_name] = None
+    if model_name in _cross_encoder_models:
+        return _cross_encoder_models[model_name]
+
+    lock = _cross_encoder_load_locks.setdefault(model_name, asyncio.Lock())
+    async with lock:
+        if model_name in _cross_encoder_models:
+            return _cross_encoder_models[model_name]
+        try:
+            from sentence_transformers import (  # pylint: disable=import-outside-toplevel
+                CrossEncoder,
+            )
+
+            _cross_encoder_models[model_name] = await asyncio.to_thread(
+                CrossEncoder, model_name
+            )
+            logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
+        except Exception as e:  # pylint: disable=broad-exception-caught
+            logger.warning(
+                "Could not load cross-encoder for reranking (%s): %s", model_name, e
+            )
+            _cross_encoder_models[model_name] = None
     return _cross_encoder_models[model_name]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 34 - 56, The _get_cross_encoder function is racy because multiple callers can simultaneously miss _cross_encoder_models and each instantiate the same heavy model; fix it by introducing a per-model single-flight lock (e.g., a module-level dict _cross_encoder_locks: Dict[str, threading.Lock]) and wrap the check-and-create logic for a given model_name inside that lock: obtain or create the lock for model_name (use setdefault), acquire it (with context manager) before re-checking _cross_encoder_models[model_name], then instantiate CrossEncoder and assign to _cross_encoder_models[model_name] inside the locked region (ensure exceptions still set the cache to None and release the lock in finally). Update references to _get_cross_encoder, _cross_encoder_models, and logger accordingly.

src/models/config.py (1)
1829-1861: ⚠️ Potential issue | 🟠 Major — _explicitly_configured never makes the auto-enable branch work as intended.

PrivateAttr(default=False) means _explicitly_configured is always present, so not hasattr(self.reranker, "_explicitly_configured") is always false and the auto-enable path never runs. The value-comparison validator also misses explicit input that matches defaults, e.g. reranker: { enabled: true }. Use model_fields_set to mark explicit input and test the flag value directly.

♻️ Suggested fix

```diff
 class RerankerConfiguration(ConfigurationBase):
@@
     _explicitly_configured: bool = PrivateAttr(default=False)
@@
     @model_validator(mode="after")
     def mark_as_explicitly_configured(self) -> Self:
         """Mark this configuration as explicitly set when instantiated from user input."""
-        # Only mark as explicitly configured if we're not using all default values
-        # This allows auto-enabling when user hasn't touched reranker settings
-        # Check if any field differs from default values
-        default_model = "cross-encoder/ms-marco-MiniLM-L6-v2"
-        default_top_k_multiplier = 2.0
-        default_byok_boost = 1.2
-        default_okp_boost = 1.0
-
-        # Check if any setting differs from defaults (indicates explicit configuration)
-        current_values = [
-            self.enabled,
-            self.model,
-            self.top_k_multiplier,
-            self.byok_boost,
-            self.okp_boost,
-        ]
-        default_values = [
-            True,
-            default_model,
-            default_top_k_multiplier,
-            default_byok_boost,
-            default_okp_boost,
-        ]
-
-        if current_values != default_values:
+        if self.model_fields_set:
             self._explicitly_configured = True
         return self
@@
-        if (
-            has_byok
-            and has_okp
-            and not hasattr(self.reranker, "_explicitly_configured")
-        ):
+        if has_byok and has_okp and not self.reranker._explicitly_configured:  # pylint: disable=no-member
             if not self.reranker.enabled:
                 logger.info(
                     "Automatically enabling reranker: Both BYOK RAG (%d entries) and OKP "
                     "are configured. Reranking improves result quality when multiple "
```

In Pydantic v2, do PrivateAttr(default=...) attributes always exist on model instances, and does model_fields_set track fields explicitly provided in the input even when the provided value equals the field default?

Also applies to: 2154-2168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 1829 - 1861, The _explicitly_configured flag is never meaningful because PrivateAttr always exists; update the mark_as_explicitly_configured validator (and the similar validator at lines ~2154-2168) to use model_fields_set to detect which fields were explicitly provided and set self._explicitly_configured = True when any reranker-related fields (e.g., "enabled", "model", "top_k_multiplier", "byok_boost", "okp_boost") appear in self.model_fields_set or when the parent model passed reranker in its input; also change any checks that use hasattr(self.reranker, "_explicitly_configured") to inspect the flag's value directly (self.reranker._explicitly_configured) so the auto-enable branch can run when the user explicitly provided values even if they match defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/vector_search.py`:
- Around line 182-205: The loop that builds ReferencedDocument from rag_chunks
currently appends entries even when both title and URL are missing; update the
logic in the loop (the section that computes attrs, doc_url, parsed_url and
before result.append) to skip/continue when both attrs.get("title") is falsy and
parsed_url is falsy/None so you only append ReferencedDocument when there is a
doc_title or a doc_url; keep existing dedup_key/seen handling and AnyUrl parsing
but add a guard (using the computed title and parsed_url) to avoid creating
empty ReferencedDocument objects.
---
Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 642-685: The Solr fetch never expands the rerank candidate pool
because _fetch_solr_rag ignores its max_chunks arg and _build_query_params still
uses the default k; also build_rag_context doesn't pass pool_size into
_fetch_solr_rag. Fix by threading max_chunks through: update _fetch_solr_rag to
pass max_chunks (or computed limit) into the params returned by
_build_query_params (ensure _build_query_params accepts a k/limit argument and
sets params['k'] accordingly), call client.vector_io.query with those params,
and modify build_rag_context to call _fetch_solr_rag(..., max_chunks=pool_size)
so the Solr query and vector_io.query receive the larger pool_size for reranking
(refer to functions _fetch_solr_rag, _build_query_params, build_rag_context,
vector_io.query, and variables max_chunks/pool_size).
---
Duplicate comments:
In `@src/models/config.py`:
- Around line 1829-1861: The _explicitly_configured flag is never meaningful
because PrivateAttr always exists; update the mark_as_explicitly_configured
validator (and the similar validator at lines ~2154-2168) to use
model_fields_set to detect which fields were explicitly provided and set
self._explicitly_configured = True when any reranker-related fields (e.g.,
"enabled", "model", "top_k_multiplier", "byok_boost", "okp_boost") appear in
self.model_fields_set or when the parent model passed reranker in its input;
also change any checks that use hasattr(self.reranker, "_explicitly_configured")
to inspect the flag's value directly (self.reranker._explicitly_configured) so
the auto-enable branch can run when the user explicitly provided values even if
they match defaults.
In `@src/utils/vector_search.py`:
- Around line 79-90: _rerank_chunks_with_cross_encoder is async but calls the
synchronous CrossEncoder.predict (via model.predict) which blocks the event
loop; change the call to run the blocking inference off the loop using
asyncio.to_thread (or loop.run_in_executor) so prediction runs in a worker
thread and returns awaitably. Locate the call to model.predict in
_rerank_chunks_with_cross_encoder and replace the direct call with an awaited
asyncio.to_thread(lambda: model.predict(pairs)) (or equivalent) so the
CrossEncoder.predict invocation is executed off the event loop.
- Around line 34-56: The _get_cross_encoder function is racy because multiple
callers can simultaneously miss _cross_encoder_models and each instantiate the
same heavy model; fix it by introducing a per-model single-flight lock (e.g., a
module-level dict _cross_encoder_locks: Dict[str, threading.Lock]) and wrap the
check-and-create logic for a given model_name inside that lock: obtain or create
the lock for model_name (use setdefault), acquire it (with context manager)
before re-checking _cross_encoder_models[model_name], then instantiate
CrossEncoder and assign to _cross_encoder_models[model_name] inside the locked
region (ensure exceptions still set the cache to None and release the lock in
finally). Update references to _get_cross_encoder, _cross_encoder_models, and
logger accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 32b1995a-8892-4d14-a537-53427a6d3620
📒 Files selected for processing (7)
pyproject.toml, src/configuration.py, src/constants.py, src/models/config.py, src/utils/vector_search.py, tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: mypy
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
Files:
src/constants.py, src/configuration.py, tests/unit/models/config/test_reranker_configuration.py, src/models/config.py, tests/unit/utils/test_vector_search.py, src/utils/vector_search.py
**/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Central `constants.py` for shared constants with descriptive comments
Files:
src/constants.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/constants.py, src/configuration.py, src/models/config.py, src/utils/vector_search.py
src/**/config*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/config*.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic models
Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Use type hints like `Optional[FilePath]`, `PositiveInt`, `SecretStr` in Pydantic models
Files:
src/configuration.py, src/models/config.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use `pytest-mock` for AsyncMock objects in tests
Use marker `pytest.mark.asyncio` for async tests
Unit tests require 60% coverage, integration tests 10%
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py
🧠 Learnings (9)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Use Final[type] as type hint for all constants
Applied to files:
src/constants.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/constants.py : Central `constants.py` for shared constants with descriptive comments
Applied to files:
src/constants.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py, src/models/config.py, tests/unit/utils/test_vector_search.py, src/utils/vector_search.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Always check `pyproject.toml` for existing dependencies before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_reranker_configuration.py, src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/*.py : Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Applied to files:
src/utils/vector_search.py
are-ces
left a comment
The example config in the PR description must be outdated, because

```yaml
top_k_multiplier: 2.0  # Fetch 2x chunks, rerank, keep top_k
byok_boost: 1.2        # Boost factor for BYOK sources
okp_boost: 1.0         # Boost factor for OKP sources
```

are missing from the configuration in the PR.

It would be nice to see a configured reranker in the example configuration `examples/lightspeed-stack-byok-okp-rag.yaml`.

Please update the unit and integration tests, and fix the failing Black and linting checks. TY!
```diff
 # To be able to fix multiple CVEs, also LCORE-1117
 "requests>=2.33.0",
+# Used for RAG chunk reranking (cross-encoder)
+"sentence-transformers>=2.0.0",
```
yes, it needs to be updated to pass CI etc.
thanks, missed that - will update uv.lock
I think this line is redundant and not needed since it is already defined in llslibdev below.
Actionable comments posted: 4
♻️ Duplicate comments (6)
src/utils/vector_search.py (5)
87-89: ⚠️ Potential issue | 🟠 Major

Run `predict()` in a worker thread.

`CrossEncoder.predict()` is synchronous. Calling it directly inside `async def _rerank_chunks_with_cross_encoder` blocks the event loop for the full inference time.

♻️ Proposed fix

```diff
     # Create query-chunk pairs for scoring
     pairs = [(query, chunk.content) for chunk in chunks]
-    scores = model.predict(pairs)
+    scores = await asyncio.to_thread(model.predict, pairs)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 87 - 89, The call to the synchronous CrossEncoder.predict inside async def _rerank_chunks_with_cross_encoder blocks the event loop; run model.predict in a background thread using asyncio's loop.run_in_executor (or asyncio.to_thread) instead. Locate _rerank_chunks_with_cross_encoder and replace the direct scores = model.predict(pairs) call with an awaitable that executes model.predict in a worker thread, preserving the same input pairs and returning scores for the rest of the function to use. Ensure exceptions from model.predict propagate or are handled appropriately when run in the executor.
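As a standalone illustration of the fix suggested above (not the project's actual code: `FakeCrossEncoder`, `rerank`, and the sample chunks are made-up stand-ins), offloading a synchronous `predict()` call with `asyncio.to_thread` looks roughly like this:

```python
import asyncio


class FakeCrossEncoder:
    """Stand-in for sentence_transformers.CrossEncoder; predict() is synchronous."""

    def predict(self, pairs):
        # Toy scoring: count words shared between the query and each chunk.
        return [len(set(q.split()) & set(d.split())) for q, d in pairs]


async def rerank(query, chunks, model):
    """Score (query, chunk) pairs without blocking the event loop."""
    pairs = [(query, chunk) for chunk in chunks]
    # Run the blocking predict() in a worker thread; other coroutines
    # keep making progress while inference runs.
    return await asyncio.to_thread(model.predict, pairs)


scores = asyncio.run(
    rerank("install the operator", ["install the guide", "billing faq"], FakeCrossEncoder())
)
print(scores)
```

Exceptions raised inside `model.predict` propagate through `asyncio.to_thread` as if the call were made inline, so the existing error handling does not need to change.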
701-722: ⚠️ Potential issue | 🟠 Major

Pass `max_chunks` through to the Solr query.

`limit` is computed here, but `vector_io.query` still uses `_build_query_params(solr)` without `k=limit`, so OKP retrieval remains capped at `SOLR_VECTOR_SEARCH_DEFAULT_K`. The expanded rerank pool is therefore applied only to BYOK.

♻️ Proposed fix

```diff
-    params = _build_query_params(solr)
+    params = _build_query_params(solr, k=limit)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 701 - 722, The code computes limit but doesn't pass it to the Solr query; update the call site in this block so the Solr query uses that limit by passing it into the query params (e.g., call _build_query_params(solr, k=limit) or set params["k"]=limit after params = _build_query_params(solr)), then pass the updated params into client.vector_io.query (the vector_io.query invocation and the _build_query_params function are the unique symbols to modify).
759-793: ⚠️ Potential issue | 🔴 Critical

Keep the moderation gate before querying RAG backends.

`moderation_decision` is still passed in, but this function immediately starts BYOK/OKP retrieval for every request. That sends blocked prompts to the vector stores again, which regresses the earlier safety boundary.

♻️ Proposed fix

```diff
 async def build_rag_context(  # pylint: disable=too-many-locals
     client: AsyncLlamaStackClient,
     moderation_decision: str,  # pylint: disable=unused-argument
     query: str,
     vector_store_ids: Optional[list[str]],
     solr: Optional[SolrVectorSearchRequest] = None,
 ) -> RAGContext:
 @@
+    if moderation_decision == "blocked":
+        logger.info("Skipping inline RAG because moderation blocked the query")
+        return RAGContext(
+            context_text="",
+            rag_chunks=[],
+            referenced_documents=[],
+        )
+
     pool_size = 2 * constants.BYOK_RAG_MAX_CHUNKS
     top_k = constants.BYOK_RAG_MAX_CHUNKS
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 759 - 793, The function build_rag_context currently starts _fetch_byok_rag and _fetch_solr_rag before honoring the moderation_decision; change it to check moderation_decision at the top of build_rag_context and short-circuit when the decision indicates a blocked request (e.g., "BLOCK"/unsafe), returning an empty RAGContext (no chunks, empty context text and documents) instead of invoking _fetch_byok_rag/_fetch_solr_rag; keep the rest of the logic unchanged for non-blocked decisions.
34-56: ⚠️ Potential issue | 🟠 Major

Load the cross-encoder off the event loop and make cache misses single-flight.

`CrossEncoder(model_name)` can block on model load/download, and concurrent cold starts can instantiate the same model multiple times before `_cross_encoder_models` is populated.

♻️ Proposed fix

```diff
 _cross_encoder_models: dict[str, Any] = {}  # pylint: disable=invalid-name
+_cross_encoder_locks: dict[str, asyncio.Lock] = {}  # pylint: disable=invalid-name


-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking.
 @@
-    if model_name not in _cross_encoder_models:
-        try:
+    if model_name in _cross_encoder_models:
+        return _cross_encoder_models[model_name]
+
+    lock = _cross_encoder_locks.setdefault(model_name, asyncio.Lock())
+    async with lock:
+        if model_name in _cross_encoder_models:
+            return _cross_encoder_models[model_name]
+        try:
             from sentence_transformers import (  # pylint: disable=import-outside-toplevel
                 CrossEncoder,
             )
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
+            _cross_encoder_models[model_name] = await asyncio.to_thread(
+                CrossEncoder, model_name
+            )
             logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
         except Exception as e:  # pylint: disable=broad-exception-caught
             logger.warning(
                 "Could not load cross-encoder for reranking (%s): %s", model_name, e
             )
             _cross_encoder_models[model_name] = None
     return _cross_encoder_models[model_name]
```

Line 81 would then need to `await _get_cross_encoder(model_name)`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 34 - 56, _get_cross_encoder currently constructs CrossEncoder(model_name) on the event loop and allows concurrent cache-miss races; change it to an async function that performs blocking CrossEncoder(model_name) inside an executor (run_in_executor) and implement single-flight cache-miss protection (e.g., a per-model asyncio.Lock or a dict of asyncio.Future/promises keyed by model_name) so only one loader runs per model and others await the result; populate _cross_encoder_models with the loaded model (or None on failure) and ensure callers (e.g., the caller at the noted site that must now `await _get_cross_encoder(model_name)`) are updated to await the async function.
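A minimal, self-contained sketch of this single-flight pattern, using a stand-in `expensive_load` in place of the real `CrossEncoder` constructor (all names here are illustrative, not the project's):

```python
import asyncio

_models: dict[str, str] = {}
_locks: dict[str, asyncio.Lock] = {}
load_calls = 0


def expensive_load(name: str) -> str:
    """Stand-in for CrossEncoder(name): a slow, blocking constructor."""
    global load_calls
    load_calls += 1
    return f"model:{name}"


async def get_model(name: str) -> str:
    """Lazy-load a model at most once per name, off the event loop."""
    if name in _models:
        return _models[name]
    # setdefault runs atomically on the single-threaded event loop, so
    # all concurrent callers share one lock per model name.
    lock = _locks.setdefault(name, asyncio.Lock())
    async with lock:
        if name not in _models:  # re-check after acquiring the lock
            _models[name] = await asyncio.to_thread(expensive_load, name)
    return _models[name]


async def main() -> list[str]:
    # Ten concurrent cold starts resolve to exactly one load.
    return await asyncio.gather(*(get_model("ce-small") for _ in range(10)))


results = asyncio.run(main())
print(load_calls, len(set(results)))  # prints: 1 1
```

The double check (before and after acquiring the lock) keeps the warm path lock-free while still guaranteeing that only the first cold-start caller performs the load.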
219-241: ⚠️ Potential issue | 🟠 Major

Skip referenced documents that have neither title nor URL.

This helper currently emits `ReferencedDocument` entries for doc-id-only chunks. Those are intentionally filtered out elsewhere because the frontend cannot render them usefully.

♻️ Proposed fix

```diff
     for chunk in rag_chunks:
         attrs = chunk.attributes or {}
         doc_url = (
             attrs.get("reference_url") or attrs.get("doc_url") or attrs.get("docs_url")
         )
+        title = attrs.get("title")
         doc_id = attrs.get("document_id") or attrs.get("doc_id")
         dedup_key = doc_url or doc_id or chunk.source or ""
         if not dedup_key or dedup_key in seen:
             continue
         seen.add(dedup_key)
 @@
-        result.append(
+        if not title and parsed_url is None:
+            continue
+        result.append(
             ReferencedDocument(
-                doc_title=attrs.get("title"),
+                doc_title=title,
                 doc_url=parsed_url,
                 source=chunk.source,
             )
         )
```

Based on learnings:
`parse_referenced_documents_from_responses_api` keeps only documents with `doc_title` or `doc_url` because these are displayed in the frontend.
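The filtering rule can be sketched in isolation; `filter_referenced_documents` and the dict-shaped documents below are illustrative stand-ins for the real `ReferencedDocument` handling:

```python
def filter_referenced_documents(docs):
    """Keep only documents the frontend can render: those with a title or a URL."""
    return [d for d in docs if d.get("doc_title") or d.get("doc_url")]


docs = [
    {"doc_id": "id-only"},                                    # dropped: nothing to display
    {"doc_id": "titled", "doc_title": "Install guide"},       # kept: has a title
    {"doc_id": "linked", "doc_url": "https://docs.example"},  # kept: has a URL
]
kept = filter_referenced_documents(docs)
print([d["doc_id"] for d in kept])  # ['titled', 'linked']
```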
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 219 - 241, The loop over rag_chunks currently always appends a ReferencedDocument even when a chunk has neither a title nor a URL; update the logic in the function (the loop that iterates rag_chunks and constructs ReferencedDocument) to skip adding entries unless attrs.get("title") or the parsed doc_url is present (i.e., only append ReferencedDocument when at least one of doc_title or doc_url exists), keeping the existing dedup_key/seen checks and AnyUrl parsing behavior; this mirrors the behavior in parse_referenced_documents_from_responses_api so the frontend won't receive doc-id-only entries.

src/models/config.py (1)
1830-1839: ⚠️ Potential issue | 🟠 Major

Respect explicit reranker settings in the auto-enable guard.

`mark_as_explicitly_configured()` records whether the user touched `reranker`, but this validator never checks that flag. An explicit `reranker.enabled: false` still gets forced back to `true`. Also, the current `or`/`and` mix is parsed as `has_byok or (...)`, so any non-empty `byok_rag` trips this block before the OKP/enabled checks apply.

♻️ Proposed fix

```diff
-        if (
-            has_byok or has_inline_rag
-            and has_okp
-            and not self.reranker.enabled
-        ):
+        if (
+            (has_byok or has_inline_rag)
+            and has_okp
+            and not self.reranker._explicitly_configured  # pylint: disable=protected-access
+            and not self.reranker.enabled
+        ):
```

Also applies to: 2135-2147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 1830 - 1839, The auto-enable guard currently ignores the _explicitly_configured flag set by mark_as_explicitly_configured() and has incorrect operator grouping so byok_rag truthiness bypasses OKP/enabled checks; update the guard that decides reranker.enabled to first respect _explicitly_configured (do not override when true) and then fix the boolean expression to use parentheses so it evaluates as has_byok or (okp_present and enabled_check) rather than has_byok or (...)—apply the same fix to the other identical guard scope referenced later (the second block around the reranker auto-enable code).
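The precedence pitfall behind this guard is easy to reproduce in isolation: in Python, `and` binds tighter than `or`, so `a or b and c` parses as `a or (b and c)`. The variable values below are hypothetical:

```python
# `and` binds tighter than `or`: `a or b and c` parses as `a or (b and c)`.
has_byok = True         # non-empty byok_rag
has_inline_rag = False
has_okp = False         # no OKP configured
reranker_enabled = False

# Buggy guard: truthy has_byok short-circuits past the OKP/enabled checks.
buggy = has_byok or has_inline_rag and has_okp and not reranker_enabled

# Intended guard: parenthesize the disjunction first.
fixed = (has_byok or has_inline_rag) and has_okp and not reranker_enabled

print(buggy, fixed)  # prints: True False
```

With BYOK configured but no OKP, the buggy form still fires and force-enables the reranker, while the parenthesized form correctly stays quiet.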
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 75-76: The dependency "sentence-transformers>=2.0.0" pulls a
transitive "torch>=1.11.0" at runtime, but the existing [tool.uv.sources]
override is scoped to group = "llslibdev" so runtime installs still resolve
torch from default PyPI (GPU variant); move the pytorch-cpu index override out
of the group-scoped [tool.uv.sources] and apply it globally (top-level
[tool.uv.sources]) so that package resolution for "torch" (and thus for the
transitive dependency from sentence-transformers) uses the pytorch-cpu index for
all installs.
In `@src/models/config.py`:
- Around line 2124-2125: The check "has_inline_rag = len(self.rag) > 0" will
raise TypeError because RagConfiguration (self.rag) has no __len__; replace this
by checking the inline field directly — e.g., set has_inline_rag =
bool(self.rag.inline) — so update the logic in the method/initializer where
has_inline_rag is computed (referencing self.rag and has_inline_rag) to use
bool(self.rag.inline) instead of len(self.rag).
In `@src/utils/vector_search.py`:
- Around line 818-821: The else branch currently calls
_apply_byok_rerank_boost(merged) which mutates scores and ordering even when the
reranker is disabled; change this to preserve original vector ranking and scores
by taking the top_k items directly from merged (e.g., context_chunks =
merged[:top_k]) so no scores or ordering are modified when reranker.enabled =
False; reference: _apply_byok_rerank_boost, merged, context_chunks, top_k,
reranker.enabled.
In `@tests/unit/models/config/test_reranker_configuration.py`:
- Around line 46-53: The test should assert a public observable effect instead
of the private _explicitly_configured sentinel: replace the hasattr(config,
"_explicitly_configured") check in test_explicit_configuration_detection with an
assertion that the instance's public value differs from the default (e.g.,
create a default = RerankerConfiguration() and assert config.enabled !=
default.enabled or assert config.enabled is False and default.enabled is True),
so the test verifies that providing a non-default value to RerankerConfiguration
produces the expected public behavior.
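A sketch of the suggested test shape, using a `dataclass` stand-in since the real `RerankerConfiguration` (a Pydantic `ConfigurationBase` subclass) is not reproduced here:

```python
from dataclasses import dataclass


@dataclass
class RerankerConfiguration:
    """Simplified stand-in for the real Pydantic configuration model."""

    enabled: bool = True


def test_explicit_configuration_detection() -> None:
    """Assert the public effect of an explicit value, not a private sentinel."""
    default = RerankerConfiguration()
    config = RerankerConfiguration(enabled=False)
    assert default.enabled is True
    assert config.enabled is False
    assert config.enabled != default.enabled


test_explicit_configuration_detection()
print("ok")  # prints: ok
```

Asserting on `enabled` keeps the test valid even if the private `_explicitly_configured` sentinel is renamed or removed.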
---
Duplicate comments:
In `@src/models/config.py`:
- Around line 1830-1839: The auto-enable guard currently ignores the
_explicitly_configured flag set by mark_as_explicitly_configured() and has
incorrect operator grouping so byok_rag truthiness bypasses OKP/enabled checks;
update the guard that decides reranker.enabled to first respect
_explicitly_configured (do not override when true) and then fix the boolean
expression to use parentheses so it evaluates as has_byok or (okp_present and
enabled_check) rather than has_byok or (...)—apply the same fix to the other
identical guard scope referenced later (the second block around the reranker
auto-enable code).
In `@src/utils/vector_search.py`:
- Around line 87-89: The call to the synchronous CrossEncoder.predict inside
async def _rerank_chunks_with_cross_encoder blocks the event loop; run
model.predict in a background thread using asyncio's loop.run_in_executor (or
asyncio.to_thread) instead. Locate _rerank_chunks_with_cross_encoder and replace
the direct scores = model.predict(pairs) call with an awaitable that executes
model.predict in a worker thread, preserving the same input pairs and returning
scores for the rest of the function to use. Ensure exceptions from model.predict
propagate or are handled appropriately when run in the executor.
- Around line 701-722: The code computes limit but doesn't pass it to the Solr
query; update the call site in this block so the Solr query uses that limit by
passing it into the query params (e.g., call _build_query_params(solr, k=limit)
or set params["k"]=limit after params = _build_query_params(solr)), then pass
the updated params into client.vector_io.query (the vector_io.query invocation
and the _build_query_params function are the unique symbols to modify).
- Around line 759-793: The function build_rag_context currently starts
_fetch_byok_rag and _fetch_solr_rag before honoring the moderation_decision;
change it to check moderation_decision at the top of build_rag_context and
short-circuit when the decision indicates a blocked request (e.g.,
"BLOCK"/unsafe), returning an empty RAGContext (no chunks, empty context text
and documents) instead of invoking _fetch_byok_rag/_fetch_solr_rag; keep the
rest of the logic unchanged for non-blocked decisions.
- Around line 34-56: _get_cross_encoder currently constructs
CrossEncoder(model_name) on the event loop and allows concurrent cache-miss
races; change it to an async function that performs blocking
CrossEncoder(model_name) inside an executor (run_in_executor) and implement
single-flight cache-miss protection (e.g., a per-model asyncio.Lock or a dict of
asyncio.Future/promises keyed by model_name) so only one loader runs per model
and others await the result; populate _cross_encoder_models with the loaded
model (or None on failure) and ensure callers (e.g., the caller at the noted
site that must now `await _get_cross_encoder(model_name)`) are updated to await
the async function.
- Around line 219-241: The loop over rag_chunks currently always appends a
ReferencedDocument even when a chunk has neither a title nor a URL; update the
logic in the function (the loop that iterates rag_chunks and constructs
ReferencedDocument) to skip adding entries unless attrs.get("title") or the
parsed doc_url is present (i.e., only append ReferencedDocument when at least
one of doc_title or doc_url exists), keeping the existing dedup_key/seen checks
and AnyUrl parsing behavior; this mirrors the behavior in
parse_referenced_documents_from_responses_api so the frontend won’t receive
doc-id-only entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: abc3be5b-4824-4542-9263-2d92c2a3f592
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (8)
pyproject.toml, src/constants.py, src/models/config.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: mypy
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (6)
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Use `pytest-mock` for AsyncMock objects in unit tests
Use marker `pytest.mark.asyncio` for async tests
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_vector_search.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: `str | int`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes use ABC with `@abstractmethod` decorators
Complete type annotations for all class attributes, use specific types, not `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/constants.py, src/models/config.py, src/utils/vector_search.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Central `constants.py` for shared constants with descriptive comments
Files:
src/constants.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Type hints for Pydantic models: `Optional[FilePath]`, `PositiveInt`, `SecretStr`
Use `typing_extensions.Self` for model validators in Pydantic models
Pydantic config classes extend `ConfigurationBase`, data models extend `BaseModel`
Use `@model_validator` and `@field_validator` for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes
Files:
src/models/config.py
🧠 Learnings (9)
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: ALWAYS check `pyproject.toml` for existing dependencies before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_reranker_configuration.py, src/models/config.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
tests/integration/endpoints/test_query_byok_integration.py, src/constants.py, src/models/config.py, tests/unit/utils/test_vector_search.py, src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/constants.py : Central `constants.py` for shared constants with descriptive comments
Applied to files:
src/constants.py
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants
Applied to files:
src/constants.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Applied to files:
src/utils/vector_search.py
🔇 Additional comments (1)
src/models/config.py (1)
2000-2004: Good use of `default_factory` for the nested config.

This matches the existing Pydantic pattern here and avoids sharing a mutable reranker instance across `Configuration` objects.
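The hazard that `default_factory` avoids is easiest to demonstrate with stdlib dataclasses, which share the same idiom; the class and field names below are illustrative stand-ins, not the project's actual models:

```python
from dataclasses import dataclass, field


@dataclass
class RerankerConfig:
    """Illustrative nested config (not the project's actual class)."""

    enabled: bool = True


@dataclass
class AppConfig:
    """Illustrative top-level config using the default_factory idiom."""

    # default_factory builds a fresh RerankerConfig per instance;
    # a single class-level default instance would be shared, so mutating
    # one AppConfig's reranker would leak into every other AppConfig.
    reranker: RerankerConfig = field(default_factory=RerankerConfig)


a = AppConfig()
b = AppConfig()
a.reranker.enabled = False
print(b.reranker.enabled)  # b keeps its own independent default
```

Pydantic v2's `Field(default_factory=...)` gives the same per-instance isolation, which is why the reviewed code reads correctly.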
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/utils/vector_search.py (1)
803-804: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't discard source-specific referenced-document metadata.

`_fetch_solr_rag()` already extracts referenced documents from raw OKP chunks, including titles via `_extract_solr_document_metadata()`, but this gather drops them and later rebuilds documents from `context_chunks`. `_convert_solr_chunks_to_rag_format()` only carries `document_id`/`doc_url`, so the reconstructed list loses OKP titles after reranking. Keep the fetched document payloads and filter them to the surviving chunks, or propagate `title` into the chunk attributes before rebuilding.

Based on learnings: `parse_referenced_documents_from_responses_api` keeps frontend-useful referenced documents because these documents are displayed in the frontend.

Also applies to: 842-843
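A minimal sketch of the first suggested option, keeping the fetched document payloads and filtering them to the chunks that survive reranking; the dict keys (`doc_id`, `title`, `url`, `document_id`) are illustrative stand-ins for the real models:

```python
def filter_referenced_docs(docs, surviving_chunks):
    """Keep referenced documents whose id survives reranking and that
    carry at least a title or URL (the frontend can't render bare ids)."""
    surviving_ids = {chunk["document_id"] for chunk in surviving_chunks}
    return [
        doc
        for doc in docs
        if doc["doc_id"] in surviving_ids and (doc.get("title") or doc.get("url"))
    ]


docs = [
    {"doc_id": "a", "title": "OKP install guide", "url": None},
    {"doc_id": "b", "title": None, "url": None},  # nothing to display
    {"doc_id": "c", "title": None, "url": "https://example.com/doc-c"},
]
surviving = [{"document_id": "a"}, {"document_id": "b"}]  # post-rerank chunks

print([d["doc_id"] for d in filter_referenced_docs(docs, surviving)])
```

Doc `b` survives reranking but has nothing to display, and doc `c` has a URL but its chunk was reranked away, so only `a` remains.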
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 803-804: the current `asyncio.gather` call discards the fetched referenced-document payloads (from `_fetch_solr_rag`/`_extract_solr_document_metadata`) and later rebuilds documents from `context_chunks`, losing OKP titles. Modify the flow so the results returned by `byok_chunks_task` and `solr_chunks_task` preserve referenced-document payloads and their title metadata, either by (1) having `_fetch_solr_rag` return the referenced documents alongside chunks and filtering that referenced-doc list to only the surviving chunk `document_id`s before returning, or (2) propagating title/doc metadata into each chunk's attributes before `_convert_solr_chunks_to_rag_format` runs. Update the callers of `_convert_solr_chunks_to_rag_format` to accept/merge the extra metadata, and mirror the behavior of `parse_referenced_documents_from_responses_api` to keep frontend-useful referenced documents.

src/models/config.py (1)
2132-2142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the auto-enable predicate before it overrides user config.

`has_byok or has_inline_rag and has_okp and not self.reranker.enabled` is parsed as `has_byok or (...)`. That means any config with at least one `byok_rag` entry will force this branch, even when OKP is absent and even when the user explicitly sets `reranker.enabled = False`. Gate this with `has_byok and has_okp` plus `_explicitly_configured`, or drop the validator entirely since `enabled=True` is already the default.

🐛 Minimal guard fix

```diff
-        if has_byok or has_inline_rag and has_okp and not self.reranker.enabled:
+        if (
+            has_byok
+            and has_okp
+            and not self.reranker._explicitly_configured  # pylint: disable=protected-access
+            and not self.reranker.enabled
+        ):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/config.py` around lines 2132 - 2142, The condition currently parses incorrectly and can auto-enable the reranker whenever has_byok is true; change the predicate to require OKP plus a RAG source and only auto-enable when the reranker was not explicitly set by the user. Replace the line `if has_byok or has_inline_rag and has_okp and not self.reranker.enabled:` with a guarded check such as `if (has_okp and (has_byok or has_inline_rag)) and not self.reranker.enabled and not getattr(self.reranker, "_explicitly_configured", False):` so the branch only runs when OKP is present, there is at least one RAG source, reranker is disabled, and the user did not explicitly configure reranker; keep the same logger and assignment to `self.reranker.enabled = True`.
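The precedence bug behind this finding is reproducible in isolation; the variable names mirror the review's, the values are hypothetical (BYOK configured, no inline RAG, no OKP, reranker off):

```python
has_byok = True
has_inline_rag = False
has_okp = False
reranker_enabled = False

# `and` binds tighter than `or`, so the original predicate is really:
buggy = has_byok or (has_inline_rag and has_okp and not reranker_enabled)

# The intended gate: OKP present, plus at least one other source,
# and the reranker not already enabled.
intended = has_okp and (has_byok or has_inline_rag) and not reranker_enabled

print(buggy, intended)  # BYOK alone wrongly takes the buggy branch
```

With these inputs `buggy` is `True` while `intended` is `False`, which is exactly the misfire the review describes.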
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/vector_search.py`:
- Around line 35-53: The synchronous _get_cross_encoder performs blocking
CrossEncoder(model_name) creation and should be offloaded from the event loop:
make _get_cross_encoder async (or keep it sync but perform the heavy
construction via asyncio.to_thread) and use await asyncio.to_thread(lambda:
CrossEncoder(model_name)) when populating _cross_encoder_models[model_name];
ensure callers such as _rerank_chunks_with_cross_encoder await the async loader
(or already call it from an async context) so the initial Cold-start model load
is executed off the event loop while leaving predict() threading as-is.
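A self-contained sketch of the suggested pattern, with a `time.sleep` stand-in for the heavy `CrossEncoder` constructor; the function and cache names are illustrative, not the project's actual ones:

```python
import asyncio
import time

_model_cache: dict[str, object] = {}


def _build_model(name: str) -> object:
    """Stand-in for CrossEncoder(name): a slow, blocking constructor."""
    time.sleep(0.05)  # simulates downloading/loading model weights
    return ("loaded", name)


async def get_model(name: str) -> object:
    """Lazy-load a model off the event loop, caching per model name."""
    if name not in _model_cache:
        # asyncio.to_thread runs the blocking constructor in a worker
        # thread, so other coroutines keep running during the cold start.
        _model_cache[name] = await asyncio.to_thread(_build_model, name)
    return _model_cache[name]


async def main() -> None:
    first = await get_model("test-model")
    second = await get_model("test-model")  # served from cache
    print(first is second)


asyncio.run(main())
```

Only the first call per model name pays the load cost; subsequent calls return the cached instance without touching the thread pool.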
In `@tests/unit/utils/test_vector_search.py`:
- Around line 864-893: Change test_handles_import_error to simulate an actual
import-time failure (ImportError) instead of reusing the CrossEncoder
instantiation error: clear the _cross_encoder_models cache, then cause importing
the sentence_transformers dependency to fail (for example by using
mocker.patch.dict on sys.modules to remove or replace "sentence_transformers" or
patch importlib.import_module to raise ImportError) and reload or import
utils.vector_search so _get_cross_encoder runs the import-failure path; assert
that _get_cross_encoder("test-model") returns None. Keep the other
test_handles_model_loading_error as-is to cover instantiation errors from
CrossEncoder.
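The `sys.modules` trick can be sketched as follows (shown with stdlib `mock.patch.dict`; in this repo the same patch would go through `mocker.patch.dict` per the pytest-mock guideline). The loader below is a simplified stand-in for `_get_cross_encoder`:

```python
import sys
from unittest import mock


def load_cross_encoder(model_name: str):
    """Simplified stand-in for _get_cross_encoder: returns None when the
    optional dependency cannot be imported."""
    try:
        # Lazy import so a missing package degrades gracefully.
        from sentence_transformers import CrossEncoder
    except ImportError:
        return None
    return CrossEncoder(model_name)


# A None entry in sys.modules makes the import machinery raise
# ImportError, so the failure path runs even when the real package
# is installed in the test environment.
with mock.patch.dict(sys.modules, {"sentence_transformers": None}):
    print(load_cross_encoder("test-model"))
```

This exercises the genuine import-failure branch, which is distinct from the instantiation-failure branch covered by `test_handles_model_loading_error`.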
---
Duplicate comments:
In `@src/models/config.py`:
- Around line 2132-2142: The condition currently parses incorrectly and can
auto-enable the reranker whenever has_byok is true; change the predicate to
require OKP plus a RAG source and only auto-enable when the reranker was not
explicitly set by the user. Replace the line `if has_byok or has_inline_rag and
has_okp and not self.reranker.enabled:` with a guarded check such as `if
(has_okp and (has_byok or has_inline_rag)) and not self.reranker.enabled and not
getattr(self.reranker, "_explicitly_configured", False):` so the branch only
runs when OKP is present, there is at least one RAG source, reranker is
disabled, and the user did not explicitly configure reranker; keep the same
logger and assignment to `self.reranker.enabled = True`.
In `@src/utils/vector_search.py`:
- Around line 803-804: The current asyncio.gather call discards the fetched
referenced-document payloads (from
_fetch_solr_rag/_extract_solr_document_metadata) and later rebuilds documents
from context_chunks, losing OKP titles; modify the flow so the results returned
by byok_chunks_task and solr_chunks_task preserve referenced-document payloads
and their title metadata (either by: 1) having _fetch_solr_rag return the
referenced documents alongside chunks and filtering that referenced-doc list to
only the surviving chunk document_ids before returning, or 2) propagating
title/doc metadata into each chunk's attributes before
_convert_solr_chunks_to_rag_format runs), update places that call
_convert_solr_chunks_to_rag_format to accept/merge the extra metadata, and
mirror the behavior of parse_referenced_documents_from_responses_api to keep
frontend-useful referenced documents.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8311d4b1-2539-4c32-83dd-544949fcab43
📒 Files selected for processing (6)
src/models/config.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py, tests/unit/models/config/test_dump_configuration.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Use `pytest-mock` for AsyncMock objects in unit tests
Use marker `pytest.mark.asyncio` for async tests
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py, tests/unit/models/config/test_dump_configuration.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: `str | int`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes use ABC with `@abstractmethod` decorators
Complete type annotations for all class attributes, use specific types, not `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/models/config.py, src/utils/vector_search.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Type hints for Pydantic models: `Optional[FilePath]`, `PositiveInt`, `SecretStr`
Use `typing_extensions.Self` for model validators in Pydantic models
Pydantic config classes extend `ConfigurationBase`, data models extend `BaseModel`
Use `@model_validator` and `@field_validator` for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes
Files:
src/models/config.py
🧠 Learnings (10)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
tests/integration/endpoints/test_query_byok_integration.py, src/models/config.py, tests/unit/utils/test_vector_search.py, src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_reranker_configuration.py, src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models
Applied to files:
tests/unit/models/config/test_reranker_configuration.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Applied to files:
src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/runners/**/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
src/utils/vector_search.py
🔇 Additional comments (2)
tests/unit/models/config/test_reranker_configuration.py (1)
41-47: Assert the observable behavior, not `hasattr(...)`.

`PrivateAttr(default=False)` makes `_explicitly_configured` exist on every instance, so `hasattr(config, "_explicitly_configured")` will keep passing even if explicit-configuration detection breaks. Assert the actual value or a public effect instead.

tests/unit/utils/test_vector_search.py (1)
728-1159: Strong reranker test coverage improvement.

Good additions across enable/disable wiring, cross-encoder caching, score normalization, failure fallbacks, and BYOK boost ordering/preservation. This meaningfully improves confidence in the new reranking path.
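The earlier point about asserting the value of `_explicitly_configured` rather than its mere existence can be sketched with a toy stand-in (the real class is a Pydantic model using `PrivateAttr`; this plain class only mimics the behavior):

```python
class RerankerConfiguration:
    """Toy stand-in: the flag defaults to False and flips only when the
    user passes explicit reranker settings."""

    def __init__(self, **kwargs):
        self.enabled = kwargs.get("enabled", True)
        self._explicitly_configured = bool(kwargs)


# Weak: the attribute always exists (PrivateAttr gives it a default),
# so this passes even if the detection logic silently breaks.
assert hasattr(RerankerConfiguration(), "_explicitly_configured")

# Stronger: assert the observable value for each construction path.
assert RerankerConfiguration()._explicitly_configured is False
assert RerankerConfiguration(enabled=False)._explicitly_configured is True
print("ok")
```

The value-based assertions fail the moment explicit-configuration detection regresses, whereas the `hasattr` check never can.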
65f15b4 to 6641b22
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/config.py`:
- Around line 2138-2148: The auto-enable condition incorrectly triggers when
BYOK exists regardless of OKP or explicit user setting; update the if in the
block that references has_byok, has_inline_rag, has_okp and
self.reranker.enabled to require OKP plus either BYOK or inline RAG and also
ensure the reranker wasn’t explicitly configured (i.e., check not
self.reranker._explicitly_configured); specifically change the condition to
something like: if (has_okp and (has_byok or has_inline_rag) and not
self.reranker.enabled and not self.reranker._explicitly_configured) so the
reranker is only auto-enabled when OKP is present, a second source exists, and
the user didn’t explicitly set reranker.enabled.
In `@src/utils/vector_search.py`:
- Around line 700-701: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks (copied from _fetch_byok_rag); update the
docstring of _fetch_solr_rag to reflect its actual signature by removing the
max_chunks entry (or replacing it with the correct parameter name if intended),
ensuring any mention of constants.OKP_RAG_MAX_CHUNKS is removed or corrected,
and verify the parameter list in the docstring matches the function signature in
_fetch_solr_rag.
- Around line 768-770: The signature of build_rag_context incorrectly silences
an unused-argument warning for moderation_decision despite the function using
that parameter (it checks moderation_decision == "blocked"); remove the "#
pylint: disable=unused-argument" annotation from the build_rag_context
definition so linters reflect actual usage, and keep the existing conditional
that references moderation_decision (ensure no other code relies on the disable
comment); update any docstring or function comment if you want to document the
planned usage of moderation_decision.
- Line 17: Remove the top-level "from sentence_transformers import CrossEncoder"
import and instead perform a dynamic import inside the existing
_get_cross_encoder function: import sentence_transformers.CrossEncoder (or from
sentence_transformers import CrossEncoder) within _get_cross_encoder, keep the
module-level cached variable (the cache used by _get_cross_encoder) and
instantiate/cache the CrossEncoder there, and add a clear ImportError/exception
handling path to surface useful errors if the library is missing; ensure all
references use that cached instance and no other top-level references to
CrossEncoder remain.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ba65cf4d-67d0-4216-a736-eb3ca61da961
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
pyproject.toml, src/configuration.py, src/constants.py, src/models/config.py, src/utils/vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: mypy
- GitHub Check: integration_tests (3.13)
- GitHub Check: integration_tests (3.12)
- GitHub Check: Pylinter
- GitHub Check: build-pr
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: `str | int`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use `async def` for I/O operations and external API calls
Use `logger.debug()` for detailed diagnostic information
Use `logger.info()` for general information about program execution
Use `logger.warning()` for unexpected events or potential problems
Use `logger.error()` for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: `Configuration`, `Error`/`Exception`, `Resolver`, `Interface`
Abstract classes use ABC with `@abstractmethod` decorators
Complete type annotations for all class attributes, use specific types, not `Any`
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection
Files:
src/configuration.py, src/constants.py, src/models/config.py, src/utils/vector_search.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/unit/**/*.py: Use pytest for all unit and integration tests
Use `pytest-mock` for AsyncMock objects in unit tests
Use marker `pytest.mark.asyncio` for async tests
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_vector_search.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Do not use unittest - pytest is the standard testing framework for this project
Files:
tests/unit/models/config/test_reranker_configuration.py, tests/unit/models/config/test_dump_configuration.py, tests/unit/utils/test_vector_search.py, tests/integration/endpoints/test_query_byok_integration.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Central `constants.py` for shared constants with descriptive comments
Files:
src/constants.py
src/models/config.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/config.py: All config uses Pydantic models extending `ConfigurationBase`
Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models
Files:
src/models/config.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Type hints for Pydantic models: `Optional[FilePath]`, `PositiveInt`, `SecretStr`
Use `typing_extensions.Self` for model validators in Pydantic models
Pydantic config classes extend `ConfigurationBase`, data models extend `BaseModel`
Use `@model_validator` and `@field_validator` for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes
Files:
src/models/config.py
🧠 Learnings (12)
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: ALWAYS check `pyproject.toml` for existing dependencies before adding new ones
Applied to files:
pyproject.toml
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`
Applied to files:
tests/unit/models/config/test_reranker_configuration.py, src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models
Applied to files:
tests/unit/models/config/test_reranker_configuration.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants
Applied to files:
src/constants.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/constants.py, src/models/config.py, tests/unit/utils/test_vector_search.py, tests/integration/endpoints/test_query_byok_integration.py, src/utils/vector_search.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests
Applied to files:
tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.
Applied to files:
src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/runners/**/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
src/utils/vector_search.py
🔇 Additional comments (9)
src/utils/vector_search.py (5)
35-53: Initial `CrossEncoder` load still blocks the event loop. The past review comment about offloading the initial model construction is still valid. While `model.predict()` is correctly threaded at line 87, `_get_cross_encoder(model_name)` at line 79 is synchronous and performs `CrossEncoder(model_name)`, which downloads/loads the model on first call, blocking the event loop.

♻️ Proposed fix: make the loader async with `to_thread`:

```diff
-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking. ... """
     if model_name not in _cross_encoder_models:
         try:
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
+            from sentence_transformers import CrossEncoder  # pylint: disable=import-outside-toplevel
+            _cross_encoder_models[model_name] = await asyncio.to_thread(CrossEncoder, model_name)
             logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
         except Exception as e:  # pylint: disable=broad-exception-caught
             ...
     return _cross_encoder_models[model_name]
```

And at the call site (line 79):

```diff
-    model = _get_cross_encoder(model_name)
+    model = await _get_cross_encoder(model_name)
```

Also applies to: 79-79
246-252: Documents with neither title nor URL are still emitted. The past review comment remains valid: `_referenced_documents_from_rag_chunks` appends `ReferencedDocument` objects even when both `attrs.get("title")` and `parsed_url` are `None`. Per retrieved learnings, documents without `doc_title` or `doc_url` are intentionally filtered elsewhere because the frontend cannot render them.

♻️ Proposed fix:

```diff
     parsed_url: Optional[AnyUrl] = None
     if doc_url:
         try:
             parsed_url = AnyUrl(doc_url)
         except Exception:  # pylint: disable=broad-exception-caught
             parsed_url = None
+    title = attrs.get("title")
+    if title is None and parsed_url is None:
+        continue
     result.append(
         ReferencedDocument(
-            doc_title=attrs.get("title"),
+            doc_title=title,
             doc_url=parsed_url,
             source=chunk.source,
         )
     )
```

Based on learnings: "In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend."
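For illustration, the filtering behavior the review asks for can be sketched in isolation. The dataclass below is a simplified stand-in for the project's `ReferencedDocument` Pydantic model, and the chunk shape is hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReferencedDocument:
    """Simplified stand-in for the project's ReferencedDocument model."""

    doc_title: Optional[str]
    doc_url: Optional[str]
    source: str


def referenced_documents_from_chunks(chunks: list[dict]) -> list[ReferencedDocument]:
    """Emit a document only if it has a title or a URL the frontend can render."""
    result = []
    for chunk in chunks:
        title = chunk.get("title")
        url = chunk.get("doc_url")
        if title is None and url is None:
            continue  # nothing displayable: skip, matching the query_v2 behavior
        result.append(ReferencedDocument(title, url, chunk.get("source", "unknown")))
    return result


docs = referenced_documents_from_chunks(
    [
        {"title": "Install guide", "doc_url": "https://example.com/a", "source": "okp"},
        {"title": None, "doc_url": None, "source": "byok"},  # filtered out
    ]
)
```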
56-183: LGTM: Reranking implementation with score normalization and fallback. The cross-encoder reranking logic is well-structured:

- Proper empty-input guard (lines 74-75)
- `model.predict()` correctly offloaded via `asyncio.to_thread` (line 87)
- Handles numpy array outputs via `tolist()` (lines 89-90)
- Min-max normalization with edge-case handling (identical scores, single score)
- Combined 30%/70% weighting preserves original score influence
- Robust fallback on any exception (lines 172-182)
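The normalization-and-blending step described above can be sketched as follows. This is a schematic version under stated assumptions (a 0.3 cross-encoder weight and min-max normalization, as the review describes), not the PR's exact function:

```python
def combine_scores(
    ce_scores: list[float], original: list[float], ce_weight: float = 0.3
) -> list[float]:
    """Min-max normalize cross-encoder scores, then blend 30% CE with 70% original."""
    lo, hi = min(ce_scores), max(ce_scores)
    if hi == lo:
        # Identical (or single) scores carry no ranking signal: treat all as 1.0.
        normalized = [1.0] * len(ce_scores)
    else:
        normalized = [(s - lo) / (hi - lo) for s in ce_scores]
    return [
        ce_weight * n + (1.0 - ce_weight) * o for n, o in zip(normalized, original)
    ]


# CE scores [2.0, -1.0, 0.5] normalize to [1.0, 0.0, 0.5] before blending.
combined = combine_scores([2.0, -1.0, 0.5], [0.9, 0.8, 0.7])
```

Blending with the original score keeps a chunk's vector-similarity ranking from being entirely overridden by a single cross-encoder pass.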
185-214: LGTM: BYOK boost implementation is correct. The boost logic correctly:

- Only applies boost to non-OKP chunks (line 200)
- Handles `None` scores with `float("-inf")` fallback
- Re-sorts by boosted scores
- Preserves all chunk attributes
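The boost-and-resort behavior can be illustrated with a minimal sketch. The boost value and chunk shape here are assumptions for the example; the PR defines the actual `BYOK_RAG_RERANK_BOOST` constant:

```python
BYOK_RAG_RERANK_BOOST = 1.2  # assumed value; the PR defines the actual constant


def apply_byok_boost(chunks: list[dict], boost: float = BYOK_RAG_RERANK_BOOST) -> list[dict]:
    """Boost non-OKP (i.e. BYOK) chunk scores, treat None as -inf, re-sort descending."""
    for chunk in chunks:
        base = chunk["score"] if chunk["score"] is not None else float("-inf")
        chunk["boosted"] = base if chunk["source"] == "okp" else base * boost
    return sorted(chunks, key=lambda c: c["boosted"], reverse=True)


ranked = apply_byok_boost(
    [
        {"source": "okp", "score": 0.9},
        {"source": "byok", "score": 0.8},   # 0.8 * 1.2 = 0.96, overtakes the OKP chunk
        {"source": "byok", "score": None},  # None sorts last via -inf
    ]
)
```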
816-832: LGTM: Reranker integration with configuration toggle. The conditional reranking is cleanly integrated:

- Respects the `configuration.reranker.enabled` flag
- Passes the configured model name
- Falls back to simple truncation when disabled (line 832)
- Applies the BYOK boost only when reranking is enabled
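The toggle described above can be sketched in a few lines. All names here (`RerankerConfig`, `fake_rerank`, `select_final_chunks`) are hypothetical stand-ins for the PR's configuration model and `build_rag_context` internals:

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RerankerConfig:
    """Stand-in for the PR's reranker configuration section."""

    enabled: bool = False
    model: str = "cross-encoder/ms-marco-MiniLM-L6-v2"


async def fake_rerank(query: str, chunks: list[dict], model: str) -> list[dict]:
    # Placeholder: real code scores (query, content) pairs with the cross-encoder.
    return sorted(chunks, key=lambda c: c["score"], reverse=True)


async def select_final_chunks(
    query: str, chunks: list[dict], cfg: RerankerConfig, top_k: int
) -> list[dict]:
    """Rerank when enabled; otherwise fall back to sort-and-truncate."""
    if cfg.enabled:
        chunks = await fake_rerank(query, chunks, cfg.model)
    else:
        chunks = sorted(chunks, key=lambda c: c["score"], reverse=True)
    return chunks[:top_k]


picked = asyncio.run(
    select_final_chunks("install", [{"score": 0.2}, {"score": 0.9}], RerankerConfig(), 1)
)
```

With `enabled=False` (the default above), the function never touches the model, which is what keeps the disabled path cheap.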
tests/unit/utils/test_vector_search.py (4)
864-893: `test_handles_import_error` duplicates `test_handles_model_loading_error`. Both tests patch `CrossEncoder` to raise the same `Exception("Model loading failed")` and assert `model is None`. This provides duplicate coverage without actually testing the import-failure scenario the test name implies.

To test an actual import error, you would need to simulate `sentence_transformers` being unavailable. However, since `CrossEncoder` is now imported at module level, this is non-trivial and may not be worth the complexity. Consider either:

- Removing `test_handles_import_error` as redundant, or
- Renaming it to clarify that it tests the same instantiation-error path

♻️ Suggested simplification: remove the duplicate test:

```diff
-    def test_handles_import_error(self, mocker: MockerFixture) -> None:
-        """Test graceful handling of sentence_transformers import error."""
-        # Clear the cache for testing
-        # pylint: disable=import-outside-toplevel
-        from utils.vector_search import _cross_encoder_models
-
-        _cross_encoder_models.clear()
-
-        # Mock CrossEncoder to raise an exception when instantiated
-        mock_cross_encoder = mocker.patch("utils.vector_search.CrossEncoder")
-        mock_cross_encoder.side_effect = Exception("Model loading failed")
-
-        model = _get_cross_encoder("test-model")
-
-        assert model is None
-
```
728-820: LGTM: Reranker configuration tests cover both enabled and disabled paths. The tests correctly verify:

- `_rerank_chunks_with_cross_encoder` is called when `reranker.enabled = True` (lines 728-777)
- The correct `model_name` is passed from configuration
- The function is not called when `reranker.enabled = False` (lines 779-820)
896-1061: LGTM: Comprehensive cross-encoder reranking tests. Test coverage is thorough:

- Empty input handling
- Score normalization (identical scores, single chunk)
- Combined scoring formula verification with expected values
- `top_k` truncation
- Fallback on model loading failure
- Fallback on prediction exception
- Numpy array conversion via `tolist()`
1064-1158: LGTM: BYOK boost tests are well-designed. Tests correctly verify:

- Only non-OKP chunks receive the boost
- Correct sorting by boosted scores
- Default boost factor usage
- `None` score handling
- Attribute preservation
@Anxhela21 please fix the unit tests. I'm not sure what causes the error in the e2e tests; I suspect the failure is not caught by the output debug logs.
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Force-pushed from b230117 to 92f1c8d
In e2e tests, server mode group 3, this test is failing: PTAL, your changes might have something to do with the failure.
```toml
# To be able to fix multiple CVEs, also LCORE-1117
"requests>=2.33.0",
# Used for RAG chunk reranking (cross-encoder)
"sentence-transformers>=2.0.0",
```
I think this line is redundant, since the dependency is already defined in llslibdev below.
Description
Cross-Encoder Reranking for Enhanced RAG Quality
Summary
Implemented neural reranking using cross-encoder models to improve RAG chunk relevance scoring. This enhancement replaces simple vector similarity with semantic relevance scoring when combining results from multiple knowledge sources (BYOK + OKP) that use different search types (e.g., hybrid, semantic).
Key Features
Flexible Configuration
```yaml
reranker:
  enabled: true
  model: "cross-encoder/ms-marco-MiniLM-L6-v2"
  top_k_multiplier: 2.0   # Fetch 2x chunks, rerank, keep top_k
  byok_boost: 1.2         # Boost factor for BYOK sources
  okp_boost: 1.0          # Boost factor for OKP sources
```
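The configuration section above can be modeled compactly. This sketch uses a plain dataclass where the project uses a Pydantic model, and `candidate_pool` is a hypothetical helper illustrating how `top_k_multiplier` widens the fetch before reranking trims back to `top_k`:

```python
from dataclasses import dataclass


@dataclass
class RerankerConfiguration:
    """Dataclass mirror of the YAML section above (the project uses a Pydantic model)."""

    enabled: bool = False
    model: str = "cross-encoder/ms-marco-MiniLM-L6-v2"
    top_k_multiplier: float = 2.0
    byok_boost: float = 1.2
    okp_boost: float = 1.0

    def candidate_pool(self, top_k: int) -> int:
        """Fetch top_k * multiplier candidates, rerank them, then keep top_k."""
        return int(top_k * self.top_k_multiplier)


cfg = RerankerConfiguration(enabled=True)
```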
Auto-Enable
Error Handling
Technical Implementation
Modified Files
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Chores
Tests