
LCORE-1723: Cross-Encoder Reranking for enhanced RAG#1566

Open
Anxhela21 wants to merge 13 commits into lightspeed-core:main from Anxhela21:anx/vector_compare

Conversation


@Anxhela21 Anxhela21 commented Apr 21, 2026

Description

Cross-Encoder Reranking for Enhanced RAG Quality

Summary

Implemented neural reranking using cross-encoder models to improve RAG chunk relevance scoring. This enhancement replaces simple vector similarity with semantic relevance scoring when combining results from multiple knowledge sources (BYOK + OKP) that use different search types (e.g., hybrid, semantic).

Key Features

  • Cross-encoder model integration using sentence-transformers
  • Configurable model selection (default: cross-encoder/ms-marco-MiniLM-L6-v2)
  • Query-document pair scoring for improved semantic relevance
  • Efficient model caching to avoid reloading between requests
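The query-document pair scoring above can be sketched as follows. This is a toy illustration, not the PR's implementation: `score_pairs` stands in for `CrossEncoder.predict` from sentence-transformers, using word overlap instead of a real model.

```python
def score_pairs(pairs: list[tuple[str, str]]) -> list[float]:
    """Toy stand-in for CrossEncoder.predict: score by word overlap."""
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().split())
        scores.append(len(q_words & d_words) / max(len(q_words), 1))
    return scores


def rerank(query: str, chunks: list[str], top_k: int) -> list[str]:
    """Score each (query, chunk) pair and keep the top_k chunks."""
    pairs = [(query, chunk) for chunk in chunks]
    scores = score_pairs(pairs)
    ranked = sorted(zip(chunks, scores), key=lambda cs: cs[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


chunks = [
    "how to configure the reranker model",
    "unrelated release notes",
    "reranker model configuration guide",
]
print(rerank("configure reranker model", chunks, top_k=2))
```

With a real cross-encoder, the same flow applies: each (query, chunk) pair is scored jointly, which is what distinguishes cross-encoders from the bi-encoder similarity used at retrieval time.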

Flexible Configuration

reranker:
  enabled: true
  model: "cross-encoder/ms-marco-MiniLM-L6-v2"
  top_k_multiplier: 2.0  # Fetch 2x chunks, rerank, keep top_k
  byok_boost: 1.2  # Boost factor for BYOK sources
  okp_boost: 1.0  # Boost factor for OKP sources
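A minimal sketch of how `top_k_multiplier` sizes the candidate pool (the function name is illustrative, not from the PR): fetch `multiplier * top_k` candidates, rerank them, then keep the top `top_k`.

```python
def candidate_pool_size(top_k: int, top_k_multiplier: float = 2.0) -> int:
    """Number of chunks to fetch before reranking.

    Never fetch fewer than top_k, even for multipliers below 1.0.
    """
    return max(top_k, int(top_k * top_k_multiplier))


print(candidate_pool_size(5))        # default 2.0 multiplier -> 10
print(candidate_pool_size(5, 1.5))   # -> 7
```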

Auto-Enable

  • Automatically enables when both BYOK and OKP RAG sources are configured
  • Only when the user hasn't explicitly configured reranker settings (see above)
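A hedged sketch of that auto-enable decision. In the actual Pydantic config, `model_fields_set` would report which fields the user explicitly provided; here a plain dict of user input stands in for it.

```python
# Fields whose explicit presence marks the reranker as user-configured.
RERANKER_FIELDS = {"enabled", "model", "top_k_multiplier", "byok_boost", "okp_boost"}


def should_auto_enable(has_byok: bool, has_okp: bool,
                       user_reranker_input: dict) -> bool:
    """Auto-enable only when both sources exist and the user set nothing."""
    explicitly_configured = bool(RERANKER_FIELDS & set(user_reranker_input))
    return has_byok and has_okp and not explicitly_configured


print(should_auto_enable(True, True, {}))                 # True
print(should_auto_enable(True, True, {"enabled": True}))  # False: user chose
print(should_auto_enable(True, False, {}))                # False: no OKP
```

Keying off *which fields were provided* (rather than comparing values against defaults) correctly treats `reranker: {enabled: true}` as explicit even though it matches the default.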

Error Handling

  • Fallback to original vector scores if reranking fails
  • Lazy model loading with error recovery
  • Model caching
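The fallback behavior can be sketched like this, assuming a hypothetical chunk shape and a `scorer` stand-in for the cross-encoder:

```python
import logging

logger = logging.getLogger(__name__)


def rerank_with_fallback(chunks: list[dict], scorer) -> list[float]:
    """Return reranked scores, or the original vector scores if scoring fails."""
    try:
        return scorer([c["content"] for c in chunks])
    except Exception as exc:  # broad catch: reranking must never break RAG
        logger.warning("Reranking failed, using vector scores: %s", exc)
        return [c["score"] for c in chunks]


chunks = [{"content": "a", "score": 0.9}, {"content": "b", "score": 0.4}]


def failing_scorer(texts):
    raise RuntimeError("model unavailable")


print(rerank_with_fallback(chunks, failing_scorer))          # -> [0.9, 0.4]
print(rerank_with_fallback(chunks, lambda texts: [0.2, 0.8]))  # -> [0.2, 0.8]
```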

Technical Implementation

Modified Files

  • src/utils/vector_search.py - Core reranking logic and pipeline integration
  • src/models/config.py - Configuration schema and validation
  • src/configuration.py - Configuration property access
  • src/constants.py - Default boost factors and settings
  • pyproject.toml - Added sentence-transformers dependency

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Optional cross-encoder RAG chunk reranking for improved relevance.
    • Reranker settings exposed in app configuration (enable/disable, model); auto-enabled when BYOK + inline RAG OKP are detected.
    • BYOK-specific rerank boost applied after reranking, with robust fallback to original ordering if the reranker is unavailable.
  • Chores

    • Adds runtime support for cross-encoder reranking.
  • Tests

    • New and expanded unit/integration tests covering reranking, boosting, caching, serialization, and fallbacks.


coderabbitai Bot commented Apr 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a configurable cross-encoder reranker for RAG: new RerankerConfiguration, AppConfig.reranker accessor, sentence-transformers runtime dependency, BYOK rerank boost constant, cross-encoder lazy loading and async reranking integrated into build_rag_context, plus related unit and integration test updates.

Changes

Cohort / File(s) Summary
Dependency
pyproject.toml
Adds runtime dependency sentence-transformers>=2.0.0.
Configuration models & API
src/models/config.py, src/configuration.py
Adds RerankerConfiguration model with enabled and model, _explicitly_configured marker, auto-enable validator that flips reranker on for inline BYOK+OKP, and exposes AppConfig.reranker.
Constants
src/constants.py
Adds BYOK_RAG_RERANK_BOOST: Final[float] = 1.2 for post-rerank BYOK score multiplication.
RAG pipeline / Vector search
src/utils/vector_search.py
Implements lazy cached CrossEncoder loading, async reranking of (query, chunk.content) pairs with min-max normalization and 30% CE / 70% original score mix, fallback behavior, BYOK-only boost and resorting, expanded candidate pool fetching (2x), merging/deduplication from final chunks, and Solr/BYOK query size limits.
Unit tests
tests/unit/models/config/test_reranker_configuration.py, tests/unit/utils/test_vector_search.py, tests/unit/models/config/test_dump_configuration.py
Adds tests for RerankerConfiguration behavior and schema, extensive reranker and BYOK-boost tests (loading, caching, scoring, normalization, fallbacks), and updates config dump expectations to include reranker.
Integration tests
tests/integration/endpoints/test_query_byok_integration.py
Disables reranker in BYOK max-chunks integration test and updates expected top chunk score to reflect reranker-disabled (unboosted) behavior.
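The min-max normalization and 30% CE / 70% original score mix described for src/utils/vector_search.py can be sketched as follows (weights and names follow the walkthrough, not the exact code):

```python
def combine_scores(original: list[float], ce: list[float],
                   ce_weight: float = 0.3) -> list[float]:
    """Blend min-max-normalized cross-encoder scores into original scores."""
    lo, hi = min(ce), max(ce)
    span = hi - lo
    # If all CE scores are equal, normalization is undefined; keep originals.
    if span == 0:
        return list(original)
    ce_norm = [(s - lo) / span for s in ce]
    return [(1 - ce_weight) * o + ce_weight * c
            for o, c in zip(original, ce_norm)]


# Cross-encoder logits are unbounded, hence the normalization step.
print(combine_scores([0.9, 0.5], [-4.0, 6.0]))
```

Normalizing first matters because raw cross-encoder logits are on an arbitrary scale and would otherwise dominate (or vanish against) the 0-1 retrieval scores.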

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Builder as build_rag_context
    participant BYOK as _fetch_byok_chunks
    participant Solr as _fetch_solr_chunks
    participant Reranker as CrossEncoderReranker
    participant Booster as BYOKBooster
    participant Selector as SelectTopK
    participant Context as BuildRAGContext

    Client->>Builder: query + configuration
    Builder->>BYOK: fetch candidates (increased pool)
    BYOK-->>Builder: byok_chunks
    Builder->>Solr: fetch candidates (increased pool)
    Solr-->>Builder: solr_chunks
    Builder->>Builder: merge & dedupe candidates
    alt configuration.reranker.enabled
        Builder->>Reranker: score (query, chunk.content) pairs
        Reranker-->>Builder: normalized CE scores
        Builder->>Builder: combine CE (30%) + original (70%)
        Builder->>Booster: apply BYOK_RAG_RERANK_BOOST to non-OKP BYOK chunks
        Booster-->>Builder: boosted_scores
        Builder->>Selector: sort & select top K
    else
        Builder->>Selector: sort by original chunk.score & select top K
    end
    Selector-->>Context: final_chunks
    Context-->>Client: RAGContext (with referenced_documents derived from final_chunks)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'LCORE-1723: Cross-Encoder Reranking for enhanced RAG' is clear, concise, and directly summarizes the main feature added in this changeset—implementing neural reranking using cross-encoders to improve RAG chunk relevance.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.





Comment @coderabbitai help to get the list of available commands and usage tips.

@Anxhela21 Anxhela21 changed the title cross encoder and reranker for byok and solr Cross-Encoder Reranking for Enhanced RAG Quality Apr 27, 2026
@Anxhela21 Anxhela21 force-pushed the anx/vector_compare branch from e6b6d66 to 880a566 Compare April 27, 2026 15:42
@Anxhela21 Anxhela21 marked this pull request as ready for review April 27, 2026 17:22
@Anxhela21 Anxhela21 changed the title Cross-Encoder Reranking for Enhanced RAG Quality [LCORE-1723] Cross-Encoder Reranking for Enhanced RAG Quality Apr 27, 2026
@Anxhela21 Anxhela21 changed the title [LCORE-1723] Cross-Encoder Reranking for Enhanced RAG Quality [LCORE-1723] Cross-Encoder Reranking Apr 27, 2026
@Anxhela21 Anxhela21 changed the title [LCORE-1723] Cross-Encoder Reranking [LCORE-1723] Cross-Encoder Reranking for enhanced RAG Apr 27, 2026
@Anxhela21 Anxhela21 force-pushed the anx/vector_compare branch from 0131555 to 82caaf6 Compare April 27, 2026 17:38

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 75-79: Remove the duplicate dependency entry for
"sentence-transformers>=2.0.0" in pyproject.toml (the repeated listing with the
same comment), leaving only a single declaration; also check the llslibdev group
requirement (">=5.0.0") and confirm whether the runtime floor should be raised
to ">=5.0.0" to match required features, updating the single remaining
"sentence-transformers" entry accordingly if the code needs 5.0.0+.

In `@src/constants.py`:
- Around line 195-196: Add a Final[float] type annotation to the new constant
BYOK_RAG_RERANK_BOOST so it matches the other constants' immutability checks;
update the declaration for BYOK_RAG_RERANK_BOOST to use Final[float] and ensure
Final is imported from typing at the top of the file if it isn't already.

In `@src/models/config.py`:
- Around line 1828-1866: The current PrivateAttr(_explicitly_configured: bool =
PrivateAttr(default=False)) makes _explicitly_configured always present and the
value-comparison in mark_as_explicitly_configured() misses cases where a user
explicitly provided a field equal to its default; change _explicitly_configured
to a tri-state (e.g. PrivateAttr(default=None)) and rewrite
mark_as_explicitly_configured to use the Pydantic-provided
self.model_fields_set: compute the set of reranker field names you care about
(enabled, model, model_id, provider_id, top_k_multiplier, byok_boost,
okp_boost), check if any of those names are in self.model_fields_set and if so
set self._explicitly_configured = True (leave it None/False otherwise), and
update the consuming code (the hasattr(...) check elsewhere) to test for
_explicitly_configured is not None / True instead of hasattr so the auto-enable
branch can run correctly.
- Around line 1815-1826: The three RerankerConfiguration fields
(top_k_multiplier, byok_boost, okp_boost) are declared but not used; update the
reranking code to read these values from the configuration instead of using
hard-coded constants: replace the hard-coded 2x in the pool_size calculation in
vector_search.py with configuration.reranker.top_k_multiplier when computing
pool_size (or its equivalent variable), change the boost applied in
_apply_byok_rerank_boost to use configuration.reranker.byok_boost instead of
constants.BYOK_RAG_RERANK_BOOST, and ensure OKP chunks receive
configuration.reranker.okp_boost (apply the multiplier where OKP chunks
currently pass through unmodified). Keep existing defaults from the
RerankerConfiguration when reading these fields so behavior is unchanged if the
user omits them.

In `@src/utils/vector_search.py`:
- Around line 28-56: The module-level cache _cross_encoder_models and loader
_get_cross_encoder are not concurrency-safe and can instantiate
CrossEncoder(model_name) multiple times on cold start; fix by serializing loads
for a given model_name (single-flight) — e.g., add a module-level dict of
asyncio.Lock objects (or a per-name asyncio.Event/Future) and acquire the lock
for model_name inside _get_cross_encoder (or convert the loader to a synchronous
function, apply functools.cache, and call it via asyncio.to_thread) so only one
task constructs CrossEncoder(model_name) and others await the result, then store
the loaded model (or None on failure) into _cross_encoder_models before
releasing the lock; ensure any exception handling still sets the cache to None
to avoid repeated retries.
- Around line 722-728: The function build_rag_context currently takes an unused
parameter moderation_decision (annotated with # pylint:
disable=unused-argument); either remove that parameter from the function
signature and delete the pylint disable, and update its three callers (the calls
in the functions at src/app/endpoints/query.py,
src/app/endpoints/streaming_query.py, and src/app/endpoints/responses.py) to
stop passing moderation_decision, or if you intend to use it later, keep the
parameter but replace the pylint disable with a short TODO comment in
build_rag_context explaining the planned future use and why moderation_decision
is intentionally ignored; reference the function name build_rag_context and the
three caller sites when making the change so the signature and call sites remain
consistent.
- Around line 59-134: The async function _rerank_chunks_with_cross_encoder
currently calls the blocking sentence_transformers.CrossEncoder.predict
synchronously (model.predict), which will block the asyncio event loop; change
the call to run in a thread via asyncio.to_thread (e.g., scores = await
asyncio.to_thread(model.predict, pairs)) so inference executes off the event
loop, preserving async responsiveness, and keep the subsequent score processing
logic unchanged; ensure to import asyncio if not already present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 89ed94e6-8009-41d5-8e4c-3f2892881c37

📥 Commits

Reviewing files that changed from the base of the PR and between 2d0c79a and 82caaf6.

📒 Files selected for processing (5)
  • pyproject.toml
  • src/configuration.py
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: bandit
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/constants.py
  • src/configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
**/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Central constants.py for shared constants with descriptive comments

Files:

  • src/constants.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/constants.py
  • src/configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
src/**/config*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models

Files:

  • src/configuration.py
  • src/models/config.py
🧠 Learnings (6)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/*.py : Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Applied to files:

  • src/utils/vector_search.py
🔇 Additional comments (3)
src/utils/vector_search.py (1)

722-746: pool_size/top_k ignore the configured top_k_multiplier.

pool_size = 2 * constants.BYOK_RAG_MAX_CHUNKS hard-codes the 2× factor, but RerankerConfiguration.top_k_multiplier (default 2.0) is intended to govern this exact value. Likewise, _apply_byok_rerank_boost(...) reads the multiplier from constants.BYOK_RAG_RERANK_BOOST, ignoring reranker.byok_boost and reranker.okp_boost. See the comment on src/models/config.py RerankerConfiguration for the full picture and a suggested wiring diff.

src/configuration.py (1)

469-475: LGTM.

The new reranker property follows the same pattern as the surrounding rag/okp properties. Since reranker is a non-optional field with a default_factory in Configuration, the access is safe once _configuration is loaded.

src/models/config.py (1)

1822-1823: Remove unused placeholder fields or wire them up.

model_id and provider_id in RerankerConfiguration are not consumed by src/utils/vector_search.py (only model and enabled are read). If these fields are reserved for future Llama Stack-backed reranker provider support, they should either be wired up or removed to avoid leaking unimplemented schema to users. The Pydantic v2 protected namespace concern does not apply here—the project uses Pydantic v2.10.6+, where the default protected_namespaces setting only flags direct conflicts with methods like model_validate and model_dump, not fields with the model_ prefix.

			> Likely an incorrect or invalid review comment.

@Anxhela21 Anxhela21 changed the title [LCORE-1723] Cross-Encoder Reranking for enhanced RAG LCORE-1723: Cross-Encoder Reranking for enhanced RAG Apr 27, 2026
@Anxhela21 Anxhela21 force-pushed the anx/vector_compare branch from cce6b82 to d664680 Compare April 27, 2026 18:56

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/vector_search.py (1)

642-685: ⚠️ Potential issue | 🟠 Major

The OKP fetch path never actually expands the rerank candidate pool.

_fetch_solr_rag() accepts max_chunks, but the value never reaches vector_io.query() because _build_query_params(solr) still uses the default k. build_rag_context() also calls _fetch_solr_rag() without max_chunks=pool_size. As a result, OKP stays capped at the default retrieval size, so the reranker never sees the larger Solr pool described in this PR.

♻️ Suggested fix
-            params = _build_query_params(solr)
+            params = _build_query_params(solr, k=limit)
-    solr_chunks_task = _fetch_solr_rag(client, query, solr)
+    solr_chunks_task = _fetch_solr_rag(
+        client, query, solr, max_chunks=pool_size
+    )

Also applies to: 748-755

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 642 - 685, The Solr fetch never
expands the rerank candidate pool because _fetch_solr_rag ignores its max_chunks
arg and _build_query_params still uses the default k; also build_rag_context
doesn't pass pool_size into _fetch_solr_rag. Fix by threading max_chunks
through: update _fetch_solr_rag to pass max_chunks (or computed limit) into the
params returned by _build_query_params (ensure _build_query_params accepts a
k/limit argument and sets params['k'] accordingly), call client.vector_io.query
with those params, and modify build_rag_context to call _fetch_solr_rag(...,
max_chunks=pool_size) so the Solr query and vector_io.query receive the larger
pool_size for reranking (refer to functions _fetch_solr_rag,
_build_query_params, build_rag_context, vector_io.query, and variables
max_chunks/pool_size).
♻️ Duplicate comments (3)
src/utils/vector_search.py (2)

79-90: ⚠️ Potential issue | 🔴 Critical

Offload CrossEncoder.predict() off the event loop.

_rerank_chunks_with_cross_encoder() is async, but model.predict() is a synchronous inference call. Awaiting this path under load will stall the event loop for the full rerank duration and block unrelated requests.

♻️ Minimal fix
-        model = _get_cross_encoder(model_name)
+        model = await _get_cross_encoder(model_name)
@@
-        scores = model.predict(pairs)
+        scores = await asyncio.to_thread(model.predict, pairs)
According to the official sentence-transformers documentation, is `CrossEncoder.predict(...)` a synchronous/blocking call, and is `asyncio.to_thread(...)` the appropriate way to invoke it from Python async code?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 79 - 90,
_rerank_chunks_with_cross_encoder is async but calls the synchronous
CrossEncoder.predict (via model.predict) which blocks the event loop; change the
call to run the blocking inference off the loop using asyncio.to_thread (or
loop.run_in_executor) so prediction runs in a worker thread and returns
awaitably. Locate the call to model.predict in _rerank_chunks_with_cross_encoder
and replace the direct call with an awaited asyncio.to_thread(lambda:
model.predict(pairs)) (or equivalent) so the CrossEncoder.predict invocation is
executed off the event loop.

34-56: ⚠️ Potential issue | 🟠 Major

Serialize cold-start loads for each cross-encoder model.

This cache is still racy: concurrent requests can all miss _cross_encoder_models and instantiate the same model before the first load stores it, which is an expensive CPU/RAM spike for large rerankers. Wrap model construction in a per-model single-flight lock so only one task loads each model name.

♻️ Suggested single-flight pattern
 _cross_encoder_models: dict[str, Any] = {}  # pylint: disable=invalid-name
+_cross_encoder_load_locks: dict[str, asyncio.Lock] = {}  # pylint: disable=invalid-name
 
 
-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking.
@@
-    if model_name not in _cross_encoder_models:
-        try:
-            from sentence_transformers import (  # pylint: disable=import-outside-toplevel
-                CrossEncoder,
-            )
-
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
-            logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
-        except Exception as e:  # pylint: disable=broad-exception-caught
-            logger.warning(
-                "Could not load cross-encoder for reranking (%s): %s", model_name, e
-            )
-            _cross_encoder_models[model_name] = None
+    if model_name in _cross_encoder_models:
+        return _cross_encoder_models[model_name]
+
+    lock = _cross_encoder_load_locks.setdefault(model_name, asyncio.Lock())
+    async with lock:
+        if model_name in _cross_encoder_models:
+            return _cross_encoder_models[model_name]
+        try:
+            from sentence_transformers import (  # pylint: disable=import-outside-toplevel
+                CrossEncoder,
+            )
+
+            _cross_encoder_models[model_name] = await asyncio.to_thread(
+                CrossEncoder, model_name
+            )
+            logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
+        except Exception as e:  # pylint: disable=broad-exception-caught
+            logger.warning(
+                "Could not load cross-encoder for reranking (%s): %s", model_name, e
+            )
+            _cross_encoder_models[model_name] = None
     return _cross_encoder_models[model_name]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 34 - 56, The _get_cross_encoder
function is racy because multiple callers can simultaneously miss
_cross_encoder_models and each instantiate the same heavy model; fix it by
introducing a per-model single-flight lock (e.g., a module-level dict
_cross_encoder_locks: Dict[str, threading.Lock]) and wrap the check-and-create
logic for a given model_name inside that lock: obtain or create the lock for
model_name (use setdefault), acquire it (with context manager) before
re-checking _cross_encoder_models[model_name], then instantiate CrossEncoder and
assign to _cross_encoder_models[model_name] inside the locked region (ensure
exceptions still set the cache to None and release the lock in finally). Update
references to _get_cross_encoder, _cross_encoder_models, and logger accordingly.
src/models/config.py (1)

1829-1861: ⚠️ Potential issue | 🟠 Major

_explicitly_configured never makes the auto-enable branch work as intended.

PrivateAttr(default=False) means _explicitly_configured is always present, so not hasattr(self.reranker, "_explicitly_configured") is always false and the auto-enable path never runs. The value-comparison validator also misses explicit input that matches defaults, e.g. reranker: { enabled: true }. Use model_fields_set to mark explicit input and test the flag value directly.

♻️ Suggested fix
 class RerankerConfiguration(ConfigurationBase):
@@
     _explicitly_configured: bool = PrivateAttr(default=False)
@@
     `@model_validator`(mode="after")
     def mark_as_explicitly_configured(self) -> Self:
         """Mark this configuration as explicitly set when instantiated from user input."""
-        # Only mark as explicitly configured if we're not using all default values
-        # This allows auto-enabling when user hasn't touched reranker settings
-        # Check if any field differs from default values
-        default_model = "cross-encoder/ms-marco-MiniLM-L6-v2"
-        default_top_k_multiplier = 2.0
-        default_byok_boost = 1.2
-        default_okp_boost = 1.0
-
-        # Check if any setting differs from defaults (indicates explicit configuration)
-        current_values = [
-            self.enabled,
-            self.model,
-            self.top_k_multiplier,
-            self.byok_boost,
-            self.okp_boost,
-        ]
-        default_values = [
-            True,
-            default_model,
-            default_top_k_multiplier,
-            default_byok_boost,
-            default_okp_boost,
-        ]
-
-        if current_values != default_values:
+        if self.model_fields_set:
             self._explicitly_configured = True
         return self
@@
-        if (
-            has_byok
-            and has_okp
-            and not hasattr(self.reranker, "_explicitly_configured")
-        ):
+        if has_byok and has_okp and not self.reranker._explicitly_configured:
             # pylint: disable=no-member
             if not self.reranker.enabled:
                 logger.info(
                     "Automatically enabling reranker: Both BYOK RAG (%d entries) and OKP "
                     "are configured. Reranking improves result quality when multiple "
In Pydantic v2, do `PrivateAttr(default=...)` attributes always exist on model instances, and does `model_fields_set` track fields explicitly provided in the input even when the provided value equals the field default?

Also applies to: 2154-2168

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/config.py` around lines 1829 - 1861, The _explicitly_configured
flag is never meaningful because PrivateAttr always exists; update the
mark_as_explicitly_configured validator (and the similar validator at lines
~2154-2168) to use model_fields_set to detect which fields were explicitly
provided and set self._explicitly_configured = True when any reranker-related
fields (e.g., "enabled", "model", "top_k_multiplier", "byok_boost", "okp_boost")
appear in self.model_fields_set or when the parent model passed reranker in its
input; also change any checks that use hasattr(self.reranker,
"_explicitly_configured") to inspect the flag's value directly
(self.reranker._explicitly_configured) so the auto-enable branch can run when
the user explicitly provided values even if they match defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/vector_search.py`:
- Around line 182-205: The loop that builds ReferencedDocument from rag_chunks
currently appends entries even when both title and URL are missing; update the
logic in the loop (the section that computes attrs, doc_url, parsed_url and
before result.append) to skip/continue when both attrs.get("title") is falsy and
parsed_url is falsy/None so you only append ReferencedDocument when there is a
doc_title or a doc_url; keep existing dedup_key/seen handling and AnyUrl parsing
but add a guard (using the computed title and parsed_url) to avoid creating
empty ReferencedDocument objects.
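The guard described above can be sketched with plain dicts. The attribute keys mirror the prompt, but this is an illustrative sketch, not the real `ReferencedDocument` model or its `AnyUrl` parsing:

```python
def collect_referenced_documents(rag_chunks: list[dict]) -> list[dict]:
    """Keep one entry per document, skipping chunks with neither title nor URL."""
    seen: set[str] = set()
    result: list[dict] = []
    for chunk in rag_chunks:
        attrs = chunk.get("attributes") or {}
        title = attrs.get("title")
        doc_url = attrs.get("reference_url") or attrs.get("doc_url")
        doc_id = attrs.get("document_id")
        dedup_key = doc_url or doc_id or chunk.get("source") or ""
        if not dedup_key or dedup_key in seen:
            continue
        seen.add(dedup_key)
        # The frontend cannot render doc-id-only entries, so skip them here.
        if not title and not doc_url:
            continue
        result.append({"doc_title": title, "doc_url": doc_url,
                       "source": chunk.get("source")})
    return result


chunks = [
    {"attributes": {"title": "Install guide",
                    "doc_url": "https://docs.example/install"}},
    {"attributes": {"document_id": "doc-42"}},                    # id only: skipped
    {"attributes": {"doc_url": "https://docs.example/install"}},  # duplicate URL
]
print(collect_referenced_documents(chunks))
```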

---

Outside diff comments:
In `@src/utils/vector_search.py`:
- Around line 642-685: The Solr fetch never expands the rerank candidate pool
because _fetch_solr_rag ignores its max_chunks arg and _build_query_params still
uses the default k; also build_rag_context doesn't pass pool_size into
_fetch_solr_rag. Fix by threading max_chunks through: update _fetch_solr_rag to
pass max_chunks (or computed limit) into the params returned by
_build_query_params (ensure _build_query_params accepts a k/limit argument and
sets params['k'] accordingly), call client.vector_io.query with those params,
and modify build_rag_context to call _fetch_solr_rag(..., max_chunks=pool_size)
so the Solr query and vector_io.query receive the larger pool_size for reranking
(refer to functions _fetch_solr_rag, _build_query_params, build_rag_context,
vector_io.query, and variables max_chunks/pool_size).

---

Duplicate comments:
In `@src/models/config.py`:
- Around line 1829-1861: The _explicitly_configured flag is never meaningful
because PrivateAttr always exists; update the mark_as_explicitly_configured
validator (and the similar validator at lines ~2154-2168) to use
model_fields_set to detect which fields were explicitly provided and set
self._explicitly_configured = True when any reranker-related fields (e.g.,
"enabled", "model", "top_k_multiplier", "byok_boost", "okp_boost") appear in
self.model_fields_set or when the parent model passed reranker in its input;
also change any checks that use hasattr(self.reranker, "_explicitly_configured")
to inspect the flag's value directly (self.reranker._explicitly_configured) so
the auto-enable branch can run when the user explicitly provided values even if
they match defaults.

In `@src/utils/vector_search.py`:
- Around line 79-90: _rerank_chunks_with_cross_encoder is async but calls the
synchronous CrossEncoder.predict (via model.predict) which blocks the event
loop; change the call to run the blocking inference off the loop using
asyncio.to_thread (or loop.run_in_executor) so prediction runs in a worker
thread and returns awaitably. Locate the call to model.predict in
_rerank_chunks_with_cross_encoder and replace the direct call with an awaited
asyncio.to_thread(lambda: model.predict(pairs)) (or equivalent) so the
CrossEncoder.predict invocation is executed off the event loop.
- Around line 34-56: The _get_cross_encoder function is racy because multiple
callers can simultaneously miss _cross_encoder_models and each instantiate the
same heavy model; fix it by introducing a per-model single-flight lock (e.g., a
module-level dict _cross_encoder_locks: Dict[str, threading.Lock]) and wrap the
check-and-create logic for a given model_name inside that lock: obtain or create
the lock for model_name (use setdefault), acquire it (with context manager)
before re-checking _cross_encoder_models[model_name], then instantiate
CrossEncoder and assign to _cross_encoder_models[model_name] inside the locked
region (ensure exceptions still set the cache to None and release the lock in
finally). Update references to _get_cross_encoder, _cross_encoder_models, and
logger accordingly.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 32b1995a-8892-4d14-a537-53427a6d3620

📥 Commits

Reviewing files that changed from the base of the PR and between 82caaf6 and d664680.

📒 Files selected for processing (7)
  • pyproject.toml
  • src/configuration.py
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: mypy
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Import FastAPI dependencies with: from fastapi import APIRouter, HTTPException, Request, status, Depends
Import Llama Stack client with: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use typing_extensions.Self for model validators in Pydantic models
Use modern union type syntax str | int instead of Union[str, int]
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Use @model_validator and @field_validator for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not Any
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections

Files:

  • src/constants.py
  • src/configuration.py
  • tests/unit/models/config/test_reranker_configuration.py
  • src/models/config.py
  • tests/unit/utils/test_vector_search.py
  • src/utils/vector_search.py
**/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Central constants.py for shared constants with descriptive comments

Files:

  • src/constants.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models extend ConfigurationBase for config, BaseModel for data models

Files:

  • src/constants.py
  • src/configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
src/**/config*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/config*.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic models
Use @field_validator and @model_validator for custom validation in Pydantic models
Use type hints like Optional[FilePath], PositiveInt, SecretStr in Pydantic models

Files:

  • src/configuration.py
  • src/models/config.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use pytest-mock for AsyncMock objects in tests
Use marker pytest.mark.asyncio for async tests
Unit tests require 60% coverage, integration tests 10%

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
🧠 Learnings (9)
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to **/constants.py : Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • src/constants.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • src/models/config.py
  • tests/unit/utils/test_vector_search.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Always check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/config*.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
  • src/models/config.py
📚 Learning: 2026-04-19T15:40:25.624Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-19T15:40:25.624Z
Learning: Applies to src/**/*.py : Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Applied to files:

  • src/utils/vector_search.py

Comment thread src/utils/vector_search.py Outdated

Contributor @are-ces left a comment:
The example config in the PR description must be outdated because

top_k_multiplier: 2.0 # Fetch 2x chunks, rerank, keep top_k
byok_boost: 1.2 # Boost factor for BYOK sources
okp_boost: 1.0 # Boost factor for OKP sources

are missing from the configuration in the PR.

Would be nice to see a configured reranker in the example configuration examples/lightspeed-stack-byok-okp-rag.yaml.

Please update the unit and integration tests, and fix the failing Black and linting checks. TY!

Comment thread src/utils/vector_search.py Outdated
Comment thread src/utils/vector_search.py Outdated
Comment thread src/models/config.py Outdated
Comment thread src/models/config.py
Comment thread src/utils/vector_search.py
Comment thread src/utils/vector_search.py Outdated
Comment thread src/utils/vector_search.py
Comment thread src/utils/vector_search.py Outdated
Comment thread pyproject.toml Outdated
# To be able to fix multiple CVEs, also LCORE-1117
"requests>=2.33.0",
# Used for RAG chunk reranking (cross-encoder)
"sentence-transformers>=2.0.0",
Contributor:

Should the sentence-transformers dependency be added @syedriko @tisnik? In that case uv.lock will need to be updated too.

Contributor:

yes, it needs to be updated to pass CI etc.

Contributor (Author):

thanks, missed that - will update uv.lock

Contributor (Author):

done.

Contributor:

I think this line is redundant and not needed since it is already defined in llslibdev below.

Comment thread src/models/config.py Outdated
Comment thread pyproject.toml Outdated

Bot @coderabbitai left a comment:

Actionable comments posted: 4

♻️ Duplicate comments (6)
src/utils/vector_search.py (5)

87-89: ⚠️ Potential issue | 🟠 Major

Run predict() in a worker thread.

CrossEncoder.predict() is synchronous. Calling it directly inside async def _rerank_chunks_with_cross_encoder blocks the event loop for the full inference time.

♻️ Proposed fix
         # Create query-chunk pairs for scoring
         pairs = [(query, chunk.content) for chunk in chunks]
-        scores = model.predict(pairs)
+        scores = await asyncio.to_thread(model.predict, pairs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 87 - 89, The call to the synchronous
CrossEncoder.predict inside async def _rerank_chunks_with_cross_encoder blocks
the event loop; run model.predict in a background thread using asyncio's
loop.run_in_executor (or asyncio.to_thread) instead. Locate
_rerank_chunks_with_cross_encoder and replace the direct scores =
model.predict(pairs) call with an awaitable that executes model.predict in a
worker thread, preserving the same input pairs and returning scores for the rest
of the function to use. Ensure exceptions from model.predict propagate or are
handled appropriately when run in the executor.

701-722: ⚠️ Potential issue | 🟠 Major

Pass max_chunks through to the Solr query.

limit is computed here, but vector_io.query still uses _build_query_params(solr) without k=limit, so OKP retrieval remains capped at SOLR_VECTOR_SEARCH_DEFAULT_K. The expanded rerank pool is therefore applied only to BYOK.

♻️ Proposed fix
-            params = _build_query_params(solr)
+            params = _build_query_params(solr, k=limit)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 701 - 722, The code computes limit
but doesn't pass it to the Solr query; update the call site in this block so the
Solr query uses that limit by passing it into the query params (e.g., call
_build_query_params(solr, k=limit) or set params["k"]=limit after params =
_build_query_params(solr)), then pass the updated params into
client.vector_io.query (the vector_io.query invocation and the
_build_query_params function are the unique symbols to modify).
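The parameter threading amounts to letting callers override the default candidate count. A minimal sketch with illustrative names (the real `_build_query_params` takes a Solr request object, not a bare filter string):

```python
SOLR_VECTOR_SEARCH_DEFAULT_K = 5


def build_query_params(filter_query: str,
                       k: int = SOLR_VECTOR_SEARCH_DEFAULT_K) -> dict:
    """Build Solr vector query params, letting callers expand the pool."""
    return {"fq": filter_query, "k": k}


def fetch_solr_rag(filter_query: str, max_chunks: int) -> dict:
    # Pass the expanded pool size through instead of relying on the default k.
    return build_query_params(filter_query, k=max_chunks)


top_k = 5
pool_size = 2 * top_k  # fetch extra candidates so the reranker has room to reorder
params = fetch_solr_rag("is_public:true", max_chunks=pool_size)
print(params["k"])
```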

759-793: ⚠️ Potential issue | 🔴 Critical

Keep the moderation gate before querying RAG backends.

moderation_decision is still passed in, but this function immediately starts BYOK/OKP retrieval for every request. That sends blocked prompts to the vector stores again, which regresses the earlier safety boundary.

♻️ Proposed fix
 async def build_rag_context(  # pylint: disable=too-many-locals
     client: AsyncLlamaStackClient,
     moderation_decision: str,  # pylint: disable=unused-argument
     query: str,
     vector_store_ids: Optional[list[str]],
     solr: Optional[SolrVectorSearchRequest] = None,
 ) -> RAGContext:
@@
+    if moderation_decision == "blocked":
+        logger.info("Skipping inline RAG because moderation blocked the query")
+        return RAGContext(
+            context_text="",
+            rag_chunks=[],
+            referenced_documents=[],
+        )
+
     pool_size = 2 * constants.BYOK_RAG_MAX_CHUNKS
     top_k = constants.BYOK_RAG_MAX_CHUNKS
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 759 - 793, The function
build_rag_context currently starts _fetch_byok_rag and _fetch_solr_rag before
honoring the moderation_decision; change it to check moderation_decision at the
top of build_rag_context and short-circuit when the decision indicates a blocked
request (e.g., "BLOCK"/unsafe), returning an empty RAGContext (no chunks, empty
context text and documents) instead of invoking _fetch_byok_rag/_fetch_solr_rag;
keep the rest of the logic unchanged for non-blocked decisions.

34-56: ⚠️ Potential issue | 🟠 Major

Load the cross-encoder off the event loop and make cache misses single-flight.

CrossEncoder(model_name) can block on model load/download, and concurrent cold starts can instantiate the same model multiple times before _cross_encoder_models is populated.

♻️ Proposed fix
 _cross_encoder_models: dict[str, Any] = {}  # pylint: disable=invalid-name
+_cross_encoder_locks: dict[str, asyncio.Lock] = {}  # pylint: disable=invalid-name
 
 
-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking.
@@
-    if model_name not in _cross_encoder_models:
-        try:
+    if model_name in _cross_encoder_models:
+        return _cross_encoder_models[model_name]
+
+    lock = _cross_encoder_locks.setdefault(model_name, asyncio.Lock())
+    async with lock:
+        if model_name in _cross_encoder_models:
+            return _cross_encoder_models[model_name]
+        try:
             from sentence_transformers import (  # pylint: disable=import-outside-toplevel
                 CrossEncoder,
             )
 
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
+            _cross_encoder_models[model_name] = await asyncio.to_thread(
+                CrossEncoder, model_name
+            )
             logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
         except Exception as e:  # pylint: disable=broad-exception-caught
             logger.warning(
                 "Could not load cross-encoder for reranking (%s): %s", model_name, e
             )
             _cross_encoder_models[model_name] = None
     return _cross_encoder_models[model_name]

Line 81 would then need to await _get_cross_encoder(model_name).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 34 - 56, _get_cross_encoder
currently constructs CrossEncoder(model_name) on the event loop and allows
concurrent cache-miss races; change it to an async function that performs
blocking CrossEncoder(model_name) inside an executor (run_in_executor) and
implement single-flight cache-miss protection (e.g., a per-model asyncio.Lock or
a dict of asyncio.Future/promises keyed by model_name) so only one loader runs
per model and others await the result; populate _cross_encoder_models with the
loaded model (or None on failure) and ensure callers (e.g., the caller at the
noted site that must now `await _get_cross_encoder(model_name)`) are updated to
await the async function.

219-241: ⚠️ Potential issue | 🟠 Major

Skip referenced documents that have neither title nor URL.

This helper currently emits ReferencedDocument entries for doc-id-only chunks. Those are intentionally filtered out elsewhere because the frontend cannot render them usefully.

♻️ Proposed fix
     for chunk in rag_chunks:
         attrs = chunk.attributes or {}
         doc_url = (
             attrs.get("reference_url") or attrs.get("doc_url") or attrs.get("docs_url")
         )
+        title = attrs.get("title")
         doc_id = attrs.get("document_id") or attrs.get("doc_id")
         dedup_key = doc_url or doc_id or chunk.source or ""
         if not dedup_key or dedup_key in seen:
             continue
         seen.add(dedup_key)
@@
-        result.append(
+        if not title and parsed_url is None:
+            continue
+        result.append(
             ReferencedDocument(
-                doc_title=attrs.get("title"),
+                doc_title=title,
                 doc_url=parsed_url,
                 source=chunk.source,
             )
         )

Based on learnings: parse_referenced_documents_from_responses_api keeps only documents with doc_title or doc_url because these are displayed in the frontend.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 219 - 241, The loop over rag_chunks
currently always appends a ReferencedDocument even when a chunk has neither a
title nor a URL; update the logic in the function (the loop that iterates
rag_chunks and constructs ReferencedDocument) to skip adding entries unless
attrs.get("title") or the parsed doc_url is present (i.e., only append
ReferencedDocument when at least one of doc_title or doc_url exists), keeping
the existing dedup_key/seen checks and AnyUrl parsing behavior; this mirrors the
behavior in parse_referenced_documents_from_responses_api so the frontend won’t
receive doc-id-only entries.
src/models/config.py (1)

1830-1839: ⚠️ Potential issue | 🟠 Major

Respect explicit reranker settings in the auto-enable guard.

mark_as_explicitly_configured() records whether the user touched reranker, but this validator never checks that flag. An explicit reranker.enabled: false still gets forced back to true. Also, the current or/and mix is parsed as has_byok or (...), so any non-empty byok_rag trips this block before the OKP/enabled checks apply.

♻️ Proposed fix
-        if (
-            has_byok or has_inline_rag
-            and has_okp
-            and not self.reranker.enabled
-        ):
+        if (
+            (has_byok or has_inline_rag)
+            and has_okp
+            and not self.reranker._explicitly_configured  # pylint: disable=protected-access
+            and not self.reranker.enabled
+        ):

Also applies to: 2135-2147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/config.py` around lines 1830 - 1839, The auto-enable guard
currently ignores the _explicitly_configured flag set by
mark_as_explicitly_configured() and has incorrect operator grouping so byok_rag
truthiness bypasses OKP/enabled checks; update the guard that decides
reranker.enabled to first respect _explicitly_configured (do not override when
true) and then fix the boolean expression to use parentheses so it evaluates as
has_byok or (okp_present and enabled_check) rather than has_byok or (...)—apply
the same fix to the other identical guard scope referenced later (the second
block around the reranker auto-enable code).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 75-76: The dependency "sentence-transformers>=2.0.0" pulls a
transitive "torch>=1.11.0" at runtime, but the existing [tool.uv.sources]
override is scoped to group = "llslibdev" so runtime installs still resolve
torch from default PyPI (GPU variant); move the pytorch-cpu index override out
of the group-scoped [tool.uv.sources] and apply it globally (top-level
[tool.uv.sources]) so that package resolution for "torch" (and thus for the
transitive dependency from sentence-transformers) uses the pytorch-cpu index for
all installs.

In `@src/models/config.py`:
- Around line 2124-2125: The check "has_inline_rag = len(self.rag) > 0" will
raise TypeError because RagConfiguration (self.rag) has no __len__; replace this
by checking the inline field directly — e.g., set has_inline_rag =
bool(self.rag.inline) — so update the logic in the method/initializer where
has_inline_rag is computed (referencing self.rag and has_inline_rag) to use
bool(self.rag.inline) instead of len(self.rag).

In `@src/utils/vector_search.py`:
- Around line 818-821: The else branch currently calls
_apply_byok_rerank_boost(merged) which mutates scores and ordering even when the
reranker is disabled; change this to preserve original vector ranking and scores
by taking the top_k items directly from merged (e.g., context_chunks =
merged[:top_k]) so no scores or ordering are modified when reranker.enabled =
False; reference: _apply_byok_rerank_boost, merged, context_chunks, top_k,
reranker.enabled.

In `@tests/unit/models/config/test_reranker_configuration.py`:
- Around line 46-53: The test should assert a public observable effect instead
of the private _explicitly_configured sentinel: replace the hasattr(config,
"_explicitly_configured") check in test_explicit_configuration_detection with an
assertion that the instance's public value differs from the default (e.g.,
create a default = RerankerConfiguration() and assert config.enabled !=
default.enabled or assert config.enabled is False and default.enabled is True),
so the test verifies that providing a non-default value to RerankerConfiguration
produces the expected public behavior.

---

Duplicate comments:
In `@src/models/config.py`:
- Around line 1830-1839: The auto-enable guard currently ignores the
_explicitly_configured flag set by mark_as_explicitly_configured() and has
incorrect operator grouping, so byok_rag truthiness bypasses the OKP/enabled
checks; update the guard that decides reranker.enabled to first respect
_explicitly_configured (do not override when true), then parenthesize the
boolean expression so it evaluates as (has_byok or has_inline_rag) and
okp_present and enabled_check, rather than has_byok or (...); apply the same
fix to the other identical guard scope referenced later (the second block
around the reranker auto-enable code).

In `@src/utils/vector_search.py`:
- Around line 87-89: The call to the synchronous CrossEncoder.predict inside
async def _rerank_chunks_with_cross_encoder blocks the event loop; run
model.predict in a background thread using asyncio's loop.run_in_executor (or
asyncio.to_thread) instead. Locate _rerank_chunks_with_cross_encoder and replace
the direct scores = model.predict(pairs) call with an awaitable that executes
model.predict in a worker thread, preserving the same input pairs and returning
scores for the rest of the function to use. Ensure exceptions from model.predict
propagate or are handled appropriately when run in the executor.
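
One way to apply this, sketched with a fake model class (the real code would call `sentence_transformers.CrossEncoder.predict`; `FakeCrossEncoder` and `rerank` are illustrative names only):

```python
import asyncio


class FakeCrossEncoder:
    """Stand-in for sentence_transformers.CrossEncoder, for illustration."""

    def predict(self, pairs):
        # Synchronous, potentially slow scoring call.
        return [float(len(q) + len(d)) for q, d in pairs]


async def rerank(model, query, docs):
    """Score (query, doc) pairs off the event loop and sort by score."""
    pairs = [(query, d) for d in docs]
    # asyncio.to_thread runs the blocking predict() in a worker thread,
    # so the event loop stays responsive; exceptions still propagate.
    scores = await asyncio.to_thread(model.predict, pairs)
    return sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
```

`loop.run_in_executor(None, model.predict, pairs)` is the older equivalent; `asyncio.to_thread` (Python 3.9+) is just the more readable spelling.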
- Around line 701-722: The code computes limit but doesn't pass it to the Solr
query; update the call site in this block so the Solr query uses that limit by
passing it into the query params (e.g., call _build_query_params(solr, k=limit)
or set params["k"]=limit after params = _build_query_params(solr)), then pass
the updated params into client.vector_io.query (the vector_io.query invocation
and the _build_query_params function are the unique symbols to modify).
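
Schematically, with hypothetical parameter names (the real builder's fields may differ):

```python
def _build_query_params(solr_config, k=None):
    """Hypothetical params builder; propagates the computed limit when given."""
    params = {"collection": solr_config.get("collection"), "search_mode": "vector"}
    if k is not None:
        params["k"] = k  # pass the computed limit through to the Solr query
    return params


limit = 2 * 10  # e.g. top_k * top_k_multiplier
params = _build_query_params({"collection": "docs"}, k=limit)
```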
- Around line 759-793: The function build_rag_context currently starts
_fetch_byok_rag and _fetch_solr_rag before honoring the moderation_decision;
change it to check moderation_decision at the top of build_rag_context and
short-circuit when the decision indicates a blocked request (e.g.,
"BLOCK"/unsafe), returning an empty RAGContext (no chunks, empty context text
and documents) instead of invoking _fetch_byok_rag/_fetch_solr_rag; keep the
rest of the logic unchanged for non-blocked decisions.
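
The short-circuit can be sketched like this; `RAGContext`, the `"BLOCK"` sentinel, and the single `fetch_chunks` callable are simplifications of the real types and the two fetch tasks:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RAGContext:
    """Simplified RAG context: chunks, rendered text, referenced documents."""

    chunks: list = field(default_factory=list)
    context_text: str = ""
    documents: list = field(default_factory=list)


async def build_rag_context(query, moderation_decision, fetch_chunks):
    """Build RAG context, short-circuiting before any fetch on a blocked request."""
    if moderation_decision == "BLOCK":
        return RAGContext()  # no chunks, empty context text and documents
    chunks = await fetch_chunks(query)
    return RAGContext(chunks=chunks, context_text="\n".join(chunks))
```

The point is that the fetch coroutines are never even started for a blocked request, rather than being awaited and discarded.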
- Around line 34-56: _get_cross_encoder currently constructs
CrossEncoder(model_name) on the event loop and allows concurrent cache-miss
races; change it to an async function that performs blocking
CrossEncoder(model_name) inside an executor (run_in_executor) and implement
single-flight cache-miss protection (e.g., a per-model asyncio.Lock or a dict of
asyncio.Future/promises keyed by model_name) so only one loader runs per model
and others await the result; populate _cross_encoder_models with the loaded
model (or None on failure) and ensure callers (e.g., the caller at the noted
site that must now `await _get_cross_encoder(model_name)`) are updated to await
the async function.
- Around line 219-241: The loop over rag_chunks currently always appends a
ReferencedDocument even when a chunk has neither a title nor a URL; update the
logic in the function (the loop that iterates rag_chunks and constructs
ReferencedDocument) to skip adding entries unless attrs.get("title") or the
parsed doc_url is present (i.e., only append ReferencedDocument when at least
one of doc_title or doc_url exists), keeping the existing dedup_key/seen checks
and AnyUrl parsing behavior; this mirrors the behavior in
parse_referenced_documents_from_responses_api so the frontend won’t receive
doc-id-only entries.
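
The filtering condition can be illustrated as follows; the attribute names (`title`, `docs_url`) and the dict-based chunk shape are assumptions for the sketch, and the real code constructs `ReferencedDocument` models with `AnyUrl` parsing:

```python
def build_referenced_documents(rag_chunks):
    """Keep only entries with a title or URL, mirroring the Responses API parser."""
    seen = set()
    documents = []
    for chunk in rag_chunks:
        attrs = chunk.get("attributes", {})
        doc_title = attrs.get("title")
        doc_url = attrs.get("docs_url")  # field name is an assumption
        dedup_key = (doc_title, doc_url)
        if dedup_key in seen:
            continue
        seen.add(dedup_key)
        # Skip doc-id-only entries: they carry nothing useful for the frontend.
        if not (doc_title or doc_url):
            continue
        documents.append({"doc_title": doc_title, "doc_url": doc_url})
    return documents
```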
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: abc3be5b-4824-4542-9263-2d92c2a3f592

📥 Commits

Reviewing files that changed from the base of the PR and between d664680 and 5a81aae.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • pyproject.toml
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: mypy
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
🧰 Additional context used
📓 Path-based instructions (6)
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_vector_search.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_vector_search.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Central constants.py for shared constants with descriptive comments

Files:

  • src/constants.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic config models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Type hints for Pydantic models: Optional[FilePath], PositiveInt, SecretStr
Use typing_extensions.Self for model validators in Pydantic models
Pydantic config classes extend ConfigurationBase, data models extend BaseModel
Use @model_validator and @field_validator for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes

Files:

  • src/models/config.py
🧠 Learnings (9)
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: ALWAYS check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
  • src/models/config.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • tests/integration/endpoints/test_query_byok_integration.py
  • src/constants.py
  • src/models/config.py
  • tests/unit/utils/test_vector_search.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/constants.py : Central `constants.py` for shared constants with descriptive comments

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-29T15:45:22.820Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.820Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Applied to files:

  • src/utils/vector_search.py
🔇 Additional comments (1)
src/models/config.py (1)

2000-2004: Good use of default_factory for the nested config.

This matches the existing Pydantic pattern here and avoids sharing a mutable reranker instance across Configuration objects.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
src/utils/vector_search.py (1)

803-804: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t discard source-specific referenced-document metadata.

_fetch_solr_rag() already extracts referenced documents from raw OKP chunks, including titles via _extract_solr_document_metadata(), but this gather drops them and later rebuilds documents from context_chunks. _convert_solr_chunks_to_rag_format() only carries document_id/doc_url, so the reconstructed list loses OKP titles after reranking. Keep the fetched document payloads and filter them to the surviving chunks, or propagate title into the chunk attributes before rebuilding.

Based on learnings: parse_referenced_documents_from_responses_api keeps frontend-useful referenced documents because these documents are displayed in the frontend.

Also applies to: 842-843

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/vector_search.py` around lines 803 - 804, The current
asyncio.gather call discards the fetched referenced-document payloads (from
_fetch_solr_rag/_extract_solr_document_metadata) and later rebuilds documents
from context_chunks, losing OKP titles; modify the flow so the results returned
by byok_chunks_task and solr_chunks_task preserve referenced-document payloads
and their title metadata (either by: 1) having _fetch_solr_rag return the
referenced documents alongside chunks and filtering that referenced-doc list to
only the surviving chunk document_ids before returning, or 2) propagating
title/doc metadata into each chunk's attributes before
_convert_solr_chunks_to_rag_format runs), update places that call
_convert_solr_chunks_to_rag_format to accept/merge the extra metadata, and
mirror the behavior of parse_referenced_documents_from_responses_api to keep
frontend-useful referenced documents.
src/models/config.py (1)

2132-2142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix the auto-enable predicate before it overrides user config.

has_byok or has_inline_rag and has_okp and not self.reranker.enabled is parsed as has_byok or (...). That means any config with at least one byok_rag entry will force this branch, even when OKP is absent and even when the user explicitly set reranker.enabled = False. Gate this with has_byok and has_okp plus _explicitly_configured, or drop the validator entirely since enabled=True is already the default.

🐛 Minimal guard fix
-        if has_byok or has_inline_rag and has_okp and not self.reranker.enabled:
+        if (
+            has_byok
+            and has_okp
+            and not self.reranker._explicitly_configured  # pylint: disable=protected-access
+            and not self.reranker.enabled
+        ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/models/config.py` around lines 2132 - 2142, The condition currently
parses incorrectly and can auto-enable the reranker whenever has_byok is true;
change the predicate to require OKP plus a RAG source and only auto-enable when
the reranker was not explicitly set by the user. Replace the line `if has_byok
or has_inline_rag and has_okp and not self.reranker.enabled:` with a guarded
check such as `if (has_okp and (has_byok or has_inline_rag)) and not
self.reranker.enabled and not getattr(self.reranker, "_explicitly_configured",
False):` so the branch only runs when OKP is present, there is at least one RAG
source, reranker is disabled, and the user did not explicitly configure
reranker; keep the same logger and assignment to `self.reranker.enabled = True`.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/vector_search.py`:
- Around line 35-53: The synchronous _get_cross_encoder performs blocking
CrossEncoder(model_name) creation and should be offloaded from the event loop:
make _get_cross_encoder async (or keep it sync but perform the heavy
construction via asyncio.to_thread) and use await asyncio.to_thread(lambda:
CrossEncoder(model_name)) when populating _cross_encoder_models[model_name];
ensure callers such as _rerank_chunks_with_cross_encoder await the async loader
(or already call it from an async context) so the initial Cold-start model load
is executed off the event loop while leaving predict() threading as-is.

In `@tests/unit/utils/test_vector_search.py`:
- Around line 864-893: Change test_handles_import_error to simulate an actual
import-time failure (ImportError) instead of reusing the CrossEncoder
instantiation error: clear the _cross_encoder_models cache, then cause importing
the sentence_transformers dependency to fail (for example by using
mocker.patch.dict on sys.modules to remove or replace "sentence_transformers" or
patch importlib.import_module to raise ImportError) and reload or import
utils.vector_search so _get_cross_encoder runs the import-failure path; assert
that _get_cross_encoder("test-model") returns None. Keep the other
test_handles_model_loading_error as-is to cover instantiation errors from
CrossEncoder.
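
A self-contained illustration of the import-failure path and the `sys.modules` poisoning technique (here without pytest; `load_cross_encoder` is a toy stand-in for `_get_cross_encoder`, and in the real test `mocker.patch.dict(sys.modules, ...)` would handle the restore):

```python
import importlib
import sys


def load_cross_encoder(model_name):
    """Toy version of _get_cross_encoder's import-failure handling."""
    try:
        st = importlib.import_module("sentence_transformers")
    except ImportError:
        return None  # dependency missing or broken: degrade gracefully
    return st.CrossEncoder(model_name)


def test_handles_import_error():
    # Setting a sys.modules entry to None makes any import of that name
    # raise ImportError, which is what mocker.patch.dict simulates.
    sys.modules["sentence_transformers"] = None
    try:
        assert load_cross_encoder("test-model") is None
    finally:
        sys.modules.pop("sentence_transformers", None)
```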

---

Duplicate comments:
In `@src/models/config.py`:
- Around line 2132-2142: The condition currently parses incorrectly and can
auto-enable the reranker whenever has_byok is true; change the predicate to
require OKP plus a RAG source and only auto-enable when the reranker was not
explicitly set by the user. Replace the line `if has_byok or has_inline_rag and
has_okp and not self.reranker.enabled:` with a guarded check such as `if
(has_okp and (has_byok or has_inline_rag)) and not self.reranker.enabled and not
getattr(self.reranker, "_explicitly_configured", False):` so the branch only
runs when OKP is present, there is at least one RAG source, reranker is
disabled, and the user did not explicitly configure reranker; keep the same
logger and assignment to `self.reranker.enabled = True`.

In `@src/utils/vector_search.py`:
- Around line 803-804: The current asyncio.gather call discards the fetched
referenced-document payloads (from
_fetch_solr_rag/_extract_solr_document_metadata) and later rebuilds documents
from context_chunks, losing OKP titles; modify the flow so the results returned
by byok_chunks_task and solr_chunks_task preserve referenced-document payloads
and their title metadata (either by: 1) having _fetch_solr_rag return the
referenced documents alongside chunks and filtering that referenced-doc list to
only the surviving chunk document_ids before returning, or 2) propagating
title/doc metadata into each chunk's attributes before
_convert_solr_chunks_to_rag_format runs), update places that call
_convert_solr_chunks_to_rag_format to accept/merge the extra metadata, and
mirror the behavior of parse_referenced_documents_from_responses_api to keep
frontend-useful referenced documents.
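
Option 1 above (filter the fetched referenced-document payloads down to the chunks that survive reranking) can be sketched as below; the dict shapes and `document_id` key are assumptions standing in for the real chunk and document models:

```python
def filter_referenced_documents(referenced_docs, surviving_chunks):
    """Keep only fetched document payloads whose document_id survived reranking."""
    surviving_ids = {chunk["document_id"] for chunk in surviving_chunks}
    return [doc for doc in referenced_docs if doc["document_id"] in surviving_ids]
```

This preserves the OKP titles extracted by `_extract_solr_document_metadata()` instead of rebuilding title-less documents from `context_chunks`.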
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8311d4b1-2539-4c32-83dd-544949fcab43

📥 Commits

Reviewing files that changed from the base of the PR and between 5a81aae and 65f15b4.

📒 Files selected for processing (6)
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/models/config/test_dump_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/models/config/test_dump_configuration.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/models/config.py
  • src/utils/vector_search.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic config models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Type hints for Pydantic models: Optional[FilePath], PositiveInt, SecretStr
Use typing_extensions.Self for model validators in Pydantic models
Pydantic config classes extend ConfigurationBase, data models extend BaseModel
Use @model_validator and @field_validator for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes

Files:

  • src/models/config.py
🧠 Learnings (10)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • tests/integration/endpoints/test_query_byok_integration.py
  • src/models/config.py
  • tests/unit/utils/test_vector_search.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
  • src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Applied to files:

  • src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/runners/**/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/utils/vector_search.py
🔇 Additional comments (2)
tests/unit/models/config/test_reranker_configuration.py (1)

41-47: Assert the observable behavior, not hasattr(...).

PrivateAttr(default=False) makes _explicitly_configured exist on every instance, so hasattr(config, "_explicitly_configured") will keep passing even if explicit-configuration detection breaks. Assert the actual value or a public effect instead.

tests/unit/utils/test_vector_search.py (1)

728-1159: Strong reranker test coverage improvement

Good additions across enable/disable wiring, cross-encoder caching, score normalization, failure fallbacks, and BYOK boost ordering/preservation. This meaningfully improves confidence in the new reranking path.

@Anxhela21 Anxhela21 force-pushed the anx/vector_compare branch from 65f15b4 to 6641b22 Compare April 30, 2026 19:02
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/models/config.py`:
- Around line 2138-2148: The auto-enable condition incorrectly triggers when
BYOK exists regardless of OKP or explicit user setting; update the if in the
block that references has_byok, has_inline_rag, has_okp and
self.reranker.enabled to require OKP plus either BYOK or inline RAG and also
ensure the reranker wasn’t explicitly configured (i.e., check not
self.reranker._explicitly_configured); specifically change the condition to
something like: if (has_okp and (has_byok or has_inline_rag) and not
self.reranker.enabled and not self.reranker._explicitly_configured) so the
reranker is only auto-enabled when OKP is present, a second source exists, and
the user didn’t explicitly set reranker.enabled.

In `@src/utils/vector_search.py`:
- Around line 700-701: The docstring for _fetch_solr_rag incorrectly documents a
non-existent parameter max_chunks (copied from _fetch_byok_rag); update the
docstring of _fetch_solr_rag to reflect its actual signature by removing the
max_chunks entry (or replacing it with the correct parameter name if intended),
ensuring any mention of constants.OKP_RAG_MAX_CHUNKS is removed or corrected,
and verify the parameter list in the docstring matches the function signature in
_fetch_solr_rag.
- Around line 768-770: The signature of build_rag_context incorrectly silences
an unused-argument warning for moderation_decision despite the function using
that parameter (it checks moderation_decision == "blocked"); remove the "#
pylint: disable=unused-argument" annotation from the build_rag_context
definition so linters reflect actual usage, and keep the existing conditional
that references moderation_decision (ensure no other code relies on the disable
comment); update any docstring or function comment if you want to document the
planned usage of moderation_decision.
- Line 17: Remove the top-level "from sentence_transformers import CrossEncoder"
import and instead perform a dynamic import inside the existing
_get_cross_encoder function: import sentence_transformers.CrossEncoder (or from
sentence_transformers import CrossEncoder) within _get_cross_encoder, keep the
module-level cached variable (the cache used by _get_cross_encoder) and
instantiate/cache the CrossEncoder there, and add a clear ImportError/exception
handling path to surface useful errors if the library is missing; ensure all
references use that cached instance and no other top-level references to
CrossEncoder remain.
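The auto-enable condition requested in the first inline comment can be sketched as a pure predicate (the names has_okp, has_byok, has_inline_rag, and the explicitly-configured flag are taken from that comment; the real check lives inside a Pydantic model validator, so this is an illustrative helper, not the actual code):

```python
def should_auto_enable_reranker(
    has_okp: bool,
    has_byok: bool,
    has_inline_rag: bool,
    enabled: bool,
    explicitly_configured: bool,
) -> bool:
    """Auto-enable only when OKP plus a second source exist and the user
    did not explicitly configure the reranker (hypothetical helper)."""
    return (
        has_okp
        and (has_byok or has_inline_rag)
        and not enabled
        and not explicitly_configured
    )


# BYOK alone must not trigger auto-enable:
assert not should_auto_enable_reranker(False, True, False, False, False)
# OKP plus BYOK, with no explicit user setting, does:
assert should_auto_enable_reranker(True, True, False, False, False)
# An explicit user choice is always respected:
assert not should_auto_enable_reranker(True, True, False, False, True)
```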
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ba65cf4d-67d0-4216-a736-eb3ca61da961

📥 Commits

Reviewing files that changed from the base of the PR and between 65f15b4 and 6641b22.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • pyproject.toml
  • src/configuration.py
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: mypy
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (6)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
All modules start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Type aliases defined at module level for clarity
Use Final[type] as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types
Use Union types with modern syntax: str | int
Use Optional[Type] for optional type hints
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns: return new data structures instead of modifying parameters
Use async def for I/O operations and external API calls
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes use ABC with @abstractmethod decorators
Complete type annotations for all class attributes, use specific types, not Any
Follow Google Python docstring conventions for all modules, classes, and functions
Docstring Parameters section documents function parameters
Docstring Returns section documents function return values
Docstring Raises section documents exceptions that may be raised
Use black for code formatting
Use pylint for static analysis with source-roots configuration set to "src"
Use pyright for type checking
Use ruff for fast linting
Use pydocstyle for docstring style validation
Use mypy for additional type checking
Use bandit for security issue detection

Files:

  • src/configuration.py
  • src/constants.py
  • src/models/config.py
  • src/utils/vector_search.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/unit/**/*.py: Use pytest for all unit and integration tests
Use pytest-mock for AsyncMock objects in unit tests
Use marker pytest.mark.asyncio for async tests

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_vector_search.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not use unittest - pytest is the standard testing framework for this project

Files:

  • tests/unit/models/config/test_reranker_configuration.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/utils/test_vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Central constants.py for shared constants with descriptive comments

Files:

  • src/constants.py
src/models/config.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Base class sets extra="forbid" to reject unknown fields in Pydantic config models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Type hints for Pydantic models: Optional[FilePath], PositiveInt, SecretStr
Use typing_extensions.Self for model validators in Pydantic models
Pydantic config classes extend ConfigurationBase, data models extend BaseModel
Use @model_validator and @field_validator for validation in Pydantic models
Docstring Attributes section documents Pydantic model attributes

Files:

  • src/models/config.py
🧠 Learnings (12)
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: ALWAYS check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
  • src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/models/config.py : Base class sets `extra="forbid"` to reject unknown fields in Pydantic config models

Applied to files:

  • tests/unit/models/config/test_reranker_configuration.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/**/*.py : Use Final[type] as type hint for all constants

Applied to files:

  • src/constants.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • src/constants.py
  • src/models/config.py
  • tests/unit/utils/test_vector_search.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • src/utils/vector_search.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to tests/unit/**/*.py : Use `pytest-mock` for AsyncMock objects in unit tests

Applied to files:

  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-30T13:33:40.479Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1073
File: src/app/endpoints/query_v2.py:575-586
Timestamp: 2026-01-30T13:33:40.479Z
Learning: In `src/app/endpoints/query_v2.py`, the `parse_referenced_documents_from_responses_api` function filters referenced documents to include only those with `doc_title` or `doc_url` because these documents are displayed in the frontend. Documents with only `doc_id` are intentionally excluded as they wouldn't provide useful information to end users.

Applied to files:

  • src/utils/vector_search.py
📚 Learning: 2026-04-29T15:45:22.854Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-29T15:45:22.854Z
Learning: Applies to src/runners/**/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/utils/vector_search.py
🔇 Additional comments (9)
src/utils/vector_search.py (5)

35-53: Initial CrossEncoder load still blocks the event loop.

The past review comment about offloading the initial model construction is still valid. While model.predict() is correctly threaded at line 87, _get_cross_encoder(model_name) at line 79 is synchronous and performs CrossEncoder(model_name) which downloads/loads the model on first call — blocking the event loop.

♻️ Proposed fix: make loader async with to_thread
-def _get_cross_encoder(model_name: str) -> Any:
+async def _get_cross_encoder(model_name: str) -> Any:
     """Return the lazy-loaded cross-encoder model for reranking.
     ...
     """
     if model_name not in _cross_encoder_models:
         try:
-            _cross_encoder_models[model_name] = CrossEncoder(model_name)
+            from sentence_transformers import CrossEncoder  # pylint: disable=import-outside-toplevel
+            _cross_encoder_models[model_name] = await asyncio.to_thread(CrossEncoder, model_name)
             logger.info("Loaded cross-encoder for RAG reranking: %s", model_name)
         except Exception as e:  # pylint: disable=broad-exception-caught
             ...
     return _cross_encoder_models[model_name]

And at the call site (line 79):

-        model = _get_cross_encoder(model_name)
+        model = await _get_cross_encoder(model_name)

Also applies to: 79-79


246-252: Documents with neither title nor URL are still emitted.

The past review comment remains valid: _referenced_documents_from_rag_chunks appends ReferencedDocument objects even when both attrs.get("title") and parsed_url are None. Per retrieved learnings, documents without doc_title or doc_url are intentionally filtered elsewhere because the frontend cannot render them.

♻️ Proposed fix
         parsed_url: Optional[AnyUrl] = None
         if doc_url:
             try:
                 parsed_url = AnyUrl(doc_url)
             except Exception:  # pylint: disable=broad-exception-caught
                 parsed_url = None
+        title = attrs.get("title")
+        if title is None and parsed_url is None:
+            continue
         result.append(
             ReferencedDocument(
-                doc_title=attrs.get("title"),
+                doc_title=title,
                 doc_url=parsed_url,
                 source=chunk.source,
             )
         )

Based on learnings: "In src/app/endpoints/query_v2.py, the parse_referenced_documents_from_responses_api function filters referenced documents to include only those with doc_title or doc_url because these documents are displayed in the frontend."


56-183: LGTM: Reranking implementation with score normalization and fallback.

The cross-encoder reranking logic is well-structured:

  • Proper empty-input guard (line 74-75)
  • model.predict() correctly offloaded via asyncio.to_thread (line 87)
  • Handles numpy array outputs via tolist() (lines 89-90)
  • Min-max normalization with edge-case handling (identical scores, single score)
  • Combined 30%/70% weighting preserves original score influence
  • Robust fallback on any exception (lines 172-182)
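
The normalize-and-blend step summarized above can be sketched as follows (min-max normalization with the identical-scores guard, then a 30%/70% blend of original and cross-encoder scores; the function name and exact weighting parameter are assumptions for illustration, not the implementation):

```python
def blend_scores(
    original: list[float],
    cross_encoder: list[float],
    original_weight: float = 0.3,
) -> list[float]:
    """Min-max normalize cross-encoder scores, then blend with the
    original vector-similarity scores (sketch of the described logic)."""
    lo, hi = min(cross_encoder), max(cross_encoder)
    if hi == lo:
        # Identical (or single) scores: normalization is undefined,
        # so fall back to a neutral 1.0 for every chunk.
        normalized = [1.0] * len(cross_encoder)
    else:
        normalized = [(s - lo) / (hi - lo) for s in cross_encoder]
    return [
        original_weight * o + (1.0 - original_weight) * n
        for o, n in zip(original, normalized)
    ]


scores = blend_scores([0.9, 0.1], [2.0, -2.0])
assert abs(scores[0] - 0.97) < 1e-9  # 0.3*0.9 + 0.7*1.0
assert abs(scores[1] - 0.03) < 1e-9  # 0.3*0.1 + 0.7*0.0
```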

185-214: LGTM: BYOK boost implementation is correct.

The boost logic correctly:

  • Only applies boost to non-OKP chunks (line 200)
  • Handles None scores with float("-inf") fallback
  • Re-sorts by boosted scores
  • Preserves all chunk attributes
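
The boost behavior listed above can be sketched like this (the Chunk shape, source labels, and helper name are assumptions for illustration; only the boost-non-OKP / None-sinks / re-sort logic mirrors the review points):

```python
from typing import NamedTuple, Optional


class Chunk(NamedTuple):
    text: str
    source: str            # "byok" or "okp" (labels assumed for this sketch)
    score: Optional[float]


def apply_byok_boost(chunks: list[Chunk], byok_boost: float = 1.2) -> list[Chunk]:
    """Boost non-OKP chunk scores and re-sort; None scores sink to the bottom
    via a float('-inf') sort key, and all other attributes are preserved."""
    boosted = [
        c if c.source == "okp" or c.score is None
        else c._replace(score=c.score * byok_boost)
        for c in chunks
    ]
    return sorted(
        boosted,
        key=lambda c: c.score if c.score is not None else float("-inf"),
        reverse=True,
    )


ranked = apply_byok_boost([
    Chunk("okp chunk", "okp", 0.80),
    Chunk("byok chunk", "byok", 0.70),
    Chunk("unscored", "byok", None),
])
assert [c.source for c in ranked] == ["byok", "okp", "byok"]
assert abs(ranked[0].score - 0.84) < 1e-9  # 0.70 boosted past 0.80
```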

816-832: LGTM: Reranker integration with configuration toggle.

The conditional reranking is cleanly integrated:

  • Respects configuration.reranker.enabled flag
  • Passes configured model name
  • Falls back to simple truncation when disabled (line 832)
  • Applies BYOK boost only when reranking is enabled
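The enabled/disabled branch described above reduces to a simple toggle; a sketch (synchronous for brevity, with a caller-supplied rerank callable standing in for the cross-encoder path):

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def finalize_chunks(
    chunks: list[T],
    top_k: int,
    reranker_enabled: bool,
    rerank: Callable[[list[T], int], list[T]],
) -> list[T]:
    """Apply the reranker toggle: rerank-and-trim when enabled, otherwise
    fall back to simple truncation of the already-ordered candidates."""
    if reranker_enabled:
        return rerank(chunks, top_k)
    return chunks[:top_k]


rerank_desc = lambda c, k: sorted(c, reverse=True)[:k]
assert finalize_chunks([3, 1, 2], 2, False, rerank_desc) == [3, 1]  # truncation
assert finalize_chunks([3, 1, 2], 2, True, rerank_desc) == [3, 2]   # reranked
```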
tests/unit/utils/test_vector_search.py (4)

864-893: test_handles_import_error duplicates test_handles_model_loading_error.

Both tests patch CrossEncoder to raise the same Exception("Model loading failed") and assert model is None. This provides duplicate coverage without actually testing the import-failure scenario the test name implies.

To test an actual import error, you would need to simulate sentence_transformers being unavailable. However, since CrossEncoder is now imported at module level, this is non-trivial and may not be worth the complexity. Consider either:

  1. Removing test_handles_import_error as redundant, or
  2. Renaming it to clarify it tests the same instantiation-error path
♻️ Suggested simplification: remove duplicate test
-    def test_handles_import_error(self, mocker: MockerFixture) -> None:
-        """Test graceful handling of sentence_transformers import error."""
-        # Clear the cache for testing
-        # pylint: disable=import-outside-toplevel
-        from utils.vector_search import _cross_encoder_models
-
-        _cross_encoder_models.clear()
-
-        # Mock CrossEncoder to raise an exception when instantiated
-        mock_cross_encoder = mocker.patch("utils.vector_search.CrossEncoder")
-        mock_cross_encoder.side_effect = Exception("Model loading failed")
-
-        model = _get_cross_encoder("test-model")
-
-        assert model is None
-

728-820: LGTM: Reranker configuration tests cover both enabled and disabled paths.

The tests correctly verify:

  • _rerank_chunks_with_cross_encoder is called when reranker.enabled = True (lines 728-777)
  • Correct model_name is passed from configuration
  • Function is not called when reranker.enabled = False (lines 779-820)

896-1061: LGTM: Comprehensive cross-encoder reranking tests.

Test coverage is thorough:

  • Empty input handling
  • Score normalization (identical scores, single chunk)
  • Combined scoring formula verification with expected values
  • top_k truncation
  • Fallback on model loading failure
  • Fallback on prediction exception
  • Numpy array conversion via tolist()

1064-1158: LGTM: BYOK boost tests are well-designed.

Tests correctly verify:

  • Only non-OKP chunks receive boost
  • Correct sorting by boosted scores
  • Default boost factor usage
  • None score handling
  • Attribute preservation

@are-ces

are-ces commented May 5, 2026

@Anxhela21 please fix the unit tests. Not sure what causes the error in the e2e tests; I suspect the failure is not captured by the debug output logs

Anxhela21 added 9 commits May 6, 2026 17:03
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
@Anxhela21 Anxhela21 force-pushed the anx/vector_compare branch from b230117 to 92f1c8d Compare May 6, 2026 21:05
@are-ces

are-ces commented May 7, 2026

In the e2e tests (server mode, group 3) this scenario is failing:
Failing scenarios: tests/e2e/features/inline_rag.feature:36 Inline RAG query includes referenced documents

PTAL, your changes might be related to the failure.


@are-ces are-ces left a comment


PTAL: I (or rather, CodeRabbit :) might have found the cause of the e2e test failures

Comment thread pyproject.toml Outdated
# To be able to fix multiple CVEs, also LCORE-1117
"requests>=2.33.0",
# Used for RAG chunk reranking (cross-encoder)
"sentence-transformers>=2.0.0",

I think this line is redundant, since the dependency is already defined in llslibdev below.

Anxhela21 added 3 commits May 8, 2026 12:56
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>
Signed-off-by: Anxhela Coba <acoba@redhat.com>