
feat: add agent driver abstraction layer for evaluation pipeline#231

Open
rioloc wants to merge 5 commits into lightspeed-core:main from rioloc:feat/agent-driver-layer

Conversation

Collaborator

@rioloc rioloc commented May 7, 2026

> ⚠️ Note: This PR depends on #228 (agents config models) being merged first. Please review after #228 is merged.

Summary

Introduces an agent driver abstraction layer (AgentDriverRegistry → AgentDriver(ABC) → HttpApiDriver) that replaces direct APIDataAmender usage in the evaluation pipeline. This enables per-conversation agent resolution and lays the groundwork for supporting multiple agent types beyond HTTP API.

  • AgentDriver(ABC) abstract base with stateless execute_turn() pattern, and HttpApiDriver implementation wrapping existing APIDataAmender
  • AgentDriverRegistry factory for driver creation by type
  • Pipeline integration: default driver at init (shared across threads), per-conversation driver resolution via EvaluationData.agent / agent_config fields
  • Integration tests extended with native agents: format configs alongside legacy api: configs, verifying both backward compatibility and the new config path
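
The class and method names below come from this PR; the bodies are a minimal hypothetical sketch of how such a registry → ABC → implementation layering typically fits together, not the actual implementation:

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class AgentDriver(ABC):
    """Abstract agent driver with a stateless per-turn execution contract."""

    def __init__(self, config: dict[str, Any]) -> None:
        self._config = config

    @abstractmethod
    def execute_turn(
        self, query: str, conversation_id: Optional[str]
    ) -> tuple[Optional[str], Optional[str]]:
        """Run one turn; returns (response, conversation_id)."""

    @property
    def enabled(self) -> bool:
        return True

    def close(self) -> None:
        """Release resources; no-op by default."""


class HttpApiDriver(AgentDriver):
    """Stands in for the real driver, which wraps APIDataAmender."""

    def execute_turn(self, query, conversation_id):
        # Placeholder body: the real driver delegates to the HTTP amender.
        return f"echo: {query}", conversation_id


AGENT_DRIVERS: dict[str, type[AgentDriver]] = {"http_api": HttpApiDriver}


class AgentDriverRegistry:
    """Factory mapping agent type names to driver classes."""

    def __init__(self, drivers: Optional[dict[str, type[AgentDriver]]] = None):
        # Identity check so an injected empty registry is preserved.
        self._drivers = AGENT_DRIVERS if drivers is None else drivers

    def create_driver(self, agent_type: str, config: dict[str, Any]) -> AgentDriver:
        try:
            cls = self._drivers[agent_type]
        except KeyError as exc:
            raise ValueError(f"Unsupported agent type: {agent_type}") from exc
        return cls(config)
```

Adding a new agent type under this shape is one subclass plus one registry entry, which is the "groundwork for multiple agent types" the summary refers to.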

Architecture

```mermaid
graph TD
    subgraph Pipeline
        EP[EvaluationPipeline]
        EP -->|creates| ADR[AgentDriverRegistry]
        EP -->|resolves config| DAC[_resolve_default_agent_config]
        EP -->|resolves per-conv| RDC[_resolve_driver_for_conversation]
    end

    subgraph "Driver Layer"
        ADR -->|create_driver| AD[AgentDriver ABC]
        AD -->|implements| HAD[HttpApiDriver]
        AD -.->|future| FD[Other Drivers...]
        HAD -->|wraps| ADA[APIDataAmender]
    end

    subgraph Processor
        CP[ConversationProcessor]
        CP -->|execute_turn| AD
    end

    EP -->|default driver shared| CP
    RDC -->|per-conv driver| CP

    style AD fill:#f9f,stroke:#333
    style HAD fill:#bbf,stroke:#333
    style FD fill:#ddd,stroke:#999,stroke-dasharray: 5 5
```

Key design decisions

  • Stateless drivers: execute_turn() receives conversation_id as parameter and returns it — no internal state, safe for thread sharing
  • Pipeline owns driver lifecycle: registry, config resolution, and driver creation/cleanup all live in EvaluationPipeline (SRP — processor has zero config responsibility)
  • Driver is never None: disabled drivers skip internally, avoiding Optional checks in the processor
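
The statelessness decision can be illustrated with a toy driver (hypothetical names; only the pattern matches the PR): because conversation identity travels through execute_turn's arguments and return value rather than instance attributes, one instance can safely serve a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor


class StatelessDriver:
    """Toy driver: no per-conversation state lives on the instance."""

    def execute_turn(self, query: str, conversation_id: str) -> tuple[str, str]:
        # Conversation identity flows through arguments, not attributes,
        # so a single instance is safe to share across worker threads.
        return f"reply:{query}", conversation_id


driver = StatelessDriver()  # one shared instance

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(driver.execute_turn, f"q{i}", f"conv-{i}") for i in range(8)
    ]
    results = [f.result() for f in futures]

# Each turn keeps its own conversation id despite the shared driver.
assert [cid for _, cid in results] == [f"conv-{i}" for i in range(8)]
```

A driver that cached the current conversation_id on self would instead need a lock or one instance per thread.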

Changes

Production code

| File | Change |
| --- | --- |
| pipeline/evaluation/driver.py | New: AgentDriver(ABC), HttpApiDriver, AgentDriverRegistry |
| pipeline/evaluation/pipeline.py | Replace APIClient/APIDataAmender with driver architecture |
| pipeline/evaluation/processor.py | Accept AgentDriver per-call instead of stored APIDataAmender |
| pipeline/evaluation/__init__.py | Export driver classes |

Tests

| File | Change |
| --- | --- |
| tests/unit/pipeline/evaluation/test_driver.py | New: 13 tests for driver classes |
| tests/unit/pipeline/evaluation/test_pipeline.py | Updated for driver architecture |
| tests/unit/pipeline/evaluation/test_processor.py | Updated for driver-based processing |
| tests/unit/pipeline/evaluation/conftest.py | Updated fixtures |
| tests/integration/test_full_evaluation.py | Renamed TestFullApiEvaluation, extended parametrization to both config formats |
| tests/integration/system-config-agents-query.yaml | New: native agents: format config |
| tests/integration/system-config-agents-streaming.yaml | New: native agents: format config |

Test plan

  • make black-format — all files formatted
  • make pre-commit — bandit, pyright, docstyle, ruff, pylint, black-check pass
  • make test — 896 unit tests pass, 0 failures
  • pytest tests/integration/ --collect-only — 10 integration tests collected
  • Integration tests verified with real stack (query endpoint passes for both config formats)
  • Review after #228 (feat: add agents configuration layer with backward-compatible api: migration) is merged

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Pluggable agent driver system with HTTP API agent support (query & streaming) and per-conversation agent selection/overrides
    • MCP header–based authentication support for agents
  • Configuration

    • System config adds an agents section with defaults, multiple agents, and automatic migration from legacy API config
    • Example integration configs for agent query and streaming modes added
  • Tests

    • Extensive unit and integration tests for agent behavior and config migration


coderabbitai Bot commented May 7, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 44d208ca-8475-477d-b8c7-3006f7011844

📥 Commits

Reviewing files that changed from the base of the PR and between d7d84e8 and 4c91ba8.

📒 Files selected for processing (21)
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/__init__.py
  • src/lightspeed_evaluation/core/models/agents.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/system/loader.py
  • src/lightspeed_evaluation/pipeline/evaluation/__init__.py
  • src/lightspeed_evaluation/pipeline/evaluation/driver.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
  • tests/integration/system-config-agents-query.yaml
  • tests/integration/system-config-agents-streaming.yaml
  • tests/integration/test_full_evaluation.py
  • tests/unit/core/models/test_agents.py
  • tests/unit/core/models/test_data.py
  • tests/unit/core/models/test_system.py
  • tests/unit/core/system/test_loader.py
  • tests/unit/pipeline/evaluation/conftest.py
  • tests/unit/pipeline/evaluation/test_driver.py
  • tests/unit/pipeline/evaluation/test_pipeline.py
  • tests/unit/pipeline/evaluation/test_processor.py

Walkthrough

This PR introduces a pluggable agent driver architecture for the LightSpeed evaluation framework, replacing direct API client usage with a configurable agent driver pattern. New agent configuration models support HTTP API agents with optional MCP authentication, SystemConfig auto-migrates legacy API settings to agents, and the pipeline/processor now create, resolve, and manage drivers per conversation while delegating turn execution to drivers.

Changes

Agent Driver Architecture Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Agent Configuration Constants and Data Models | src/lightspeed_evaluation/core/constants.py, src/lightspeed_evaluation/core/models/agents.py | Adds DEFAULT_AGENT_TYPE and SUPPORTED_AGENT_TYPES; defines MCPServerConfig, MCPHeadersConfig, HttpApiBaseFields, HttpApiAgentConfig, AgentDefaultConfig, and AgentsConfig with parsing/resolution logic. |
| Public Module Exports | src/lightspeed_evaluation/core/models/__init__.py, src/lightspeed_evaluation/pipeline/evaluation/__init__.py | Exports agent configuration models and driver classes via package exports and lazy imports. |
| Evaluation Data Agent Fields | src/lightspeed_evaluation/core/models/data.py | EvaluationData adds optional agent and agent_config fields for per-conversation agent selection and overrides. |
| System Configuration and Migration | src/lightspeed_evaluation/core/models/system.py | APIConfig now extends HttpApiBaseFields; SystemConfig adds agents plus validators to migrate legacy api config to agents and validate default agent references. |
| Configuration Loader | src/lightspeed_evaluation/core/system/loader.py | ConfigLoader._create_system_config() forwards the optional agents section from YAML into SystemConfig. |
| Agent Driver Interface and Implementation | src/lightspeed_evaluation/pipeline/evaluation/driver.py | Adds the AgentDriver ABC, the HttpApiDriver implementation (validates HttpApiAgentConfig, manages API client/amender lifecycle, delegates execute_turn), AGENT_DRIVERS, and AgentDriverRegistry. |
| Evaluation Pipeline Integration | src/lightspeed_evaluation/pipeline/evaluation/pipeline.py | Pipeline creates AgentDriverRegistry and _default_driver, resolves per-conversation drivers (overrides), routes tasks via _process_conversation, tracks enabled-driver usage for amended-data saving, and closes drivers on cleanup. |
| Conversation Processor Integration | src/lightspeed_evaluation/pipeline/evaluation/processor.py | Processor accepts agent_driver; turn execution is routed through agent_driver.execute_turn(); setup/cleanup is gated by driver enablement via a skip flag; TurnProcessingContext stores the driver reference. |
| Integration Configs | tests/integration/system-config-agents-query.yaml, tests/integration/system-config-agents-streaming.yaml | Adds agents-format YAML configurations for HTTP API agents (query and streaming endpoints). |
| Unit Tests — Models & Migration | tests/unit/core/models/test_agents.py, tests/unit/core/models/test_system.py, tests/unit/core/system/test_loader.py, tests/unit/core/models/test_data.py | Adds/updates tests for agent model validation, AgentsConfig parsing/resolution, SystemConfig migration, loader behavior, and EvaluationData agent fields. |
| Unit Tests — Driver & Registry | tests/unit/pipeline/evaluation/test_driver.py | Adds tests for AgentDriverRegistry.create_driver, HttpApiDriver behavior, driver enabled/close semantics, and base-class defaults. |
| Unit Tests — Pipeline/Processor/Fixtures | tests/unit/pipeline/evaluation/* | Updates conftest with mock_agent_driver, refactors pipeline and processor tests to the driver registry/driver pattern, adds coverage for setup/cleanup skip behavior and amended-data saving/close/cache handling. |
| Integration Tests — Full Evaluation | tests/integration/test_full_evaluation.py | Updates tests to derive agent enablement/endpoint type from system_config.agents via get_agent_info() and adjusts parametrization. |

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: add agent driver abstraction layer for evaluation pipeline' clearly summarizes the primary change: introducing an agent driver abstraction layer into the evaluation pipeline, the main architectural improvement across the 15+ modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.10%, above the required 80.00% threshold. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |





@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (3)
tests/unit/pipeline/evaluation/test_driver.py (1)

171-207: 💤 Low value

StubDriver is defined identically in both base-class test methods — extractable as a shared class-level fixture.

Both test_enabled_default_is_true and test_close_is_noop define an identical StubDriver inner class. The duplication is minor but a class-level nested class (or a module-level class) would remove the repetition.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/pipeline/evaluation/test_driver.py` around lines 171 - 207, The
two tests test_enabled_default_is_true and test_close_is_noop both define an
identical inner StubDriver class; refactor by moving StubDriver out of the test
methods into a shared scope (e.g., as a class-level attribute on
TestAgentDriverBase or a module-level helper) so both tests instantiate the same
StubDriver without duplication; ensure the moved StubDriver implements
execute_turn and validate_config exactly as in the current inner definitions and
update the tests to construct driver = StubDriver({}) and call driver.enabled /
driver.close() accordingly.
tests/integration/test_full_evaluation.py (1)

43-54: 💤 Low value

Consider failing loudly when neither agents: nor a default agent is set.

The fallback return False, "query" silently treats a missing/empty agents config as "disabled query agent". Given that migrate_api_to_agents always populates agents when api: is present, hitting this branch in an integration config almost certainly indicates a misconfigured fixture YAML rather than legitimate behavior. Raising an explicit error (or asserting at fixture load time) would catch such mistakes earlier.

♻️ Possible tightening

```diff
 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
     """Resolve agent enabled status and endpoint_type from any config format.

     Works for both legacy api: and native agents: configs since
     migrate_api_to_agents() ensures agents is always populated when
     api: is present.
     """
     if system_config.agents and system_config.agents.default.agent:
         name = system_config.agents.default.agent
         agent_def = system_config.agents.agents[name]
         return agent_def.enabled, agent_def.endpoint_type
-    return False, "query"
+    raise AssertionError(
+        "Integration config has neither agents: nor api: — "
+        "expected migration to populate agents.default.agent"
+    )
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_full_evaluation.py` around lines 43 - 54, The helper
get_agent_info currently falls back to returning (False, "query") when no
agents/default agent is present, which hides misconfigured fixtures; change it
to raise an explicit error (e.g., raise RuntimeError or ValueError) when
system_config.agents is missing or system_config.agents.default.agent is empty,
including the problematic system_config (or its name/identifier) in the message
so tests fail loudly and point to the bad fixture; preserve the existing branch
that reads name = system_config.agents.default.agent and returns
agent_def.enabled, agent_def.endpoint_type when present.
src/lightspeed_evaluation/core/models/system.py (1)

805-829: 💤 Low value

Use exclude_unset=True in model_dump() and return a new dict instead of mutating the input.

Two optional improvements to migrate_api_to_agents:

  1. Forwarding unset fields (line 820): api_data.model_dump(exclude={"enabled"}) serializes every field including those with None defaults. This means the synthesized agent config will explicitly contain those None values, preventing the agent's own defaults from taking effect if they differ from APIConfig's defaults. Use exclude_unset=True to forward only user-provided values.

  2. In-place mutation of the input dict (line 825): data["agents"] = {...} mutates the caller's dict when they invoke SystemConfig.model_validate(my_dict). Return a shallow copy instead to avoid surprising side effects on shared dict references.

Suggested adjustment

```diff
     @model_validator(mode="before")
     @classmethod
     def migrate_api_to_agents(cls, data: Any) -> Any:
         """Auto-migrate api: to agents: when agents: is absent."""
         if not isinstance(data, dict):
             return data
         if data.get("agents") is not None or data.get("api") is None:
             return data

         api_data = data["api"]
         if isinstance(api_data, dict):
             api_enabled = api_data.get("enabled", True)
             agent_fields = {k: v for k, v in api_data.items() if k != "enabled"}
         elif isinstance(api_data, BaseModel):
             api_enabled = getattr(api_data, "enabled", True)
-            agent_fields = api_data.model_dump(exclude={"enabled"})
+            agent_fields = api_data.model_dump(
+                exclude={"enabled"}, exclude_unset=True
+            )
         else:
             return data

         agent_fields["type"] = "http_api"
-        data["agents"] = {
+        return {
+            **data,
             "default": {"agent": "http_api" if api_enabled else None},
             "http_api": agent_fields,
-        }
-        return data
+        }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/core/models/system.py` around lines 805 - 829,
migrate_api_to_agents currently mutates the input dict and uses
api_data.model_dump(exclude={"enabled"}) which includes unset fields; change it
to call model_dump(exclude_unset=True, exclude={"enabled"}) to only forward
user-provided values, and return a shallow copy of the incoming mapping instead
of mutating it: make a new_data = dict(data) (or copy.copy(data)), populate
new_data["agents"] with the synthesized structure (ensure agent_fields is a new
dict in both dict and BaseModel branches), and return new_data; references:
migrate_api_to_agents, api_data.model_dump, data["agents"], agent_fields.
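
The mutation half of this suggestion is easy to demonstrate with plain dicts (hypothetical helper names; the copy-returning variant mirrors the suggested `{**data, ...}` pattern):

```python
def migrate_in_place(data: dict) -> dict:
    # Mutates the caller's dict: the side effect leaks out of the validator.
    data["agents"] = {"default": {"agent": "http_api"}}
    return data


def migrate_copy(data: dict) -> dict:
    # Returns a shallow copy; the caller's dict is left untouched.
    return {**data, "agents": {"default": {"agent": "http_api"}}}


config = {"api": {"enabled": True}}

migrated = migrate_copy(config)
assert "agents" in migrated
assert "agents" not in config  # no surprise side effect on the input

migrate_in_place(config)
assert "agents" in config  # caller's dict was changed underneath them
```

The same surprise would hit any caller passing a shared dict into SystemConfig.model_validate, which is why the review prefers the copying form.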
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/core/models/data.py`:
- Around line 405-410: Update the Pydantic Field for the agent attribute in the
model defined in src/lightspeed_evaluation/core/models/data.py to use a fully
anchored regex so leading/trailing/embedded whitespace is rejected: replace the
current pattern r"\S" on the agent Field with r"^\S+$" (you can also remove the
now-redundant min_length=1) so lookups against agent names (e.g., in agents
config) won't silently fail due to surrounding spaces.
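
The difference between the loose and anchored patterns can be shown with the stdlib re module (a standalone sketch, not the Pydantic field itself):

```python
import re

loose = re.compile(r"\S")      # matches if any non-whitespace char exists
strict = re.compile(r"^\S+$")  # the whole string must be non-whitespace

# A padded agent name slips past the loose pattern...
assert loose.search(" my-agent ") is not None
# ...but the anchored pattern rejects padding, while clean names still pass.
assert strict.match(" my-agent ") is None
assert strict.match("my-agent") is not None
```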

In `@src/lightspeed_evaluation/pipeline/evaluation/driver.py`:
- Around line 97-102: The constructor currently treats an explicit empty dict
the same as None and also adds a file-level pylint disable; change
AgentDriverRegistry.__init__ to preserve injected empty registries by using an
identity None check (if drivers is None: self._drivers = AGENT_DRIVERS else:
self._drivers = drivers) and remove the "# pylint:
disable=too-few-public-methods" comment from the class declaration so linting
isn't suppressed; keep the class small as-is (or convert to a tiny factory
elsewhere if preferred), referencing AgentDriverRegistry.__init__ and
AGENT_DRIVERS to locate the change.
- Around line 50-66: The execute_turn method must no-op when the driver is
disabled: add a guard at the start of execute_turn that checks self._enabled
and, if False, returns (None, conversation_id) without calling
self._amender.amend_single_turn; also avoid constructing side-effecting members
when disabled by early-returning or skipping creation of self._api_client and
self._amender in __init__ (check _enabled after parsing config) so disabled
drivers do nothing but still present the same API.
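
A self-contained sketch of that guard (hypothetical class; same no-op shape as the suggestion): a disabled driver keeps the execute_turn signature but returns immediately and never constructs a client:

```python
from typing import Any, Optional


class HttpApiDriverSketch:
    """Hypothetical sketch: a disabled driver no-ops instead of calling out."""

    def __init__(self, config: dict[str, Any]) -> None:
        self._enabled = bool(config.get("enabled", True))
        # Only build side-effecting collaborators when actually enabled.
        self._client = object() if self._enabled else None

    def execute_turn(
        self, query: str, conversation_id: Optional[str]
    ) -> tuple[Optional[Any], Optional[str]]:
        if not self._enabled:
            # Same API shape, but no network call is ever attempted.
            return None, conversation_id
        return {"query": query}, conversation_id


off = HttpApiDriverSketch({"enabled": False})
assert off.execute_turn("hi", "c1") == (None, "c1")

on = HttpApiDriverSketch({"enabled": True})
assert on.execute_turn("hi", "c1") == ({"query": "hi"}, "c1")
```

This is what lets the processor skip Optional checks: the driver is never None, it just does nothing when disabled.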

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Around line 185-188: The save branch currently checks only
self._default_driver.enabled so per-conversation drivers that are enabled can be
missed; update the conversation driver resolution/processing code (where you
determine the effective driver for each conversation—e.g., the method that
resolves or applies drivers) to set a boolean flag like
self._used_enabled_driver whenever the effective driver for any processed
conversation is enabled, ensure this flag is initialized/reset at the start of
the evaluation run, and change the final save check to use
self._used_enabled_driver before calling
self._save_amended_data(evaluation_data) instead of checking
self._default_driver.enabled.

In `@tests/unit/pipeline/evaluation/test_driver.py`:
- Around line 99-112: The test_enabled_reflects_config test misses patching
APIClient so HttpApiDriver.__init__ creates a real client; patch
"lightspeed_evaluation.pipeline.evaluation.driver.APIClient" (e.g., with
mocker.patch) before instantiating HttpApiDriver in test_enabled_reflects_config
so the driver uses a mocked APIClient instance (similar to
test_close_with_api_client), ensuring isolation when mock_config.enabled = True.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb7bfdd5-39f1-4be6-b236-a466ec5155a4

📥 Commits

Reviewing files that changed from the base of the PR and between 811e1ff and 9a55579.

📒 Files selected for processing (21)
  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/__init__.py
  • src/lightspeed_evaluation/core/models/agents.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/core/system/loader.py
  • src/lightspeed_evaluation/pipeline/evaluation/__init__.py
  • src/lightspeed_evaluation/pipeline/evaluation/driver.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
  • tests/integration/system-config-agents-query.yaml
  • tests/integration/system-config-agents-streaming.yaml
  • tests/integration/test_full_evaluation.py
  • tests/unit/core/models/test_agents.py
  • tests/unit/core/models/test_data.py
  • tests/unit/core/models/test_system.py
  • tests/unit/core/system/test_loader.py
  • tests/unit/pipeline/evaluation/conftest.py
  • tests/unit/pipeline/evaluation/test_driver.py
  • tests/unit/pipeline/evaluation/test_pipeline.py
  • tests/unit/pipeline/evaluation/test_processor.py

@rioloc rioloc force-pushed the feat/agent-driver-layer branch 2 times, most recently from f8bd187 to d7d84e8 Compare May 8, 2026 09:41

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)

173-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _has_agent_amendments at the start of each run.

The flag is only initialized in __init__. If the same EvaluationPipeline instance is reused, one earlier run with an enabled driver leaves it True, and later runs will still try to persist amended data even when every effective driver is disabled.

Possible fix
```diff
     def run_evaluation(
         self,
         evaluation_data: list[EvaluationData],
         original_data_path: Optional[str] = None,
     ) -> list[EvaluationResult]:
@@
         """
         self.original_data_path = original_data_path
+        self._has_agent_amendments = False
         logger.info("Starting evaluation")
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py` around lines 173 -
190, Reset the instance-level flag self._has_agent_amendments at the start of
each run to avoid carrying over state between runs; inside the method that
initializes the run (the block that sets self.original_data_path, calls
self.storage_backend.initialize(RunInfo(...)) and then calls
self._process_eval_data), add a statement to set self._has_agent_amendments =
False before processing so amended-data persistence only occurs for changes made
during the current invocation.
🧹 Nitpick comments (1)
tests/integration/test_full_evaluation.py (1)

43-54: ⚡ Quick win

Reuse resolve_agent_config() here to keep the test aligned with production resolution.

This helper duplicates the default-agent lookup by reaching into system_config.agents.agents[name] directly. Calling the same resolver the pipeline uses will make these tests follow future config-resolution changes automatically instead of drifting.

Possible refactor
```diff
 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
@@
-    if system_config.agents and system_config.agents.default.agent:
-        name = system_config.agents.default.agent
-        agent_def = system_config.agents.agents[name]
-        return agent_def.enabled, agent_def.endpoint_type
+    if system_config.agents and system_config.agents.default.agent:
+        _name, agent_config = system_config.agents.resolve_agent_config()
+        return (
+            bool(agent_config.get("enabled", False)),
+            str(agent_config.get("endpoint_type", "query")),
+        )
     return False, "query"
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_full_evaluation.py` around lines 43 - 54, The helper
get_agent_info duplicates config resolution; replace its manual lookup with the
production resolver resolve_agent_config: if a default agent name exists
(system_config.agents.default.agent) call resolve_agent_config(system_config,
<default_name>) to obtain the resolved agent config and return its enabled and
endpoint_type, and if resolve_agent_config returns no config fall back to
returning (False, "query"); keep function signature
get_agent_info(system_config: SystemConfig) and only change the body to use
resolve_agent_config.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Line 45: The class-level pylint suppression on EvaluationPipeline must be
removed and the attribute fan-out reduced: create a small data holder (e.g.,
EvaluationContext or PipelineContext) to encapsulate related runtime state and
collaborators currently stored as separate instance attributes on
EvaluationPipeline, move those attributes into that context object, update
EvaluationPipeline.__init__ to accept and store one context instance (and adjust
any references inside methods to use self.context.<name>), and delete the "#
pylint: disable=too-many-instance-attributes" comment; ensure tests and usages
construct the new context when instantiating EvaluationPipeline.
- Around line 147-157: The code currently raises ConfigurationError when
self.system_config.agents is None even if a per-conversation override exists;
update the logic in the block that checks self.system_config.agents so that if
conv_data.agent_config (or conv_data.agent) is provided and self._registry can
build a driver from that override, you bypass resolve_agent_config and call
self._registry.create_driver(agent_config_override) directly, returning (driver,
True); only raise ConfigurationError when no agents are configured and there is
no per-conversation agent_config to construct a driver. Ensure you reference
system_config.agents, conv_data.agent_config, resolve_agent_config,
ConfigurationError, and _registry.create_driver in the change.

---

Outside diff comments:
In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Around line 173-190: Reset the instance-level flag self._has_agent_amendments
at the start of each run to avoid carrying over state between runs; inside the
method that initializes the run (the block that sets self.original_data_path,
calls self.storage_backend.initialize(RunInfo(...)) and then calls
self._process_eval_data), add a statement to set self._has_agent_amendments =
False before processing so amended-data persistence only occurs for changes made
during the current invocation.

---

Nitpick comments:
In `@tests/integration/test_full_evaluation.py`:
- Around line 43-54: The helper get_agent_info duplicates config resolution;
replace its manual lookup with the production resolver resolve_agent_config: if
a default agent name exists (system_config.agents.default.agent) call
resolve_agent_config(system_config, <default_name>) to obtain the resolved agent
config and return its enabled and endpoint_type, and if resolve_agent_config
returns no config fall back to returning (False, "query"); keep function
signature get_agent_info(system_config: SystemConfig) and only change the body
to use resolve_agent_config.
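A sketch of the rewritten helper under the names used in the comment; the stand-in dataclasses and the `resolve_agent_config` stub below are illustrative, not the project's actual Pydantic models:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AgentConfig:
    """Stand-in for a resolved agent config (fields named in the comment)."""
    enabled: bool
    endpoint_type: str


@dataclass
class DefaultBlock:
    agent: Optional[str]


@dataclass
class AgentsConfig:
    default: DefaultBlock
    registry: Dict[str, AgentConfig]  # name -> config


@dataclass
class SystemConfig:
    agents: Optional[AgentsConfig]


def resolve_agent_config(system_config: SystemConfig, name: str) -> Optional[AgentConfig]:
    """Stand-in for the production resolver."""
    if system_config.agents is None:
        return None
    return system_config.agents.registry.get(name)


def get_agent_info(system_config: SystemConfig) -> Tuple[bool, str]:
    """Resolve agent state via the production resolver, not a manual lookup."""
    agents = system_config.agents
    default_name = agents.default.agent if agents else None
    if default_name:
        config = resolve_agent_config(system_config, default_name)
        if config is not None:
            return config.enabled, config.endpoint_type
    return False, "query"  # fallback when nothing resolves


cfg = SystemConfig(
    agents=AgentsConfig(
        default=DefaultBlock(agent="http_api"),
        registry={"http_api": AgentConfig(enabled=True, endpoint_type="streaming")},
    )
)
print(get_agent_info(cfg))                      # (True, 'streaming')
print(get_agent_info(SystemConfig(agents=None)))  # (False, 'query')
```

Delegating to the resolver keeps the test helper in lockstep with production precedence rules instead of duplicating them.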

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a840a7b3-29ba-43df-8487-0867d78897c0

📥 Commits

Reviewing files that changed from the base of the PR and between f8bd187 and d7d84e8.

📒 Files selected for processing (6)
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/pipeline/evaluation/driver.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • tests/integration/test_full_evaluation.py
  • tests/unit/pipeline/evaluation/test_driver.py
  • tests/unit/pipeline/evaluation/test_pipeline.py
✅ Files skipped from review due to trivial changes (1)
  • tests/unit/pipeline/evaluation/test_driver.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/pipeline/evaluation/driver.py
  • tests/unit/pipeline/evaluation/test_pipeline.py



-class EvaluationPipeline:
+class EvaluationPipeline:  # pylint: disable=too-many-instance-attributes

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Remove the new pylint suppression from EvaluationPipeline.

This adds a # pylint: disable=too-many-instance-attributes comment in src/, which the repo explicitly disallows. Please fix the attribute fan-out instead of suppressing the warning, e.g. by grouping runtime state/collaborators into a small context object.

As per coding guidelines "Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py` at line 45, The
class-level pylint suppression on EvaluationPipeline must be removed and the
attribute fan-out reduced: create a small data holder (e.g., EvaluationContext
or PipelineContext) to encapsulate related runtime state and collaborators
currently stored as separate instance attributes on EvaluationPipeline, move
those attributes into that context object, update EvaluationPipeline.__init__ to
accept and store one context instance (and adjust any references inside methods
to use self.context.<name>), and delete the "# pylint:
disable=too-many-instance-attributes" comment; ensure tests and usages construct
the new context when instantiating EvaluationPipeline.
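A minimal sketch of such a grouping, using the `PipelineContext` name suggested above (the field set is hypothetical, drawn from the attributes mentioned elsewhere in this review):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PipelineContext:
    """Hypothetical holder grouping EvaluationPipeline's runtime state and
    collaborators, so the pipeline class keeps few direct attributes."""
    storage_backend: Any = None
    registry: Any = None           # AgentDriverRegistry
    default_driver: Any = None     # AgentDriver shared across threads
    original_data_path: Optional[str] = None
    has_agent_amendments: bool = False


class EvaluationPipeline:
    """With one context attribute, no pylint suppression is needed."""

    def __init__(self, context: PipelineContext) -> None:
        self.context = context

    def mark_amended(self) -> None:
        # Methods reach state through self.context.<name>.
        self.context.has_agent_amendments = True


pipeline = EvaluationPipeline(PipelineContext())
pipeline.mark_amended()
print(pipeline.context.has_agent_amendments)  # prints True
```

Tests then construct a `PipelineContext` explicitly, which also makes it easier to inject fakes for the storage backend and driver registry.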

Comment on lines +147 to +157
        if self.system_config.agents is None:
            raise ConfigurationError(
                f"Conversation '{conv_data.conversation_group_id}' specifies "
                f"agent overrides but no agents configuration is defined"
            )

        _name, agent_config = self.system_config.agents.resolve_agent_config(
            agent_name=conv_data.agent,
            agent_config_override=conv_data.agent_config,
        )
        return self._registry.create_driver(agent_config), True

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Let inline agent_config overrides work without a top-level agents: section.

Right now any conversation with EvaluationData.agent_config fails with ConfigurationError if system_config.agents is missing, even though the registry can build a driver directly from that override. That breaks the highest-precedence per-conversation override path.


Possible fix
-        if self.system_config.agents is None:
+        if self.system_config.agents is None:
+            if conv_data.agent_config is not None:
+                return self._registry.create_driver(conv_data.agent_config), True
             raise ConfigurationError(
                 f"Conversation '{conv_data.conversation_group_id}' specifies "
                 f"agent overrides but no agents configuration is defined"
             )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py` around lines 147 -
157, The code currently raises ConfigurationError when self.system_config.agents
is None even if a per-conversation override exists; update the logic in the
block that checks self.system_config.agents so that if conv_data.agent_config
(or conv_data.agent) is provided and self._registry can build a driver from that
override, you bypass resolve_agent_config and call
self._registry.create_driver(agent_config_override) directly, returning (driver,
True); only raise ConfigurationError when no agents are configured and there is
no per-conversation agent_config to construct a driver. Ensure you reference
system_config.agents, conv_data.agent_config, resolve_agent_config,
ConfigurationError, and _registry.create_driver in the change.

rioloc and others added 5 commits May 8, 2026 11:51
…gration

Introduce a generic `agents:` configuration block in system.yaml that
makes the HTTP API one agent type among potentially many (e.g., Kubernetes
CRD agents for Openshift Agentic Lightspeed).

The key design ensures zero breakage of existing configs and code:

- New Pydantic models: AgentsConfig, AgentDefaultConfig, HttpApiAgentConfig
  in a new `core/models/agents.py` module
- SystemConfig gains an `agents: Optional[AgentsConfig]` field alongside
  the existing `api:` field
- A model_validator auto-migrates `api:` to `agents:` when `agents:` is
  absent, so all existing system.yaml files continue working unchanged
- `api.enabled=True` maps to `default.agent="http_api"`;
  `api.enabled=False` maps to `default.agent=None`
- The `api` field is preserved — all downstream code reading `config.api`
  continues to work without modification
- EvaluationData gains `agent` and `agent_config` optional fields for
  per-conversation-group agent selection and config overrides
- Config resolution follows a 3-level priority chain:
  eval_data.agent_config > agents.<name> > agents.default
- ConfigLoader passes `agents` YAML data through to SystemConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
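The 3-level priority chain described in this commit can be sketched as follows; the dict shapes are illustrative stand-ins, not the actual Pydantic models:

```python
from typing import Dict, Optional


def resolve_agent_config(
    override: Optional[dict],
    agents: Dict[str, dict],
    agent_name: Optional[str],
) -> Optional[dict]:
    """Resolve with the precedence described above:
    eval_data.agent_config > agents.<name> > agents.default."""
    if override is not None:                 # 1. per-conversation override
        return override
    if agent_name and agent_name in agents:  # 2. named agent
        return agents[agent_name]
    return agents.get("default")             # 3. system-level default


agents = {
    "default": {"type": "http_api", "endpoint_type": "query"},
    "k8s_crd": {"type": "kubernetes_crd"},
}
print(resolve_agent_config(None, agents, "k8s_crd"))  # named agent wins over default
print(resolve_agent_config({"type": "http_api"}, agents, "k8s_crd"))  # override wins
```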
Replace direct APIDataAmender usage with abstract AgentDriver interface,
enabling per-conversation agent resolution and support for multiple agent
types. Pipeline owns driver lifecycle; processor receives driver per call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace APIDataAmender/APIClient mocks with AgentDriver mocks in
processor and pipeline tests. Add unit tests for AgentDriverRegistry,
HttpApiDriver, and AgentDriver base class. All 113 evaluation tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add native agents: config format YAML files alongside legacy api: configs.
Rename TestFullEvaluation to TestFullApiEvaluation and extend parametrization
to cover both config formats (backward compat via migrate + native agents).
Add get_agent_info() helper to resolve agent state from either format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>