feat: add agent driver abstraction layer for evaluation pipeline #231
rioloc wants to merge 5 commits into lightspeed-core:main
Conversation
Walkthrough

This PR introduces a pluggable agent driver architecture for the LightSpeed evaluation framework, replacing direct API client usage with a configurable agent driver pattern. New agent configuration models support HTTP API agents with optional MCP authentication, SystemConfig auto-migrates legacy API settings to agents, and the pipeline/processor now create, resolve, and manage drivers per conversation while delegating turn execution to drivers.

Changes

Agent Driver Architecture Implementation
Estimated review effort: 🎯 4 (Complex), ⏱️ ~45 minutes. 🚥 Pre-merge checks: ✅ 5 passed.
Actionable comments posted: 5
🧹 Nitpick comments (3)
tests/unit/pipeline/evaluation/test_driver.py (1)

171-207: 💤 Low value

StubDriver is defined identically in both base-class test methods and is extractable as a shared class-level fixture. Both test_enabled_default_is_true and test_close_is_noop define an identical StubDriver inner class. The duplication is minor, but a class-level nested class (or a module-level class) would remove the repetition.

tests/integration/test_full_evaluation.py (1)
43-54: 💤 Low value

Consider failing loudly when neither agents: nor a default agent is set. The fallback return False, "query" silently treats a missing/empty agents config as "disabled query agent". Given that migrate_api_to_agents always populates agents when api: is present, hitting this branch in an integration config almost certainly indicates a misconfigured fixture YAML rather than legitimate behavior. Raising an explicit error (or asserting at fixture load time) would catch such mistakes earlier.

♻️ Possible tightening

```diff
 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
     """Resolve agent enabled status and endpoint_type from any config format.

     Works for both legacy api: and native agents: configs since
     migrate_api_to_agents() ensures agents is always populated when api: is present.
     """
     if system_config.agents and system_config.agents.default.agent:
         name = system_config.agents.default.agent
         agent_def = system_config.agents.agents[name]
         return agent_def.enabled, agent_def.endpoint_type
-    return False, "query"
+    raise AssertionError(
+        "Integration config has neither agents: nor api: — "
+        "expected migration to populate agents.default.agent"
+    )
```

src/lightspeed_evaluation/core/models/system.py (1)
805-829: 💤 Low value

Use exclude_unset=True in model_dump() and return a new dict instead of mutating the input. Two optional improvements to migrate_api_to_agents:

1. Forwarding unset fields (line 820): api_data.model_dump(exclude={"enabled"}) serializes every field, including those with None defaults. The synthesized agent config will then explicitly contain those None values, preventing the agent's own defaults from taking effect if they differ from APIConfig's defaults. Use exclude_unset=True to forward only user-provided values.
2. In-place mutation of the input dict (line 825): data["agents"] = {...} mutates the caller's dict when they invoke SystemConfig.model_validate(my_dict). Return a shallow copy instead to avoid surprising side effects on shared dict references.

Suggested adjustment

```diff
 @model_validator(mode="before")
 @classmethod
 def migrate_api_to_agents(cls, data: Any) -> Any:
     """Auto-migrate api: to agents: when agents: is absent."""
     if not isinstance(data, dict):
         return data
     if data.get("agents") is not None or data.get("api") is None:
         return data
     api_data = data["api"]
     if isinstance(api_data, dict):
         api_enabled = api_data.get("enabled", True)
         agent_fields = {k: v for k, v in api_data.items() if k != "enabled"}
     elif isinstance(api_data, BaseModel):
         api_enabled = getattr(api_data, "enabled", True)
-        agent_fields = api_data.model_dump(exclude={"enabled"})
+        agent_fields = api_data.model_dump(
+            exclude={"enabled"}, exclude_unset=True
+        )
     else:
         return data
     agent_fields["type"] = "http_api"
-    data["agents"] = {
+    return {
+        **data,
         "default": {"agent": "http_api" if api_enabled else None},
         "http_api": agent_fields,
-    }
-    return data
+    }
```
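The copy-and-return pattern suggested above can be sketched with plain dicts. This is a minimal, hypothetical stand-in for the real Pydantic validator: field names mirror the PR's config shape, but the function itself is illustrative only.

```python
# Hypothetical sketch of the non-mutating migration the review suggests,
# using plain dicts instead of the real Pydantic models.
from typing import Any


def migrate_api_to_agents(data: Any) -> Any:
    """Return a new mapping with `agents:` synthesized from `api:`."""
    if not isinstance(data, dict):
        return data
    if data.get("agents") is not None or data.get("api") is None:
        return data

    api_data = data["api"]
    api_enabled = api_data.get("enabled", True)
    # Forward only the user-provided fields, never the `enabled` flag.
    agent_fields = {k: v for k, v in api_data.items() if k != "enabled"}
    agent_fields["type"] = "http_api"

    # Shallow copy: the caller's dict is left untouched.
    return {
        **data,
        "agents": {
            "default": {"agent": "http_api" if api_enabled else None},
            "http_api": agent_fields,
        },
    }


original = {"api": {"enabled": True, "api_base": "http://localhost:8080"}}
migrated = migrate_api_to_agents(original)
assert "agents" not in original  # the input dict is not mutated
```

The same shape applies inside a `mode="before"` validator, where `data` may be a shared dict the caller still holds a reference to.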
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lightspeed_evaluation/core/models/data.py`:
- Around line 405-410: Update the Pydantic Field for the agent attribute in the
model defined in src/lightspeed_evaluation/core/models/data.py to use a fully
anchored regex so leading/trailing/embedded whitespace is rejected: replace the
current pattern r"\S" on the agent Field with r"^\S+$" (you can also remove the
now-redundant min_length=1) so lookups against agent names (e.g., in agents
config) won't silently fail due to surrounding spaces.
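Why the anchoring matters can be shown with plain `re` (the Pydantic Field itself is not reproduced here): an unanchored `r"\S"` is satisfied by any string that contains a single non-space character, so names with surrounding whitespace slip through, while the fully anchored `r"^\S+$"` rejects them.

```python
import re

loose = re.compile(r"\S")      # current pattern: one non-space char anywhere
strict = re.compile(r"^\S+$")  # suggested pattern: whole string, no whitespace

assert loose.search(" http_api ") is not None   # accepted by the loose pattern
assert strict.search(" http_api ") is None      # rejected once anchored
assert strict.search("http_api") is not None    # clean agent names still pass
```

With the anchored pattern, a padded name fails validation up front instead of failing a dict lookup against the agents config later.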
In `@src/lightspeed_evaluation/pipeline/evaluation/driver.py`:
- Around line 97-102: The constructor currently treats an explicit empty dict
the same as None and also adds a file-level pylint disable; change
AgentDriverRegistry.__init__ to preserve injected empty registries by using an
identity None check (if drivers is None: self._drivers = AGENT_DRIVERS else:
self._drivers = drivers) and remove the "# pylint:
disable=too-few-public-methods" comment from the class declaration so linting
isn't suppressed; keep the class small as-is (or convert to a tiny factory
elsewhere if preferred), referencing AgentDriverRegistry.__init__ and
AGENT_DRIVERS to locate the change.
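The identity check described above can be sketched as follows. `AGENT_DRIVERS` and the registry's API are simplified stand-ins, not the real implementation:

```python
# Sketch of the suggested `is None` check: an injected empty registry is
# preserved instead of being silently replaced by the module defaults.
from typing import Callable, Optional

AGENT_DRIVERS: dict[str, Callable[..., object]] = {"http_api": object}


class AgentDriverRegistry:
    """Map agent `type` names to driver factories."""

    def __init__(self, drivers: Optional[dict[str, Callable[..., object]]] = None):
        # `if drivers is None`, not `if not drivers`: an explicit empty dict
        # is a deliberate injection and must survive.
        self._drivers = AGENT_DRIVERS if drivers is None else drivers


assert AgentDriverRegistry()._drivers is AGENT_DRIVERS
assert AgentDriverRegistry({})._drivers == {}  # empty registry kept
```

Truthiness checks (`drivers or AGENT_DRIVERS`) collapse the empty-dict case into the default case, which is exactly the bug the review flags.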
- Around line 50-66: The execute_turn method must no-op when the driver is
disabled: add a guard at the start of execute_turn that checks self._enabled
and, if False, returns (None, conversation_id) without calling
self._amender.amend_single_turn; also avoid constructing side-effecting members
when disabled by early-returning or skipping creation of self._api_client and
self._amender in __init__ (check _enabled after parsing config) so disabled
drivers do nothing but still present the same API.
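The early-return guard can be sketched like this. The class, config shape, and return contract follow the review text, but this is a stand-in, not the real HttpApiDriver:

```python
# Sketch of a disabled-driver guard: disabled drivers keep the same call
# signature but perform no work and build no side-effecting collaborators.
from typing import Any, Optional


class SketchDriver:
    def __init__(self, config: dict[str, Any]):
        self._enabled = bool(config.get("enabled", True))
        # Skip constructing side-effecting members when disabled.
        self._amender = object() if self._enabled else None

    def execute_turn(
        self, turn: str, conversation_id: Optional[str]
    ) -> tuple[Optional[str], Optional[str]]:
        if not self._enabled:
            # No-op: return no response but preserve the conversation id.
            return None, conversation_id
        return f"amended:{turn}", conversation_id


off = SketchDriver({"enabled": False})
assert off.execute_turn("q1", "conv-1") == (None, "conv-1")
```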
In `@src/lightspeed_evaluation/pipeline/evaluation/pipeline.py`:
- Around line 185-188: The save branch currently checks only
self._default_driver.enabled so per-conversation drivers that are enabled can be
missed; update the conversation driver resolution/processing code (where you
determine the effective driver for each conversation—e.g., the method that
resolves or applies drivers) to set a boolean flag like
self._used_enabled_driver whenever the effective driver for any processed
conversation is enabled, ensure this flag is initialized/reset at the start of
the evaluation run, and change the final save check to use
self._used_enabled_driver before calling
self._save_amended_data(evaluation_data) instead of checking
self._default_driver.enabled.
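The flag handling described above (reset per run, then OR-accumulate across the effective per-conversation drivers, then gate the save on it) can be sketched with a stand-in pipeline; the attribute names follow the review's suggestion:

```python
# Sketch: `_used_enabled_driver` is reset at the start of each run and set
# whenever any conversation's effective driver is enabled; saving amended
# data is gated on it instead of on the default driver alone.


class SketchPipeline:
    def __init__(self, drivers_by_conv: dict[str, bool]):
        self._drivers_by_conv = drivers_by_conv  # conv id -> driver.enabled
        self._used_enabled_driver = False
        self.saved = 0

    def run(self) -> None:
        self._used_enabled_driver = False  # reset so instance reuse can't leak
        for _conv, enabled in self._drivers_by_conv.items():
            self._used_enabled_driver |= enabled
        if self._used_enabled_driver:
            self.saved += 1  # stands in for _save_amended_data(...)


p = SketchPipeline({"a": False, "b": True})
p.run()
assert p.saved == 1
p._drivers_by_conv = {"a": False}
p.run()                # fresh run with only disabled drivers
assert p.saved == 1    # no second save: the flag was reset
```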
In `@tests/unit/pipeline/evaluation/test_driver.py`:
- Around line 99-112: The test_enabled_reflects_config test misses patching
APIClient so HttpApiDriver.__init__ creates a real client; patch
"lightspeed_evaluation.pipeline.evaluation.driver.APIClient" (e.g., with
mocker.patch) before instantiating HttpApiDriver in test_enabled_reflects_config
so the driver uses a mocked APIClient instance (similar to
test_close_with_api_client), ensuring isolation when mock_config.enabled = True.
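The patching pattern can be sketched with `unittest.mock`. `FakeDriverModule` is a hypothetical stand-in; the real test should target the dotted path "lightspeed_evaluation.pipeline.evaluation.driver.APIClient":

```python
# Sketch: patch the APIClient name where the driver looks it up, so
# constructing a driver never builds a real client during tests.
from unittest import mock


class FakeDriverModule:
    """Stand-in for the driver module whose APIClient gets patched."""

    class APIClient:
        def __init__(self) -> None:
            raise RuntimeError("real client must not be constructed in tests")


with mock.patch.object(FakeDriverModule, "APIClient") as client_cls:
    # Inside the patch, "constructing" the client yields a MagicMock, so a
    # driver built here would never open real connections.
    client = FakeDriverModule.APIClient()
    assert isinstance(client, mock.MagicMock)

# Outside the patch, the original class is restored.
restored = False
try:
    FakeDriverModule.APIClient()
except RuntimeError:
    restored = True
assert restored
```

With pytest-mock, `mocker.patch("...driver.APIClient")` achieves the same with automatic teardown.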
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb7bfdd5-39f1-4be6-b236-a466ec5155a4
📒 Files selected for processing (21)

- src/lightspeed_evaluation/core/constants.py
- src/lightspeed_evaluation/core/models/__init__.py
- src/lightspeed_evaluation/core/models/agents.py
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/core/models/system.py
- src/lightspeed_evaluation/core/system/loader.py
- src/lightspeed_evaluation/pipeline/evaluation/__init__.py
- src/lightspeed_evaluation/pipeline/evaluation/driver.py
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
- tests/integration/system-config-agents-query.yaml
- tests/integration/system-config-agents-streaming.yaml
- tests/integration/test_full_evaluation.py
- tests/unit/core/models/test_agents.py
- tests/unit/core/models/test_data.py
- tests/unit/core/models/test_system.py
- tests/unit/core/system/test_loader.py
- tests/unit/pipeline/evaluation/conftest.py
- tests/unit/pipeline/evaluation/test_driver.py
- tests/unit/pipeline/evaluation/test_pipeline.py
- tests/unit/pipeline/evaluation/test_processor.py
Force-pushed from f8bd187 to d7d84e8 (compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
173-190: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reset _has_agent_amendments at the start of each run. The flag is only initialized in __init__. If the same EvaluationPipeline instance is reused, one earlier run with an enabled driver leaves it True, and later runs will still try to persist amended data even when every effective driver is disabled.

Possible fix

```diff
 def run_evaluation(
     self,
     evaluation_data: list[EvaluationData],
     original_data_path: Optional[str] = None,
 ) -> list[EvaluationResult]:
@@
     """
     self.original_data_path = original_data_path
+    self._has_agent_amendments = False
     logger.info("Starting evaluation")
```
🧹 Nitpick comments (1)
tests/integration/test_full_evaluation.py (1)
43-54: ⚡ Quick win

Reuse resolve_agent_config() here to keep the test aligned with production resolution. This helper duplicates the default-agent lookup by reaching into system_config.agents.agents[name] directly. Calling the same resolver the pipeline uses will make these tests follow future config-resolution changes automatically instead of drifting.

Possible refactor

```diff
 def get_agent_info(system_config: SystemConfig) -> tuple[bool, str]:
 @@
-    if system_config.agents and system_config.agents.default.agent:
-        name = system_config.agents.default.agent
-        agent_def = system_config.agents.agents[name]
-        return agent_def.enabled, agent_def.endpoint_type
+    if system_config.agents and system_config.agents.default.agent:
+        _name, agent_config = system_config.agents.resolve_agent_config()
+        return (
+            bool(agent_config.get("enabled", False)),
+            str(agent_config.get("endpoint_type", "query")),
+        )
     return False, "query"
```
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a840a7b3-29ba-43df-8487-0867d78897c0
📒 Files selected for processing (6)

- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/pipeline/evaluation/driver.py
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
- tests/integration/test_full_evaluation.py
- tests/unit/pipeline/evaluation/test_driver.py
- tests/unit/pipeline/evaluation/test_pipeline.py
✅ Files skipped from review due to trivial changes (1)
- tests/unit/pipeline/evaluation/test_driver.py
🚧 Files skipped from review as they are similar to previous changes (3)
- src/lightspeed_evaluation/core/models/data.py
- src/lightspeed_evaluation/pipeline/evaluation/driver.py
- tests/unit/pipeline/evaluation/test_pipeline.py
```diff
-class EvaluationPipeline:
+class EvaluationPipeline:  # pylint: disable=too-many-instance-attributes
```
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Remove the new pylint suppression from EvaluationPipeline.
This adds a # pylint: disable=too-many-instance-attributes comment in src/, which the repo explicitly disallows. Please fix the attribute fan-out instead of suppressing the warning, e.g. by grouping runtime state/collaborators into a small context object.
As per coding guidelines "Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead".
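One way to reduce the attribute fan-out without suppressing the warning is a small context object. The field names below are illustrative, borrowed from attributes mentioned elsewhere in this review; the real grouping would follow the pipeline's actual collaborators:

```python
# Sketch: group related runtime state into a dataclass so the pipeline
# carries one context attribute instead of many loose instance attributes.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class PipelineContext:
    """Runtime state and collaborators for one EvaluationPipeline run."""

    registry: Any = None
    default_driver: Any = None
    original_data_path: Optional[str] = None
    has_agent_amendments: bool = False
    per_conv_drivers: dict[str, Any] = field(default_factory=dict)


class EvaluationPipelineSketch:
    def __init__(self, context: PipelineContext):
        # A single attribute replaces the previous fan-out of fields,
        # satisfying too-many-instance-attributes without a disable comment.
        self.context = context


pipe = EvaluationPipelineSketch(PipelineContext())
assert pipe.context.has_agent_amendments is False
```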
```python
if self.system_config.agents is None:
    raise ConfigurationError(
        f"Conversation '{conv_data.conversation_group_id}' specifies "
        f"agent overrides but no agents configuration is defined"
    )

_name, agent_config = self.system_config.agents.resolve_agent_config(
    agent_name=conv_data.agent,
    agent_config_override=conv_data.agent_config,
)
return self._registry.create_driver(agent_config), True
```
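The 3-level priority chain the PR describes (eval_data.agent_config > agents.&lt;name&gt; > agents.default) can be sketched with plain dicts. The merge semantics here are assumed for illustration, not lifted from the real resolve_agent_config:

```python
# Sketch of per-conversation config resolution: an explicit override has
# highest precedence, then the named agent, then the default agent.
from typing import Any, Optional


def resolve_agent_config(
    agents: dict[str, dict[str, Any]],
    default_name: str,
    agent_name: Optional[str] = None,
    agent_config_override: Optional[dict[str, Any]] = None,
) -> tuple[str, dict[str, Any]]:
    name = agent_name or default_name          # named agent beats default
    resolved = dict(agents.get(name, {}))      # copy, never mutate the config
    if agent_config_override:
        resolved.update(agent_config_override)  # override wins field-by-field
    return name, resolved


agents = {"http_api": {"type": "http_api", "endpoint_type": "query"}}
name, cfg = resolve_agent_config(
    agents, "http_api", agent_config_override={"endpoint_type": "streaming"}
)
assert name == "http_api"
assert cfg["endpoint_type"] == "streaming"  # override beat the named config
```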
There was a problem hiding this comment.
Let inline agent_config overrides work without a top-level agents: section.
Right now any conversation with EvaluationData.agent_config fails with ConfigurationError if system_config.agents is missing, even though the registry can build a driver directly from that override. That breaks the highest-precedence per-conversation override path.
Possible fix

```diff
 if self.system_config.agents is None:
+    if conv_data.agent_config is not None:
+        return self._registry.create_driver(conv_data.agent_config), True
     raise ConfigurationError(
         f"Conversation '{conv_data.conversation_group_id}' specifies "
         f"agent overrides but no agents configuration is defined"
     )
```
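The suggested fallback can be sketched in isolation. The error type and the `create_driver` callable are stand-ins for the real registry API:

```python
# Sketch: when no top-level agents config exists, a per-conversation
# override can still produce a driver; only the no-config-no-override
# case should raise.
from typing import Any, Optional


class ConfigurationError(Exception):
    pass


def resolve_driver(
    agents_config: Optional[dict[str, Any]],
    conv_override: Optional[dict[str, Any]],
    create_driver,
):
    if agents_config is None:
        if conv_override is not None:
            # Highest-precedence path: the override needs no agents: section.
            return create_driver(conv_override)
        raise ConfigurationError("agent overrides set but no agents configured")
    return create_driver(agents_config)


driver = resolve_driver(None, {"type": "http_api"}, create_driver=dict)
assert driver == {"type": "http_api"}
```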
…gration

Introduce a generic `agents:` configuration block in system.yaml that makes the HTTP API one agent type among potentially many (e.g., Kubernetes CRD agents for Openshift Agentic Lightspeed). The key design ensures zero breakage of existing configs and code:

- New Pydantic models: AgentsConfig, AgentDefaultConfig, HttpApiAgentConfig in a new `core/models/agents.py` module
- SystemConfig gains an `agents: Optional[AgentsConfig]` field alongside the existing `api:` field
- A model_validator auto-migrates `api:` to `agents:` when `agents:` is absent, so all existing system.yaml files continue working unchanged
- `api.enabled=True` maps to `default.agent="http_api"`; `api.enabled=False` maps to `default.agent=None`
- The `api` field is preserved — all downstream code reading `config.api` continues to work without modification
- EvaluationData gains `agent` and `agent_config` optional fields for per-conversation-group agent selection and config overrides
- Config resolution follows a 3-level priority chain: eval_data.agent_config > agents.<name> > agents.default
- ConfigLoader passes `agents` YAML data through to SystemConfig

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace direct APIDataAmender usage with abstract AgentDriver interface, enabling per-conversation agent resolution and support for multiple agent types. Pipeline owns driver lifecycle; processor receives driver per call. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace APIDataAmender/APIClient mocks with AgentDriver mocks in processor and pipeline tests. Add unit tests for AgentDriverRegistry, HttpApiDriver, and AgentDriver base class. All 113 evaluation tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add native agents: config format YAML files alongside legacy api: configs. Rename TestFullEvaluation to TestFullApiEvaluation and extend parametrization to cover both config formats (backward compat via migrate + native agents). Add get_agent_info() helper to resolve agent state from either format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from d7d84e8 to 4c91ba8 (compare)
Summary
Introduces an agent driver abstraction layer (AgentDriverRegistry → AgentDriver(ABC) → HttpApiDriver) that replaces direct APIDataAmender usage in the evaluation pipeline. This enables per-conversation agent resolution and lays the groundwork for supporting multiple agent types beyond HTTP API.

- AgentDriver(ABC) abstract base with a stateless execute_turn() pattern, plus an HttpApiDriver implementation wrapping the existing APIDataAmender
- AgentDriverRegistry factory for driver creation by type
- New EvaluationData.agent / agent_config fields
- Native agents: format configs alongside legacy api: configs, verifying both backward compatibility and the new config path

Architecture
```mermaid
graph TD
    subgraph Pipeline
        EP[EvaluationPipeline]
        EP -->|creates| ADR[AgentDriverRegistry]
        EP -->|resolves config| DAC[_resolve_default_agent_config]
        EP -->|resolves per-conv| RDC[_resolve_driver_for_conversation]
    end
    subgraph "Driver Layer"
        ADR -->|create_driver| AD[AgentDriver ABC]
        AD -->|implements| HAD[HttpApiDriver]
        AD -.->|future| FD[Other Drivers...]
        HAD -->|wraps| ADA[APIDataAmender]
    end
    subgraph Processor
        CP[ConversationProcessor]
        CP -->|execute_turn| AD
    end
    EP -->|default driver shared| CP
    RDC -->|per-conv driver| CP
    style AD fill:#f9f,stroke:#333
    style HAD fill:#bbf,stroke:#333
    style FD fill:#ddd,stroke:#999,stroke-dasharray: 5 5
```

Key design decisions
- execute_turn() receives conversation_id as a parameter and returns it — no internal state, safe for thread sharing
- Config resolution lives in EvaluationPipeline (SRP — the processor has zero config responsibility)
- The processor always receives a driver, with no Optional checks in the processor

Changes
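The stateless execute_turn() contract can be sketched as follows; the method signatures are assumptions for illustration, not the exact production interface:

```python
# Sketch of a stateless driver: the conversation id flows through as a
# parameter and a return value, so one driver instance can serve many
# conversations concurrently without storing per-conversation state.
from abc import ABC, abstractmethod
from typing import Optional


class AgentDriver(ABC):
    @abstractmethod
    def execute_turn(
        self, query: str, conversation_id: Optional[str]
    ) -> tuple[str, Optional[str]]:
        """Run one turn; return (response, conversation_id)."""


class EchoDriver(AgentDriver):
    def execute_turn(self, query, conversation_id):
        # No state is stored between calls, so thread sharing is safe.
        return f"echo:{query}", conversation_id


d = EchoDriver()
assert d.execute_turn("hi", "c1") == ("echo:hi", "c1")
assert d.execute_turn("hi", "c2") == ("echo:hi", "c2")  # same instance, any conv
```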
Production code
- pipeline/evaluation/driver.py: AgentDriver(ABC), HttpApiDriver, AgentDriverRegistry
- pipeline/evaluation/pipeline.py: APIClient/APIDataAmender replaced with the driver architecture
- pipeline/evaluation/processor.py: receives AgentDriver per call instead of a stored APIDataAmender
- pipeline/evaluation/__init__.py
Tests

- tests/unit/pipeline/evaluation/test_driver.py
- tests/unit/pipeline/evaluation/test_pipeline.py
- tests/unit/pipeline/evaluation/test_processor.py
- tests/unit/pipeline/evaluation/conftest.py
- tests/integration/test_full_evaluation.py: TestFullApiEvaluation, extended parametrization for both config formats
- tests/integration/system-config-agents-query.yaml: agents: format config
- tests/integration/system-config-agents-streaming.yaml: agents: format config
Test plan

- make black-format — all files formatted
- make pre-commit — bandit, pyright, docstyle, ruff, pylint, black-check pass
- make test — 896 unit tests pass, 0 failures
- pytest tests/integration/ --collect-only — 10 integration tests collected

🤖 Generated with Claude Code