[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix) by rutayan-nv · Pull Request #900 · NVIDIA/cloudai

rutayan-nv · 2026-05-26T23:10:59Z

Status: TDD red — drives a follow-up fix PR; do not merge alone

Stacked on #893 (`rpatro/step-ownership-tdd`); diff vs. `main` will look larger than the actual test addition until #893 lands. The test-only delta on top of #893 is +127 lines, one file (`tests/test_cloudaigym.py`).

This PR adds four tests pinning the contract that `CloudAIGymEnv`'s trajectory cache must include `env_params` in its key. Two tests FAIL today (bug-exposing), two PASS (back-compat sanity).

Bug

`CloudAIGymEnv.get_cached_trajectory_result(action)` keys the trajectory cache on `action` alone (`src/cloudai/configurator/cloudai_gym.py`, around L255-260). When a workload declares `env_params` (e.g. `drop_rate` for domain randomization) and the agent re-selects the same action under a different `env_params` sample, the cache returns the stale reward measured under the first sample. The agent silently trains on labels that do not correspond to the env they were nominally generated under.

Concrete evidence

Reproduced during the `mrc_prt_rl_ppo_dr` smoke test (2026-05-23):

`trajectory.csv`: 160 rows (10 PPO iter × 16 batch).
`env.csv`: 9 rows (one per cache miss — separate but related sparsity bug).
Reward is identical per action across all three `drop_rate` values — zero per-action reward variance attributable to `drop_rate` anywhere in 160 trials. The DR signal is statistically absent because every repeat of an action returns the cached reward sampled under whichever `drop_rate` first hit that action.

Tests added

Test	Today	After fix
`test_cache_miss_when_env_params_differ`	❌ FAIL	✅ PASS
`test_step_reruns_workload_when_env_params_change` (integration)	❌ FAIL	✅ PASS
`test_cache_hit_when_action_and_env_params_match`	✅ PASS	✅ PASS (control)
`test_cache_hit_when_neither_has_env_params`	✅ PASS	✅ PASS (back-compat)

The tests use `object.setattr` to attach `env_params` to the frozen `TrajectoryEntry` and to set `current_env_params` on `TestRun` — deliberately, so the contract test compiles before the schema change. A helper `_seed_cached_entry_with_env_params` localizes this; once the fix lands and `TrajectoryEntry.env_params` is a real field, the helper is dropped in favour of a constructor kwarg.

Follow-up fix PR (separate)

Add `env_params: dict[str, EnvParamSpec] = {}` to `TestDefinition`.
Add `current_env_params: dict[str, Any] = {}` to `TestRun`.
Add `env_params` to `TrajectoryEntry`.
Move `EnvParamsSampler` / `EnvParamsObserver` / `CsvSink` into cloudai (currently in cloudaix workloads).
Sample + sink env_params in `CloudAIGymEnv.step()` before the cache lookup.
Change `get_cached_trajectory_result` to key on `(action, current_env_params)`.

This makes `env.csv` ↔ `trajectory.csv` 1:1 alignment a structural invariant and unblocks any DR-driven downstream work across cloudai.

Why TDD-red as a separate PR

Forces upstream to engage with the contract before reviewing the implementation.
Captures the bug-exposing assertions verbatim so the fix's review can focus on the fix, not on whether the bug is real.
Carries no implementation risk (test-only diff).

- Agents that set HAS_CUSTOM_TRAINING_LOOP = True drive their own training loop; handle_dse_job calls agent.train() and skips the per-step env.step loop. - New _run_custom_training_loop helper logs exceptions, returns a process-style exit code, and always invokes agent.shutdown() (when defined) in a finally block so resources are released on both success and failure paths. - CustomTrainingLoopAgent Protocol documents the opt-in contract for type checkers and IDEs.

Pyright rejected calling _run_custom_training_loop(agent, ...) because the plain bool predicate did not narrow agent's static type from BaseAgent to CustomTrainingLoopAgent. Return TypeGuard[CustomTrainingLoopAgent] from _has_custom_training_loop so the truthy branch in handle_dse_job sees the opted-in shape and the helper can call agent.train() directly.

If agent.shutdown() raised from the finally block, Python suppressed the earlier return 0/1 from agent.train() and propagated the exception, breaking the outer test-run loop in handle_dse_job (skipped remaining scenarios, failed to accumulate err |= rc). Wrap shutdown() in its own try/except, log via logging.exception, set rc = 1, and return rc after finally so the helper always honours the (int) -> int contract. Adds tests for shutdown-only failure and combined train+shutdown failure.

- GymnasiumAdapter wraps a CloudAI BaseGym as a gymnasium-compatible env: Dict of Discrete actions over the tunable params, float32 Box observation derived from define_observation_space(), gymnasium 5-tuple from step/step_raw. - Fixed (single-value) params are injected on every step so agents only ever see the tunable subset; step_raw bypasses index decoding and fixed injection for callers that already have a fully-formed parameter dict. - The adapter mirrors handle_dse_job's 1-based test_run.step so artifact paths match whether the env is driven by the DSE loop or an external training loop. - CloudAIGymEnv.define_observation_space() now returns one slot per agent metric (at least one), giving adapters the correct Box shape. - gymnasium is an optional dependency: lazy-imported inside the adapter and exposed via a new rl extra (also added to dev) pinned to ~=1.2.

decode_action used to accept partial action dicts and step_raw accepted arbitrary key sets, both silently propagating incomplete params to the underlying env. Validate up front: - decode_action requires keys exactly == self._tunable_params, and each discrete index must be in range. - step_raw requires keys exactly == self._tunable_params | self._fixed_params. A new _assert_keys helper raises ValueError with sorted missing / extra lists so callers see exactly what was wrong. Adds tests for missing key, unknown key, out-of-range index, missing fixed param, and unknown raw key paths.

CloudAIGymEnv.get_cached_trajectory_result(action) keys the trajectory cache on action alone. When a workload declares env_params (e.g. drop_rate) and the agent re-selects the same action under a different env_params sample, the cache returns a stale reward measured under a different env, silently invalidating any domain-randomization workflow. Four tests pin the contract; today two FAIL (bug-exposing), two PASS (back-compat sanity). All 27 pre-existing tests are untouched. FAIL test_cache_miss_when_env_params_differ FAIL test_step_reruns_workload_when_env_params_change PASS test_cache_hit_when_action_and_env_params_match PASS test_cache_hit_when_neither_has_env_params Driver for the follow-up PR that adds env_params as a first-class field on TestDefinition + TrajectoryEntry and rewrites the cache key as (action, env_params). Both unit and integration shapes are covered so the fix can be validated end-to-end through env.step().

coderabbitai · 2026-05-26T23:11:05Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3b523a9c-43a0-49dd-9af1-2de1f95420fa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

rutayan-nv added 6 commits May 18, 2026 12:33

rutayan-nv mentioned this pull request May 26, 2026

fix(configurator): make env_params first-class to fix the trajectory cache key #901

Draft

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix)#900

[TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix)#900
rutayan-nv wants to merge 6 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-cache-key-tdd

rutayan-nv commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rutayan-nv commented May 26, 2026

Status: TDD red — drives a follow-up fix PR; do not merge alone

Bug

Concrete evidence

Tests added

Follow-up fix PR (separate)

Why TDD-red as a separate PR

Uh oh!

coderabbitai Bot commented May 26, 2026

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant