fix(configurator): make env_params first-class to fix the trajectory cache key by rutayan-nv · Pull Request #901 · NVIDIA/cloudai

rutayan-nv · 2026-05-26T23:22:58Z

Summary

Turns the four TDD-red tests in #900 green and makes the underlying
design change required to fix domain randomization in cloudai.

Today, CloudAIGymEnv.get_cached_trajectory_result() keys the cache on
action only. An agent revisiting the same action under a different
env_params sample (e.g. drop_rate) gets the first trial's reward
served back from cache, silently corrupting the reward signal. The
parallel artifact (env.csv) lived in workload code and was skipped on
cache hits, so it was sparse and mis-aligned with trajectory.csv.

This PR moves env_params from a workload-private convention into
first-class cloudai schema so the same fix benefits every agent (PPO,
BO, GA, MAB) and every workload uniformly:

TestDefinition.env_params: dict[str, EnvParamSpec] — sibling to
cmd_args; not part of the agent's action space.
TestRun.current_env_params: dict[str, Any] — per-trial sample
attached by the env, visible to the cache key and to downstream
sinks.
TrajectoryEntry gains env_params. The cache key is now
(action, current_env_params).
CloudAIGymEnv builds an EnvParamsObserver whenever the
TestDefinition declares env_params. before_step() samples
deterministically by (seed, name, trial), stashes the sample on
test_run, and appends to env.csv before the cache lookup — so
env.csv is written on every trial (cache hit or miss) and aligns
1:1 with trajectory.csv on step.
Workloads without any [env_params.*] block pay zero overhead: no
observer, no env.csv, identical behaviour to today.

A follow-up PR in cloudaix will delete the now-redundant
env_params.py / env_params_sampler.py / env_params_sink.py
modules in domain_randomization/workloads/mrc_prt/ and route the
demo workload through the cloudai-owned schema.

Test plan

The 4 TDD-red tests added in [TDD-red] CloudAIGymEnv cache key must include env_params (drives domain-randomization correctness fix) #900 (test_cache_miss_when_env_params_differ,
test_step_reruns_workload_when_env_params_change, plus two control
tests) now pass.
9 new unit tests in tests/test_env_params.py cover EnvParamSpec
validation, EnvParamsSampler determinism + per-parameter
independence, CsvSink format + edge cases, and
EnvParamsObserver wiring.
2 new integration tests in tests/test_cloudaigym.py pin the
env.csv ↔ trajectory.csv step-alignment contract and the
zero-overhead claim for workloads without env_params.
Full cloudai test suite: 1479 passed, 4 skipped.
ruff check / ruff format / pyright clean on all touched files.

Stacking

Depends on #900 (the TDD-red tests). Either rebase #900 onto this PR
after merge, or merge them together. Both are stacked on #893
(rpatro/step-ownership-tdd).

- Agents that set HAS_CUSTOM_TRAINING_LOOP = True drive their own training loop; handle_dse_job calls agent.train() and skips the per-step env.step loop. - New _run_custom_training_loop helper logs exceptions, returns a process-style exit code, and always invokes agent.shutdown() (when defined) in a finally block so resources are released on both success and failure paths. - CustomTrainingLoopAgent Protocol documents the opt-in contract for type checkers and IDEs.

Pyright rejected calling _run_custom_training_loop(agent, ...) because the plain bool predicate did not narrow agent's static type from BaseAgent to CustomTrainingLoopAgent. Return TypeGuard[CustomTrainingLoopAgent] from _has_custom_training_loop so the truthy branch in handle_dse_job sees the opted-in shape and the helper can call agent.train() directly.

If agent.shutdown() raised from the finally block, Python suppressed the earlier return 0/1 from agent.train() and propagated the exception, breaking the outer test-run loop in handle_dse_job (skipped remaining scenarios, failed to accumulate err |= rc). Wrap shutdown() in its own try/except, log via logging.exception, set rc = 1, and return rc after finally so the helper always honours the (int) -> int contract. Adds tests for shutdown-only failure and combined train+shutdown failure.

- GymnasiumAdapter wraps a CloudAI BaseGym as a gymnasium-compatible env: Dict of Discrete actions over the tunable params, float32 Box observation derived from define_observation_space(), gymnasium 5-tuple from step/step_raw. - Fixed (single-value) params are injected on every step so agents only ever see the tunable subset; step_raw bypasses index decoding and fixed injection for callers that already have a fully-formed parameter dict. - The adapter mirrors handle_dse_job's 1-based test_run.step so artifact paths match whether the env is driven by the DSE loop or an external training loop. - CloudAIGymEnv.define_observation_space() now returns one slot per agent metric (at least one), giving adapters the correct Box shape. - gymnasium is an optional dependency: lazy-imported inside the adapter and exposed via a new rl extra (also added to dev) pinned to ~=1.2.

decode_action used to accept partial action dicts and step_raw accepted arbitrary key sets, both silently propagating incomplete params to the underlying env. Validate up front: - decode_action requires keys exactly == self._tunable_params, and each discrete index must be in range. - step_raw requires keys exactly == self._tunable_params | self._fixed_params. A new _assert_keys helper raises ValueError with sorted missing / extra lists so callers see exactly what was wrong. Adds tests for missing key, unknown key, out-of-range index, missing fixed param, and unknown raw key paths.

CloudAIGymEnv.get_cached_trajectory_result(action) keys the trajectory cache on action alone. When a workload declares env_params (e.g. drop_rate) and the agent re-selects the same action under a different env_params sample, the cache returns a stale reward measured under a different env, silently invalidating any domain-randomization workflow. Four tests pin the contract; today two FAIL (bug-exposing), two PASS (back-compat sanity). All 27 pre-existing tests are untouched. FAIL test_cache_miss_when_env_params_differ FAIL test_step_reruns_workload_when_env_params_change PASS test_cache_hit_when_action_and_env_params_match PASS test_cache_hit_when_neither_has_env_params Driver for the follow-up PR that adds env_params as a first-class field on TestDefinition + TrajectoryEntry and rewrites the cache key as (action, env_params). Both unit and integration shapes are covered so the fix can be validated end-to-end through env.step().

Domain randomization broke under the trajectory cache because the cache key was action-only: an agent revisiting the same action under a different env_params sample (e.g. drop_rate) got the *first* trial's reward served back, silently corrupting the reward signal. env.csv was also workload-owned and skipped on cache hits, so it was sparse and mis-aligned with trajectory.csv. This change makes env_params first-class in cloudai so the design works the same way for every agent (PPO, BO, GA, MAB) and every workload: - TestDefinition.env_params: dict[str, EnvParamSpec] - sibling to cmd_args, not part of the action space. - TestRun.current_env_params: dict[str, Any] - per-trial sample attached by the env so the cache key and downstream sinks see it. - TrajectoryEntry gains env_params; the cache key is now (action, current_env_params). - CloudAIGymEnv builds an EnvParamsObserver when env_params is declared. before_step() samples deterministically by (seed, name, trial), stashes the sample on test_run, and appends to env.csv *before* the cache lookup - so env.csv is written on every trial (cache hit or miss) and aligns 1:1 with trajectory.csv by step. - Workloads with no [env_params.*] block pay zero overhead: no observer, no env.csv, identical behaviour to today. Tests: turns the four TDD-red tests from the parent PR green and adds nine unit tests for EnvParamSpec / EnvParamsSampler / CsvSink / EnvParamsObserver plus two integration tests pinning the env.csv ↔ trajectory.csv contract.

coderabbitai · 2026-05-26T23:23:03Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c6be9121-486e-4617-a614-9d17846b1ca5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Closes the most important gap in the env_params test matrix: the cache-HIT path under an observer-driven TestDefinition.env_params. The previous tests covered each axis separately - cache key unit tests for env_params, a step()-level cache-miss integration with manually set current_env_params, and a step-alignment test where all trials were cache misses. None of them proved that a cache HIT still drives the observer (and therefore env.csv) - which is precisely the sparsity contract the fix was about. The new test: - declares env_params on the TestDefinition (real observer wired in), - pre-seeds the trajectory with an entry whose action AND deterministic env_params sample match what the observer will produce for step 1, - calls env.step() once and asserts: (a) runner.run is NOT called (cache short-circuit honored), (b) env.csv gains a row (observer fired before the cache lookup), (c) the new trajectory entry carries the per-trial env_params. This is the end-to-end proof that the env.csv sparsity bug cannot recur, and that env.csv ↔ trajectory.csv stay step-aligned on the cache-hit path too.

rutayan-nv added 7 commits May 18, 2026 12:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(configurator): make env_params first-class to fix the trajectory cache key#901

fix(configurator): make env_params first-class to fix the trajectory cache key#901
rutayan-nv wants to merge 8 commits into
NVIDIA:mainfrom
rutayan-nv:rpatro/env-params-first-class

rutayan-nv commented May 26, 2026

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rutayan-nv commented May 26, 2026

Summary

Test plan

Stacking

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 26, 2026 •

edited

Loading