This repo tests one hypothesis: good skills and context engineering can help weaker models perform closer to stronger ones.
- Read `.codex/skills/tdd/SKILL.md` to understand the TDD behavior being tested.
- Read `.codex/skills/eval-result/SKILL.md` to understand how coder evidence is packaged.
- Read `.codex/agents/*.yaml` to see which model each coder agent uses.
- Read `.codex/skills/skill-eval/references/rubric.md` so you know how results are judged (a quick way to skim all four is sketched below).
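If you want to skim all four from a shell, something like this works (paths are the ones listed above):

```sh
# Orient yourself before running anything:
cat .codex/skills/tdd/SKILL.md                      # the TDD behavior under test
cat .codex/skills/eval-result/SKILL.md              # how coder evidence is packaged
cat .codex/agents/*.yaml                            # model assignment per coder agent
cat .codex/skills/skill-eval/references/rubric.md   # how results are judged
```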
- Codex with agent/subagent support.
- Git with a clean baseline commit (see the sketch after this list).
- Basic tooling for your chosen stack, for example JS/npm, Python, or Go.
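A clean baseline just means the eval starts from a committed, unmodified tree. A minimal sketch, assuming you are at the repo root:

```sh
# Make sure nothing is uncommitted before the eval runs:
git status --short        # should print nothing
# If you have pending changes, snapshot them as the baseline:
git add -A
git commit -m "baseline before skill-eval run"
```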
- Ask Codex: `Use $env-setup to prepare a TDD benchmark for <stack/problem>.`
- Answer the setup questions for stack, problem, commands, constraints, and acceptance criteria.
- Confirm `prompt` and `.codex/tdd-setup.json` were created.
- Commit or snapshot the clean baseline (see the sketch after this list).
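One way to double-check the setup step before committing; `prompt` and `.codex/tdd-setup.json` are the files named above, and the commit message is only a suggestion:

```sh
# Verify $env-setup produced its two artifacts, then snapshot:
test -f prompt && test -f .codex/tdd-setup.json \
  && echo "setup artifacts present" \
  || echo "setup incomplete: rerun \$env-setup"
git add -A
git commit -m "tdd benchmark setup"
```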
- Ask Codex: `Use $skill-eval to start eval.`
- Ask Codex: `Use $skill-eval to show report.`
`$skill-eval` runs junior, mid, and senior coder agents in isolated worktrees. Each coder uses `$tdd` and `$eval-result`; the evaluator scores outputs against the senior baseline.
To iterate, tune `$tdd` based on report feedback, commit the skill change, discard the generated run state, and rerun the flow, as sketched below.
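A sketch of one loop iteration; the run-state location is hypothetical, so check where `$skill-eval` writes its worktrees and reports in your checkout:

```sh
# 1. Tune the skill and commit the change:
$EDITOR .codex/skills/tdd/SKILL.md
git add .codex/skills/tdd/SKILL.md
git commit -m "tdd: tighten instructions based on report feedback"
# 2. Discard generated run state (runs/ is a hypothetical path):
git worktree prune
git clean -fd runs/
# 3. Rerun: ask Codex "Use $skill-eval to start eval." again.
```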
Note: this is experimental and built for Codex. It has not been verified on other agent systems.