[ab-advisor] Experiment campaign for go-logger: A/B test max_turns

### 🧪 Experiment Campaign: go-logger

**Workflow file**: `.github/workflows/go-logger.md`
**Selected dimension**: `max_turns`
**Triggered by**: `ab-testing-advisor` on 2026-06-04

---

### Background

The `go-logger` workflow analyzes Go source files in `pkg/` and adds meaningful debug logging statements, capping each run at 5 files per PR. Its dense prompt involves multi-step discovery, per-file analysis, editing, build validation, and PR creation — a workflow with substantial sequential depth. Testing `max_turns` is warranted because the current `timeout-minutes: 15` cap may be masking turn exhaustion on complex files, or conversely, generous turn budgets may allow the agent to over-deliberate on simple logging decisions, wasting tokens without quality gains.

### Hypothesis

**H0 (null)**: Changing the maximum turn budget does not affect the number of files successfully logged per run, output quality (checklist compliance), or token cost.

**H1 (alternative)**: A tighter turn budget (`conservative`) reduces token cost by 20–30% with minimal quality loss on straightforward files, while a generous budget (`generous`) improves checklist compliance on complex files but at higher cost.

### Experiment Configuration

Add the following `experiments:` block to the workflow frontmatter:

```yaml
experiments:
  max_turns_budget:
    variants: [conservative, standard, generous]
    description: "Tests whether the agent turn budget affects logging quality and token cost. Conservative caps deliberation early; generous allows deeper analysis of complex files."
    hypothesis: "H0: no change in files-logged-per-run or checklist compliance across turn budgets. H1: conservative reduces cost 20-30% with <5% quality loss; generous improves complex-file compliance by >10%."
    metric: files_successfully_logged_per_run
    secondary_metrics: [token_cost_per_file, checklist_compliance_rate, run_duration_ms]
    guardrail_metrics:
      - name: empty_pr_rate
        direction: min
        threshold: 0.10
      - name: build_failure_rate
        direction: min
        threshold: 0.05
    min_samples: 15
    weight: [33, 34, 33]
    start_date: "2026-06-04"
    analysis_type: mann_whitney
    tags: [cost-efficiency, agent-turns, go-logging]
    notify:
      discussion: 0
    issue: 0
```

**Variant descriptions**:
- `conservative`: Sets `max_turns: 20` — tight budget forcing the agent to commit to logging decisions quickly without excessive deliberation. Expected to reduce token usage by 20–30% on simple files.
- `standard`: Sets `max_turns: 35` — baseline matching current observed behavior given the 15-minute timeout and typical file complexity.
- `generous`: Sets `max_turns: 55` — expanded budget allowing deeper analysis of complex multi-function files, expected to improve checklist compliance but at higher token cost.

### Workflow Changes Required

The `max_turns` field must be added to the `agent:` configuration block in the compiled lock file. The experiment controls this via conditional blocks in the workflow markdown.

**Before** (no `max_turns` configured, runtime default applies):

```yaml
engine: claude
timeout-minutes: 15
```

**After** (add to frontmatter, inside the agent prompt region):

```yaml
engine: claude
timeout-minutes: 15
{{#if experiments.max_turns_budget == "conservative" }}
max_turns: 20
{{else if experiments.max_turns_budget == "generous" }}
max_turns: 55
{{else}}
max_turns: 35
{{/if}}
```

<details><summary>View full frontmatter diff</summary>

```diff
  engine: claude
  name: Go Logger Enhancement
  timeout-minutes: 15
+ {{#if experiments.max_turns_budget == "conservative" }}
+ max_turns: 20
+ {{else if experiments.max_turns_budget == "generous" }}
+ max_turns: 55
+ {{else}}
+ max_turns: 35
+ {{/if}}
  tools:
```

</details>

### Success Metrics

| Metric | Type | Target |
|--------|------|--------|
| files_successfully_logged_per_run | Primary | ≥ 4 of 5 files across all variants |
| token_cost_per_file | Secondary | `conservative` ≤ 80% of `standard` cost |
| checklist_compliance_rate | Secondary | `generous` ≥ 105% of `standard` |
| run_duration_ms | Secondary | Signal only — no target |
| empty_pr_rate | Guardrail | Must stay < 10% per variant |
| build_failure_rate | Guardrail | Must stay < 5% per variant |

### Statistical Design

- **Variants**: `conservative`, `standard`, `generous`
- **Assignment**: Round-robin via `gh-aw` experiments runtime (cache-based)
- **Minimum runs per variant**: 15 (45 total runs across all variants)
- **Expected experiment duration**: ~45 days (daily schedule, 1 run/day, 3-way split)
- **Analysis approach**: Mann-Whitney U test (non-parametric; run counts and durations are not normally distributed)

### Implementation Steps

- [ ] Add `experiments:` section to frontmatter (YAML above)
- [ ] Add `{{#if experiments.max_turns_budget == "<variant>" }}` conditional blocks around `max_turns:` in the workflow
- [ ] Run `gh aw compile go-logger` to regenerate the lock file
- [ ] Monitor experiment artifact uploaded per run to `/tmp/gh-aw/agent/experiments/state.json`
- [ ] After 45+ runs (15 per variant), analyze variant distribution via workflow run artifacts
- [ ] Document findings and promote winning variant; update `max_turns` to winning value and remove experiment block

### Infrastructure Note

> ✅ The `analysis_type`, `tags`, and `notify` experiment schema fields are **fully implemented** end-to-end in both `pkg/workflow/compiler_experiments.go` and `actions/setup/js/pick_experiment.cjs`. No infrastructure sub-issue is required.

### References

- [A/B Testing in gh-aw](https://github.com/github/gh-aw/blob/main/.github/aw/github-agentic-workflows.md)
- Workflow file: `.github/workflows/go-logger.md`







> Generated by [🧪 Daily A/B Testing Advisor](https://github.com/github/gh-aw/actions/runs/26933122186) · sonnet46 1.5M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fab-testing-advisor%22&type=issues)
> - [x] expires  on Jun 18, 2026, 5:45 AM UTC

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ab-advisor] Experiment campaign for go-logger: A/B test max_turns #36847

🧪 Experiment Campaign: go-logger

Background

Hypothesis

Experiment Configuration

Workflow Changes Required

Success Metrics

Statistical Design

Implementation Steps

Infrastructure Note

References

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Metric	Type	Target
files_successfully_logged_per_run	Primary	≥ 4 of 5 files across all variants
token_cost_per_file	Secondary	`conservative` ≤ 80% of `standard` cost
checklist_compliance_rate	Secondary	`generous` ≥ 105% of `standard`
run_duration_ms	Secondary	Signal only — no target
empty_pr_rate	Guardrail	Must stay < 10% per variant
build_failure_rate	Guardrail	Must stay < 5% per variant

[ab-advisor] Experiment campaign for go-logger: A/B test max_turns #36847

Description

🧪 Experiment Campaign: go-logger

Background

Hypothesis

Experiment Configuration

Workflow Changes Required

Success Metrics

Statistical Design

Implementation Steps

Infrastructure Note

References

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions