Add tune-params subcommand for PUCT hyperparameter tuning via QRS optimization #1178

Draft

ChinChangYang wants to merge 38 commits into lightvector:master from ChinChangYang:claude/katago-puct-tuning-guide-OXzst

Conversation

ChinChangYang (Contributor) commented Apr 1, 2026

Summary

Adds a tune-params subcommand for automated PUCT hyperparameter tuning using QRS (Quadratic Response Surface) optimization. The optimizer runs sequential head-to-head matches between a base bot and an experiment bot, fitting a quadratic logistic regression model to propose better parameter combinations over time.

Currently tunes cpuctExplorationLog by default, with the framework supporting additional PUCT dimensions.

Changes

New: tune-params subcommand (cpp/command/tuneparams.cpp)

  • Runs head-to-head matches between a fixed base bot (bot0) and an experiment bot (bot1)
  • After each game, feeds the outcome into QRSTuner::addResult()
  • Proposes the next parameter point via QRSTuner::nextSample() (MAP optimum + decaying Gaussian noise)
  • Reports progress at 10% intervals with ETA and 95% confidence intervals (delta method) for each tuned parameter
  • Prints best-found values and suggested match command to verify results
  • Graceful SIGINT/SIGTERM shutdown (matches pattern in match.cpp)
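
The shape of that propose/play/record loop can be sketched with toy stand-ins. This is not the real implementation: the actual QRSTuner::nextSample()/addResult() signatures and match plumbing differ, and the deterministic grid sweep plus thresholded outcome below replace the real MAP-plus-decaying-noise proposals and stochastic games, purely so the sketch is reproducible.

```cpp
#include <vector>

//Toy stand-in for the sequential tune-params loop. Returns the mean of the
//parameter points where the "experiment bot" won, which for this toy
//landscape lands near the hidden optimum.
double runToyTuning(int numTrials) {
  const double trueOpt = 0.35;  //hidden optimum of the toy win-rate surface
  std::vector<double> winningXs;
  for (int trial = 0; trial < numTrials; trial++) {
    //Stand-in for QRSTuner::nextSample(): the real tuner proposes the MAP
    //optimum plus Gaussian noise whose sigma decays 0.40 -> 0.05; here a
    //deterministic sweep over the normalized range [-1,1] keeps things
    //reproducible.
    double x = -1.0 + 2.0 * trial / (numTrials - 1);
    //Stand-in for one base-vs-experiment game: real outcomes are stochastic;
    //here the experiment bot "wins" iff the toy win-rate surface
    //0.8 - 0.3*(x - trueOpt)^2 clears a fixed threshold.
    bool expWon = (0.8 - 0.3 * (x - trueOpt) * (x - trueOpt)) > 0.77;
    //Stand-in for QRSTuner::addResult(): record the outcome for the next fit.
    if (expWon) winningXs.push_back(x);
  }
  double sum = 0.0;
  for (double x : winningXs) sum += x;
  return winningXs.empty() ? 0.0 : sum / winningXs.size();
}
```

With 200 trials this toy recovers roughly 0.35; the real subcommand replaces the sweep with model-guided proposals and each evaluation with an actual game.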

New: cpp/qrstune/QRSOptimizer.h + QRSOptimizer.cpp

  • QRSModel — Quadratic logistic regression with L2 regularization; Newton-Raphson MAP estimation; feature map is [1, x_i, x_i², x_i·x_j]
    • computeOptimumSE() — 95% CIs via delta method (implicit differentiation through MAP optimization)
    • hasConvexDim() — Detects noisy/unreliable dimensions and falls back to uniform sampling
  • QRSBuffer — Sample storage with confidence-based pruning: drops samples whose predicted win rate falls > 0.25 below the current MAP best, keeping at least 30 oldest samples for diversity
  • QRSTuner — Top-level interface: proposes next sample, records results, refits model periodically, prunes, decays exploration noise linearly (σ: 0.40 → 0.05)
  • 15 unit test suites covering feature computation, sigmoid, Gaussian elimination, 1D/2D model fitting, convergence, convex-dimension detection, Newton-Raphson stability, and convergence scaling (100/1000/10000 trials)
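
A minimal sketch of that quadratic feature map, assuming the D=2 layout [1, a, b, a², b², a·b] documented in the commit log (the actual QRSModel term ordering may differ):

```cpp
#include <vector>

//Number of features for D dims: 1 intercept + D linear + D square
//+ D*(D-1)/2 pairwise cross terms.
int numFeatures(int d) { return 1 + 2 * d + d * (d - 1) / 2; }

//Quadratic feature map, e.g. D=2 input {a,b} -> {1, a, b, a^2, b^2, a*b}.
std::vector<double> computeFeatures(const std::vector<double>& x) {
  int d = (int)x.size();
  std::vector<double> f;
  f.reserve(numFeatures(d));
  f.push_back(1.0);                          //intercept
  for (double xi : x) f.push_back(xi);       //linear terms
  for (double xi : x) f.push_back(xi * xi);  //square terms
  for (int i = 0; i < d; i++)                //cross terms x_i * x_j, i < j
    for (int j = i + 1; j < d; j++)
      f.push_back(x[i] * x[j]);
  return f;
}
```

The logistic model then scores a point as sigmoid(beta · features), so fitting beta fits a quadratic win-rate surface in the tuned parameters.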

Enhanced: match command statistics (cpp/command/match.cpp)

  • Tracks pairwise W/L/D counts during the match loop
  • Prints Bradley-Terry MLE Elo (Newton-Raphson solver with convergence warning)
  • Wilson 95% CI on win rate per pairing
  • One-tailed p-value for whether the experiment bot's win rate exceeds 50%
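
For reference, a Wilson 95% score interval on a win rate can be sketched as below; the real FancyMath::wilsonCI95 signature may differ, and draws are typically folded in as half-wins before calling it.

```cpp
#include <cmath>
#include <utility>

//Sketch of a Wilson 95% score interval for wins out of n games.
//Returns {lower, upper}; the n <= 0 guard mirrors the division-by-zero
//fix mentioned in the commit log.
std::pair<double, double> wilsonCI95(double wins, double n) {
  const double z = 1.959963985;  //97.5th percentile of the standard normal
  if (n <= 0) return {0.0, 1.0};
  double phat = wins / n;
  double denom = 1.0 + z * z / n;
  double center = (phat + z * z / (2.0 * n)) / denom;
  double halfWidth =
    (z / denom) * std::sqrt(phat * (1.0 - phat) / n + z * z / (4.0 * n * n));
  return {center - halfWidth, center + halfWidth};
}
```

For example, 60 wins in 100 games gives an interval of roughly [0.50, 0.69], which is why a 60% observed win rate over 100 games is only marginal evidence of improvement.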

New: cpp/core/elo.h + elo.cpp, cpp/core/fancymath.h + fancymath.cpp

  • Bradley-Terry Elo estimation and statistical functions extracted to core namespaces
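
In the two-bot case the Bradley-Terry MLE reduces to a one-dimensional Newton-Raphson that converges to the closed form 400·log10(W/L). A hedged sketch, assuming nothing about the real ComputeElos::computeBradleyTerryElo API (which handles arbitrarily many players and warns on hitting its iteration limit):

```cpp
#include <cmath>

//Two-player Bradley-Terry Elo difference via Newton-Raphson on the
//log-likelihood W*log(p) + L*log(1-p) with p = 1/(1 + 10^(-elo/400)).
//Draws can be folded in beforehand as half a win plus half a loss.
double twoPlayerElo(double wins, double losses) {
  const double c = std::log(10.0) / 400.0;
  double elo = 0.0;  //Elo of the experiment bot minus the base bot
  for (int iter = 0; iter < 200; iter++) {
    double p = 1.0 / (1.0 + std::exp(-c * elo));      //predicted win prob
    double grad = c * (wins - (wins + losses) * p);   //d(logLik)/d(elo)
    double hess = -c * c * (wins + losses) * p * (1.0 - p);
    double step = -grad / hess;                       //Newton update
    elo += step;
    if (std::fabs(step) < 1e-9) break;                //converged
  }
  return elo;
}
```

With 60 wins and 40 losses this converges to 400·log10(60/40) ≈ 70.4 Elo.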

New: cpp/configs/tune_params_example.cfg

Example config with 500 trials, maxVisits = 500, Tromp-Taylor rules with komi 7, and GTP-equivalent search defaults (useUncertainty, policyOptimism, useNoisePruning, useGraphSearch).
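
The shape of that config might look like the fragment below. maxVisits, komi, and the search-feature keys are standard KataGo config keys and the rules line follows the GTP-config shorthand, but the numTrials key name and the specific values here are assumptions standing in for whatever the new subcommand actually reads.

```
# Hypothetical sketch of tune_params_example.cfg (key names partly assumed)
numTrials = 500            # assumed key name for the number of tuning games
maxVisits = 500
rules = tromp-taylor       # GTP-style rules shorthand
komi = 7.0
# GTP-equivalent search defaults named in the PR description
useUncertainty = true
policyOptimism = 1.0
useNoisePruning = true
useGraphSearch = true
```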

Documentation

  • README.md — Added command example for tune-params
  • cpp/README.md — Added qrstune/ to source folder summary and tuneparams.cpp to command list

Test plan

  • Unit tests pass (./katago runtests — 15 QRSTune test suites)
  • Run tune-params with example config on real model to verify end-to-end tuning loop
  • Run suggested match command to validate tuned parameters improve win rate

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude and others added 30 commits March 31, 2026 13:31
Introduce tune-params subcommand for sequential optimization of KataGo
PUCT parameters (cpuctExploration, cpuctExplorationLog,
cpuctUtilityStdevPrior) using QRS-Tune, a quadratic response surface
optimizer with logistic regression and confidence-based pruning.

Add match statistics output with Bradley-Terry Elo ratings, Wilson
confidence intervals, and pairwise win/loss/draw summaries.

New files:
- cpp/qrstune/QRSOptimizer.h: header-only QRS-Tune optimizer library
- cpp/command/tuneparams.cpp: tune-params subcommand implementation

Modified files:
- cpp/CMakeLists.txt: add tuneparams.cpp to build
- cpp/main.h, cpp/main.cpp: register tune-params subcommand
- cpp/command/match.cpp: add Elo/CI/p-value statistics after matches

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
The root-level build/ directory is used for out-of-source CMake builds.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Add missing NeuralNet::globalCleanup() call before ScoreValue::freeTables()
  to properly clean up neural net backend state on exit
- Hoist bestWinRate computation out of per-dimension loop in
  printRegressionCurves() (value is invariant across dimensions)
- Remove unnecessary step-number comments that restated the code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename ALL_CAPS constants to camelCase (nDims, paramNames, plotW,
  plotH, qrsDefaultMins/Maxs, rangeMinKeys/MaxKeys, eloPerStrength)
- Change nullptr to NULL to match KataGo's dominant convention
- Change "// Comment" to "//Comment" (no space after //)
- Change "// --- Section ---" separators to "//Section" style
- Leave QRSOptimizer.h unchanged (standalone library, own namespace)

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
…stency

Add graceful SIGINT/SIGTERM shutdown to tuneparams matching the pattern
used by match.cpp and other long-running commands. Fix QRSBuffer::prune
to retain highest-quality samples rather than oldest insertion-order ones
when applying min_keep. Add missing inline on gaussianSolve in header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add convergence detection to computeBradleyTerryElo in match.cpp so
that a warning is logged when the Newton-Raphson solver hits the 200
iteration limit without converging. Change QRSOptimizer.h free functions
from static inline to inline for correct weak external linkage in the
namespaced header-only library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The header-only design violated KataGo's convention of separating
declarations (.h) from implementations (.cpp). Move all non-trivial
function bodies to QRSOptimizer.cpp, replace #pragma once with
#ifndef guard, trim header includes, and have predict() delegate
to score() to eliminate duplicated logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build directory moved under cpp/build which is already gitignored.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Move wilsonCI95 and oneTailedPValue from static functions in match.cpp
to FancyMath namespace in core/fancymath.h/.cpp, following KataGo's
pattern of placing reusable math utilities in core namespaces.

Move computeBradleyTerryElo from static function in match.cpp to
ComputeElos namespace in core/elo.h/.cpp, alongside the existing
Elo computation utilities.

match.cpp now calls FancyMath::wilsonCI95(), FancyMath::oneTailedPValue(),
and ComputeElos::computeBradleyTerryElo() instead of file-local statics.

tuneparams.cpp static functions (qrsDimToReal, qrsToPUCT,
printRegressionCurves) are kept as file-local statics since they are
command-specific helpers, matching KataGo's pattern for command files.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Use existing ELO_PER_LOG_GAMMA constant instead of recomputing
- Hoist Newton-loop allocations in computeBradleyTerryElo (grad, H, aug, delta)
- Hoist Newton-loop allocations in QRSModel::fit (grad, negH)
- Remove dead sigReceived state in tuneparams.cpp
- Add n <= 0 guard to FancyMath::wilsonCI95 to prevent division by zero
- Use std::move in QRSBuffer::prune for kept sample vectors
- Fix cpp/README.md: qrstune is no longer header-only, fix algorithm name

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Readability:
- Add file-level comment explaining the QRS-Tune algorithm
- Document feature layout with example (D=2: [1, a, b, a^2, b^2, a*b])
- Name magic numbers: SINGULAR_THRESHOLD, CONVERGENCE_THRESHOLD, SIGMOID_CLAMP
- Rename shadow variable 'f' to 'mult' in gaussianSolve
- Rename terse variables: z->logit, w->hessianWeight, resid->residual,
  maxd->maxStep, b_lin->linearCoeffs, b_quad->quadCoeffs, b_cross->crossCoeffs,
  p_best->bestPrediction, kv->entry, nx/ny->newXs/newYs
- Add phase comments in fit() documenting Newton-Raphson steps
- Add algorithm-level comment above fit() explaining the objective function

Tests (8 test cases):
- numFeatures: verify D=0,1,2,3
- computeFeatures: verify feature vector for D=2
- sigmoid: boundary, midpoint, and clamp behavior
- gaussianSolve: 2x2 system, 3x3 identity, singular detection
- QRSModel fit+predict: 1D and 2D separable data
- QRSModel mapOptimum: optimum better than anti-optimum
- QRSTuner end-to-end: 100 trials with deterministic seed
- QRSBuffer prune: verify buffer size reduction

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
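
A self-contained sketch of the kind of routine those gaussianSolve tests exercise: Gaussian elimination with partial pivoting and singular-matrix detection. The real signature in QRSOptimizer.h is assumed; only the mult variable name is taken from the refactor described above.

```cpp
#include <cmath>
#include <utility>
#include <vector>

//Solve a*x = b by Gaussian elimination with partial pivoting.
//Returns false if the matrix is (numerically) singular.
bool gaussianSolve(std::vector<std::vector<double>> a,
                   std::vector<double> b,
                   std::vector<double>& x) {
  int n = (int)a.size();
  for (int col = 0; col < n; col++) {
    //Partial pivot: bring the largest remaining entry in this column up.
    int piv = col;
    for (int r = col + 1; r < n; r++)
      if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
    if (std::fabs(a[piv][col]) < 1e-12) return false;  //singular detection
    std::swap(a[col], a[piv]);
    std::swap(b[col], b[piv]);
    //Eliminate the column below the pivot row.
    for (int r = col + 1; r < n; r++) {
      double mult = a[r][col] / a[col][col];
      for (int c = col; c < n; c++) a[r][c] -= mult * a[col][c];
      b[r] -= mult * b[col];
    }
  }
  //Back-substitution.
  x.assign(n, 0.0);
  for (int r = n - 1; r >= 0; r--) {
    double s = b[r];
    for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
    x[r] = s / a[r][r];
  }
  return true;
}
```

This is the linear-solve primitive a Newton-Raphson fit needs at each step (solving the Hessian system for the update), which is why its singular-detection path matters for the stability fixes discussed in these commits.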
Replace the hardcoded every-100-trials progress log with reporting
at every 10% of total trials. Each progress line now includes
percentage, win/loss/draw counts, best parameters, and estimated
time of arrival.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ASCII regression charts and est.winrate displayed saturated sigmoid
values (always 1.0 or 0.0) due to aggressive buffer pruning inflating
model coefficients, confusing users who saw ~50% actual win rates.

Replace charts with 95% confidence intervals computed via the delta
method (computeOptimumSE), shown in both progress reports and final
results. Extract shared helpers (buildNegHessian, buildQuadHessian,
computeParamCIs) to eliminate duplicated matrix construction logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use paramShortNames consistently in both CI and fallback branches,
eliminating hardcoded parameter names and unused qrsToPUCT call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tune-params runs one game per trial sequentially because QRS-Tune
needs each outcome before choosing the next sample point, so
numGameThreads is never read and was misleading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Fisher information matrix only depends on xs and current beta, not
on the observed outcomes ys. Remove the misleading parameter and fix
the doc comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After printing optimal PUCT parameters, tune-params now outputs a
ready-to-use match command so users can quickly verify the tuned
parameters against defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies QRS-Tune can locate a shifted peak in a stochastic 1D quadratic
landscape (true optimum at x*=0.35), complementing the existing step-function
end-to-end test which only checks x>0 separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert QRSBuffer::prune to retain oldest min_keep_ samples (insertion
order) instead of best-scoring ones. The sort-based pruning removed
spatially diverse early exploration samples, enabling a feedback loop
where the model drifts to boundary values on flat landscapes.

Also tighten the clamped detection in computeOptimumSE to catch
near-boundary optima (within 1e-9) and cache predictions in prune
to avoid redundant predict() calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The charts were removed in 4cdf5bc because saturated sigmoid values
made them misleading. Bring them back so users can visually inspect the
fitted win-rate surface for each PUCT dimension after tuning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix computeOptimumSE producing wild CIs for boundary-clamped dims by
returning early when convex and zeroing Jacobian rows for clamped dims.
Skip pruning when model has convex dimensions to prevent discarding
good samples based on unreliable predictions. Add optional verbose
logging (config key 'verbose') that logs refit diagnostics, pruning
events, and per-trial sample coordinates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cpuctExploration and cpuctExplorationLog are strongly correlated in the
PUCT formula (both additively scale exploration), making the optimizer
unable to distinguish their individual effects. Dropping cpuctExploration
reduces dimensionality from 3 to 2, improving optimizer convergence.
The experiment bot's cpuctExploration now stays fixed at the config value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
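
A rough sketch of why the two parameters are hard to separate: KataGo's exploration coefficient has (approximately; the exact formula lives in the search code, and the base value here is illustrative) an additive form like the one below, so over a narrow band of visit counts both parameters shift it nearly identically.

```cpp
#include <cmath>

//Hedged sketch of a PUCT exploration coefficient of the assumed form
//cpuctExploration + cpuctExplorationLog * log((N + base) / base).
//At low visit counts the log term is near zero, so the two parameters
//act as a nearly collinear additive pair, which is what the commit above
//describes.
double exploreCoeff(double parentVisits, double cpuctExploration,
                    double cpuctExplorationLog, double cpuctExplorationBase) {
  return cpuctExploration
    + cpuctExplorationLog
      * std::log((parentVisits + cpuctExplorationBase) / cpuctExplorationBase);
}
```

At zero parent visits the log term vanishes and cpuctExploration alone sets the coefficient; as visits grow the log term adds a slowly varying offset, so a small change in either parameter can be mimicked by the other.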
…faults in tune config

The QRS tuner's Newton-Raphson can diverge when warm-starting from
saturated beta values: the Hessian degenerates to just the L2 prior,
causing oscillating overshoots that grow across refits. The new test
reproduces this by running a flat 2D landscape and scanning seeds until
one triggers |intercept| > 50 (observed values: 400-500 vs < 1 normal).

Also update tune_params_example.cfg to use default parameter ranges and
enable GTP-equivalent search features so tuning reflects actual play.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce QRS-Tune optimization from 2D to 1D, tuning only
cpuctExplorationLog. Also fix intercept divergence in QRSModel::fit
by resetting beta to prior mean before each IRLS iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defer the QRS sample log message from nextSample() to addResult(), so the
experiment outcome (exp wins / exp loses / draw) can be appended to the same
line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChinChangYang and others added 8 commits April 4, 2026 06:40
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arams

Expand tune-params from 1 to 3 tuned dimensions using a data-driven
TuneDimension struct with pointer-to-member fields, so adding future
parameters requires only a single table entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Y now indicates the normal case (proper concave maximum found),
matching user expectation that Y = good and N = problematic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop cpuctUtilityStdevPriorWeight from the tuned dimensions — its peak
effect on explore scaling is ~4% (at ~15-visit nodes), far too small to
detect in 1000 binary-outcome games, and its persistently convex
quadratic coefficient prevents QRS convergence. This reduces the model
from 10 parameters (3D) to 6 (2D), stabilizing the quadratic fit.

Also soften buffer pruning to never remove more than half the buffer in
one pass, avoiding sudden data cliffs that destabilize the fit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes tune-params verbose output show whether the experiment bot
played as black or white, aiding diagnosis of color-dependent effects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1000-trial tuning run showed cpuctUtilityStdevPrior has a convex
(monotonically increasing) response with no optimum in range.
A 200-game match confirmed stdev-scaling (0.85 vs 0.0) produces
no significant difference (p=0.69) with the b6c64 model at 4kv.
The parameter wastes half the tuning budget and causes the QRS
optimizer to fall back to non-concave mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…000 trials

Verify that QRSTuner converges to the true optimum of a shared 2D quadratic
landscape with progressively tighter accuracy as trial budget increases.
Asserts distance-to-optimum, win probability, monotonic SE shrinkage, and
that the true optimum falls within each run's 95% confidence interval.

Also lower cpuctExplorationLogMax from 10.0 to 5.0 in tune_params_example.cfg.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the fitted quadratic surface has convex dimensions (no reliable peak),
sample uniformly across [-1,+1]^D instead of clustering around the origin
via N(0, sigma_initial). This avoids center-biased exploration on flat or
noise-dominated landscapes where the MAP optimum is meaningless.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChinChangYang changed the title from "Add QRS-Tune PUCT hyperparameter tuning and match statistics" to "Add tune-params subcommand for PUCT hyperparameter tuning via QRS optimization" on Apr 7, 2026.