Add tune-params subcommand for PUCT hyperparameter tuning via QRS optimization #1178

Draft

ChinChangYang wants to merge 38 commits into lightvector:master from ChinChangYang:claude/katago-puct-tuning-guide-OXzst

Conversation

ChinChangYang (Contributor) commented Apr 1, 2026

Summary

Adds a tune-params subcommand for automated PUCT hyperparameter tuning using QRS (Quadratic Response Surface) optimization. The optimizer runs sequential head-to-head matches between a base bot and an experiment bot, fitting a quadratic logistic regression model to propose better parameter combinations over time.

Currently tunes cpuctExplorationLog by default, with the framework supporting additional PUCT dimensions.

Changes

New: tune-params subcommand (cpp/command/tuneparams.cpp)

  • Runs head-to-head matches between a fixed base bot (bot0) and an experiment bot (bot1)
  • After each game, feeds the outcome into QRSTuner::addResult()
  • Proposes the next parameter point via QRSTuner::nextSample() (MAP optimum + decaying Gaussian noise)
  • Reports progress at 10% intervals with ETA and 95% confidence intervals (delta method) for each tuned parameter
  • Prints best-found values and suggested match command to verify results
  • Graceful SIGINT/SIGTERM shutdown (matches pattern in match.cpp)
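
The shape of that propose/play/record loop can be sketched with toy stand-ins. This is not the real implementation: the actual QRSTuner::nextSample()/addResult() signatures and match plumbing differ, and the deterministic grid sweep plus thresholded outcome below replace the real MAP-plus-decaying-noise proposals and stochastic games, purely so the sketch is reproducible.

```cpp
#include <vector>

//Toy stand-in for the sequential tune-params loop. Returns the mean of the
//parameter points where the "experiment bot" won, which for this toy
//landscape lands near the hidden optimum.
double runToyTuning(int numTrials) {
  const double trueOpt = 0.35;  //hidden optimum of the toy win-rate surface
  std::vector<double> winningXs;
  for (int trial = 0; trial < numTrials; trial++) {
    //Stand-in for QRSTuner::nextSample(): the real tuner proposes the MAP
    //optimum plus Gaussian noise whose sigma decays 0.40 -> 0.05; here a
    //deterministic sweep over the normalized range [-1,1] keeps things
    //reproducible.
    double x = -1.0 + 2.0 * trial / (numTrials - 1);
    //Stand-in for one base-vs-experiment game: real outcomes are stochastic;
    //here the experiment bot "wins" iff the toy win-rate surface
    //0.8 - 0.3*(x - trueOpt)^2 clears a fixed threshold.
    bool expWon = (0.8 - 0.3 * (x - trueOpt) * (x - trueOpt)) > 0.77;
    //Stand-in for QRSTuner::addResult(): record the outcome for the next fit.
    if (expWon) winningXs.push_back(x);
  }
  double sum = 0.0;
  for (double x : winningXs) sum += x;
  return winningXs.empty() ? 0.0 : sum / winningXs.size();
}
```

With 200 trials this toy recovers roughly 0.35; the real subcommand replaces the sweep with model-guided proposals and each evaluation with an actual game.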

New: cpp/qrstune/QRSOptimizer.h + QRSOptimizer.cpp

  • QRSModel — Quadratic logistic regression with L2 regularization; Newton-Raphson MAP estimation; feature map is [1, x_i, x_i², x_i·x_j]
    • computeOptimumSE() — 95% CIs via delta method (implicit differentiation through MAP optimization)
    • hasConvexDim() — Detects noisy/unreliable dimensions and falls back to uniform sampling
  • QRSBuffer — Sample storage with confidence-based pruning: drops samples whose predicted win rate falls > 0.25 below the current MAP best, keeping at least 30 oldest samples for diversity
  • QRSTuner — Top-level interface: proposes next sample, records results, refits model periodically, prunes, decays exploration noise linearly (σ: 0.40 → 0.05)
  • 15 unit test suites covering feature computation, sigmoid, Gaussian elimination, 1D/2D model fitting, convergence, convex-dimension detection, Newton-Raphson stability, and convergence scaling (100/1000/10000 trials)
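
A minimal sketch of that quadratic feature map, assuming the D=2 layout [1, a, b, a², b², a·b] documented in the commit log (the actual QRSModel term ordering may differ):

```cpp
#include <vector>

//Number of features for D dims: 1 intercept + D linear + D square
//+ D*(D-1)/2 pairwise cross terms.
int numFeatures(int d) { return 1 + 2 * d + d * (d - 1) / 2; }

//Quadratic feature map, e.g. D=2 input {a,b} -> {1, a, b, a^2, b^2, a*b}.
std::vector<double> computeFeatures(const std::vector<double>& x) {
  int d = (int)x.size();
  std::vector<double> f;
  f.reserve(numFeatures(d));
  f.push_back(1.0);                          //intercept
  for (double xi : x) f.push_back(xi);       //linear terms
  for (double xi : x) f.push_back(xi * xi);  //square terms
  for (int i = 0; i < d; i++)                //cross terms x_i * x_j, i < j
    for (int j = i + 1; j < d; j++)
      f.push_back(x[i] * x[j]);
  return f;
}
```

The logistic model then scores a point as sigmoid(beta · features), so fitting beta fits a quadratic win-rate surface in the tuned parameters.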

Enhanced: match command statistics (cpp/command/match.cpp)

  • Tracks pairwise W/L/D counts during the match loop
  • Prints Bradley-Terry MLE Elo (Newton-Raphson solver with convergence warning)
  • Wilson 95% CI on win rate per pairing
  • One-tailed p-value for whether the experiment bot's win rate exceeds 50%
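
For reference, a Wilson 95% score interval on a win rate can be sketched as below; the real FancyMath::wilsonCI95 signature may differ, and draws are typically folded in as half-wins before calling it.

```cpp
#include <cmath>
#include <utility>

//Sketch of a Wilson 95% score interval for wins out of n games.
//Returns {lower, upper}; the n <= 0 guard mirrors the division-by-zero
//fix mentioned in the commit log.
std::pair<double, double> wilsonCI95(double wins, double n) {
  const double z = 1.959963985;  //97.5th percentile of the standard normal
  if (n <= 0) return {0.0, 1.0};
  double phat = wins / n;
  double denom = 1.0 + z * z / n;
  double center = (phat + z * z / (2.0 * n)) / denom;
  double halfWidth =
    (z / denom) * std::sqrt(phat * (1.0 - phat) / n + z * z / (4.0 * n * n));
  return {center - halfWidth, center + halfWidth};
}
```

For example, 60 wins in 100 games gives an interval of roughly [0.50, 0.69], which is why a 60% observed win rate over 100 games is only marginal evidence of improvement.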

New: cpp/core/elo.h + elo.cpp, cpp/core/fancymath.h + fancymath.cpp

  • Bradley-Terry Elo estimation and statistical functions extracted to core namespaces
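
In the two-bot case the Bradley-Terry MLE reduces to a one-dimensional Newton-Raphson that converges to the closed form 400·log10(W/L). A hedged sketch, assuming nothing about the real ComputeElos::computeBradleyTerryElo API (which handles arbitrarily many players and warns on hitting its iteration limit):

```cpp
#include <cmath>

//Two-player Bradley-Terry Elo difference via Newton-Raphson on the
//log-likelihood W*log(p) + L*log(1-p) with p = 1/(1 + 10^(-elo/400)).
//Draws can be folded in beforehand as half a win plus half a loss.
double twoPlayerElo(double wins, double losses) {
  const double c = std::log(10.0) / 400.0;
  double elo = 0.0;  //Elo of the experiment bot minus the base bot
  for (int iter = 0; iter < 200; iter++) {
    double p = 1.0 / (1.0 + std::exp(-c * elo));      //predicted win prob
    double grad = c * (wins - (wins + losses) * p);   //d(logLik)/d(elo)
    double hess = -c * c * (wins + losses) * p * (1.0 - p);
    double step = -grad / hess;                       //Newton update
    elo += step;
    if (std::fabs(step) < 1e-9) break;                //converged
  }
  return elo;
}
```

With 60 wins and 40 losses this converges to 400·log10(60/40) ≈ 70.4 Elo.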

New: cpp/configs/tune_params_example.cfg

Example config with 500 trials, maxVisits = 500, Tromp-Taylor rules with komi 7, and GTP-equivalent search defaults (useUncertainty, policyOptimism, useNoisePruning, useGraphSearch).
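
The shape of that config might look like the fragment below. maxVisits, komi, and the search-feature keys are standard KataGo config keys and the rules line follows the GTP-config shorthand, but the numTrials key name and the specific values here are assumptions standing in for whatever the new subcommand actually reads.

```
# Hypothetical sketch of tune_params_example.cfg (key names partly assumed)
numTrials = 500            # assumed key name for the number of tuning games
maxVisits = 500
rules = tromp-taylor       # GTP-style rules shorthand
komi = 7.0
# GTP-equivalent search defaults named in the PR description
useUncertainty = true
policyOptimism = 1.0
useNoisePruning = true
useGraphSearch = true
```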

Documentation

  • README.md — Added command example for tune-params
  • cpp/README.md — Added qrstune/ to source folder summary and tuneparams.cpp to command list

Test plan

  • Unit tests pass (./katago runtests — 15 QRSTune test suites)
  • Run tune-params with example config on real model to verify end-to-end tuning loop
  • Run suggested match command to validate tuned parameters improve win rate

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude and others added 30 commits March 31, 2026 13:31
Introduce tune-params subcommand for sequential optimization of KataGo
PUCT parameters (cpuctExploration, cpuctExplorationLog,
cpuctUtilityStdevPrior) using QRS-Tune, a quadratic response surface
optimizer with logistic regression and confidence-based pruning.

Add match statistics output with Bradley-Terry Elo ratings, Wilson
confidence intervals, and pairwise win/loss/draw summaries.

New files:
- cpp/qrstune/QRSOptimizer.h: header-only QRS-Tune optimizer library
- cpp/command/tuneparams.cpp: tune-params subcommand implementation

Modified files:
- cpp/CMakeLists.txt: add tuneparams.cpp to build
- cpp/main.h, cpp/main.cpp: register tune-params subcommand
- cpp/command/match.cpp: add Elo/CI/p-value statistics after matches

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
The root-level build/ directory is used for out-of-source CMake builds.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Add missing NeuralNet::globalCleanup() call before ScoreValue::freeTables()
  to properly clean up neural net backend state on exit
- Hoist bestWinRate computation out of per-dimension loop in
  printRegressionCurves() (value is invariant across dimensions)
- Remove unnecessary step-number comments that restated the code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename ALL_CAPS constants to camelCase (nDims, paramNames, plotW,
  plotH, qrsDefaultMins/Maxs, rangeMinKeys/MaxKeys, eloPerStrength)
- Change nullptr to NULL to match KataGo's dominant convention
- Change "// Comment" to "//Comment" (no space after //)
- Change "// --- Section ---" separators to "//Section" style
- Leave QRSOptimizer.h unchanged (standalone library, own namespace)

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
…stency

Add graceful SIGINT/SIGTERM shutdown to tuneparams matching the pattern
used by match.cpp and other long-running commands. Fix QRSBuffer::prune
to retain highest-quality samples rather than oldest insertion-order ones
when applying min_keep. Add missing inline on gaussianSolve in header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add convergence detection to computeBradleyTerryElo in match.cpp so
that a warning is logged when the Newton-Raphson solver hits the 200
iteration limit without converging. Change QRSOptimizer.h free functions
from static inline to inline for correct weak external linkage in the
namespaced header-only library.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The header-only design violated KataGo's convention of separating
declarations (.h) from implementations (.cpp). Move all non-trivial
function bodies to QRSOptimizer.cpp, replace #pragma once with
#ifndef guard, trim header includes, and have predict() delegate
to score() to eliminate duplicated logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build directory moved under cpp/build which is already gitignored.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Move wilsonCI95 and oneTailedPValue from static functions in match.cpp
to FancyMath namespace in core/fancymath.h/.cpp, following KataGo's
pattern of placing reusable math utilities in core namespaces.

Move computeBradleyTerryElo from static function in match.cpp to
ComputeElos namespace in core/elo.h/.cpp, alongside the existing
Elo computation utilities.

match.cpp now calls FancyMath::wilsonCI95(), FancyMath::oneTailedPValue(),
and ComputeElos::computeBradleyTerryElo() instead of file-local statics.

tuneparams.cpp static functions (qrsDimToReal, qrsToPUCT,
printRegressionCurves) are kept as file-local statics since they are
command-specific helpers, matching KataGo's pattern for command files.

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Use existing ELO_PER_LOG_GAMMA constant instead of recomputing
- Hoist Newton-loop allocations in computeBradleyTerryElo (grad, H, aug, delta)
- Hoist Newton-loop allocations in QRSModel::fit (grad, negH)
- Remove dead sigReceived state in tuneparams.cpp
- Add n <= 0 guard to FancyMath::wilsonCI95 to prevent division by zero
- Use std::move in QRSBuffer::prune for kept sample vectors
- Fix cpp/README.md: qrstune is no longer header-only, fix algorithm name

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Readability:
- Add file-level comment explaining the QRS-Tune algorithm
- Document feature layout with example (D=2: [1, a, b, a^2, b^2, a*b])
- Name magic numbers: SINGULAR_THRESHOLD, CONVERGENCE_THRESHOLD, SIGMOID_CLAMP
- Rename shadow variable 'f' to 'mult' in gaussianSolve
- Rename terse variables: z->logit, w->hessianWeight, resid->residual,
  maxd->maxStep, b_lin->linearCoeffs, b_quad->quadCoeffs, b_cross->crossCoeffs,
  p_best->bestPrediction, kv->entry, nx/ny->newXs/newYs
- Add phase comments in fit() documenting Newton-Raphson steps
- Add algorithm-level comment above fit() explaining the objective function

Tests (8 test cases):
- numFeatures: verify D=0,1,2,3
- computeFeatures: verify feature vector for D=2
- sigmoid: boundary, midpoint, and clamp behavior
- gaussianSolve: 2x2 system, 3x3 identity, singular detection
- QRSModel fit+predict: 1D and 2D separable data
- QRSModel mapOptimum: optimum better than anti-optimum
- QRSTuner end-to-end: 100 trials with deterministic seed
- QRSBuffer prune: verify buffer size reduction

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
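
A self-contained sketch of the kind of routine those gaussianSolve tests exercise: Gaussian elimination with partial pivoting and singular-matrix detection. The real signature in QRSOptimizer.h is assumed; only the mult variable name is taken from the refactor described above.

```cpp
#include <cmath>
#include <utility>
#include <vector>

//Solve a*x = b by Gaussian elimination with partial pivoting.
//Returns false if the matrix is (numerically) singular.
bool gaussianSolve(std::vector<std::vector<double>> a,
                   std::vector<double> b,
                   std::vector<double>& x) {
  int n = (int)a.size();
  for (int col = 0; col < n; col++) {
    //Partial pivot: bring the largest remaining entry in this column up.
    int piv = col;
    for (int r = col + 1; r < n; r++)
      if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
    if (std::fabs(a[piv][col]) < 1e-12) return false;  //singular detection
    std::swap(a[col], a[piv]);
    std::swap(b[col], b[piv]);
    //Eliminate the column below the pivot row.
    for (int r = col + 1; r < n; r++) {
      double mult = a[r][col] / a[col][col];
      for (int c = col; c < n; c++) a[r][c] -= mult * a[col][c];
      b[r] -= mult * b[col];
    }
  }
  //Back-substitution.
  x.assign(n, 0.0);
  for (int r = n - 1; r >= 0; r--) {
    double s = b[r];
    for (int c = r + 1; c < n; c++) s -= a[r][c] * x[c];
    x[r] = s / a[r][r];
  }
  return true;
}
```

This is the linear-solve primitive a Newton-Raphson fit needs at each step (solving the Hessian system for the update), which is why its singular-detection path matters for the stability fixes discussed in these commits.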
Replace the hardcoded every-100-trials progress log with reporting
at every 10% of total trials. Each progress line now includes
percentage, win/loss/draw counts, best parameters, and estimated
time of arrival.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ASCII regression charts and est.winrate displayed saturated sigmoid
values (always 1.0 or 0.0) due to aggressive buffer pruning inflating
model coefficients, confusing users who saw ~50% actual win rates.

Replace charts with 95% confidence intervals computed via the delta
method (computeOptimumSE), shown in both progress reports and final
results. Extract shared helpers (buildNegHessian, buildQuadHessian,
computeParamCIs) to eliminate duplicated matrix construction logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use paramShortNames consistently in both CI and fallback branches,
eliminating hardcoded parameter names and unused qrsToPUCT call.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tune-params runs one game per trial sequentially because QRS-Tune
needs each outcome before choosing the next sample point, so
numGameThreads is never read and was misleading.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Fisher information matrix only depends on xs and current beta, not
on the observed outcomes ys. Remove the misleading parameter and fix
the doc comment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After printing optimal PUCT parameters, tune-params now outputs a
ready-to-use match command so users can quickly verify the tuned
parameters against defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies QRS-Tune can locate a shifted peak in a stochastic 1D quadratic
landscape (true optimum at x*=0.35), complementing the existing step-function
end-to-end test which only checks x>0 separation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert QRSBuffer::prune to retain oldest min_keep_ samples (insertion
order) instead of best-scoring ones. The sort-based pruning removed
spatially diverse early exploration samples, enabling a feedback loop
where the model drifts to boundary values on flat landscapes.

Also tighten the clamped detection in computeOptimumSE to catch
near-boundary optima (within 1e-9) and cache predictions in prune
to avoid redundant predict() calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The charts were removed in 4cdf5bc because saturated sigmoid values
made them misleading. Bring them back so users can visually inspect the
fitted win-rate surface for each PUCT dimension after tuning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix computeOptimumSE producing wild CIs for boundary-clamped dims by
returning early when convex and zeroing Jacobian rows for clamped dims.
Skip pruning when model has convex dimensions to prevent discarding
good samples based on unreliable predictions. Add optional verbose
logging (config key 'verbose') that logs refit diagnostics, pruning
events, and per-trial sample coordinates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cpuctExploration and cpuctExplorationLog are strongly correlated in the
PUCT formula (both additively scale exploration), making the optimizer
unable to distinguish their individual effects. Dropping cpuctExploration
reduces dimensionality from 3 to 2, improving optimizer convergence.
The experiment bot's cpuctExploration now stays fixed at the config value.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
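
A rough sketch of why the two parameters are hard to separate: KataGo's exploration coefficient has (approximately; the exact formula lives in the search code, and the base value here is illustrative) an additive form like the one below, so over a narrow band of visit counts both parameters shift it nearly identically.

```cpp
#include <cmath>

//Hedged sketch of a PUCT exploration coefficient of the assumed form
//cpuctExploration + cpuctExplorationLog * log((N + base) / base).
//At low visit counts the log term is near zero, so the two parameters
//act as a nearly collinear additive pair, which is what the commit above
//describes.
double exploreCoeff(double parentVisits, double cpuctExploration,
                    double cpuctExplorationLog, double cpuctExplorationBase) {
  return cpuctExploration
    + cpuctExplorationLog
      * std::log((parentVisits + cpuctExplorationBase) / cpuctExplorationBase);
}
```

At zero parent visits the log term vanishes and cpuctExploration alone sets the coefficient; as visits grow the log term adds a slowly varying offset, so a small change in either parameter can be mimicked by the other.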
…faults in tune config

The QRS tuner's Newton-Raphson can diverge when warm-starting from
saturated beta values: the Hessian degenerates to just the L2 prior,
causing oscillating overshoots that grow across refits. The new test
reproduces this by running a flat 2D landscape and scanning seeds until
one triggers |intercept| > 50 (observed values: 400-500 vs < 1 normal).

Also update tune_params_example.cfg to use default parameter ranges and
enable GTP-equivalent search features so tuning reflects actual play.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce QRS-Tune optimization from 2D to 1D, tuning only
cpuctExplorationLog. Also fix intercept divergence in QRSModel::fit
by resetting beta to prior mean before each IRLS iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defer the QRS sample log message from nextSample() to addResult(), so the
experiment outcome (exp wins / exp loses / draw) can be appended to the same
line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChinChangYang and others added 8 commits April 4, 2026 06:40
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arams

Expand tune-params from 1 to 3 tuned dimensions using a data-driven
TuneDimension struct with pointer-to-member fields, so adding future
parameters requires only a single table entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Y now indicates the normal case (proper concave maximum found),
matching user expectation that Y = good and N = problematic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop cpuctUtilityStdevPriorWeight from the tuned dimensions — its peak
effect on explore scaling is ~4% (at ~15-visit nodes), far too small to
detect in 1000 binary-outcome games, and its persistently convex
quadratic coefficient prevents QRS convergence. This reduces the model
from 10 parameters (3D) to 6 (2D), stabilizing the quadratic fit.

Also soften buffer pruning to never remove more than half the buffer in
one pass, avoiding sudden data cliffs that destabilize the fit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes tune-params verbose output show whether the experiment bot
played as black or white, aiding diagnosis of color-dependent effects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1000-trial tuning run showed cpuctUtilityStdevPrior has a convex
(monotonically increasing) response with no optimum in range.
A 200-game match confirmed stdev-scaling (0.85 vs 0.0) produces
no significant difference (p=0.69) with the b6c64 model at 4kv.
The parameter wastes half the tuning budget and causes the QRS
optimizer to fall back to non-concave mode.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…000 trials

Verify that QRSTuner converges to the true optimum of a shared 2D quadratic
landscape with progressively tighter accuracy as trial budget increases.
Asserts distance-to-optimum, win probability, monotonic SE shrinkage, and
that the true optimum falls within each run's 95% confidence interval.

Also lower cpuctExplorationLogMax from 10.0 to 5.0 in tune_params_example.cfg.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the fitted quadratic surface has convex dimensions (no reliable peak),
sample uniformly across [-1,+1]^D instead of clustering around the origin
via N(0, sigma_initial). This avoids center-biased exploration on flat or
noise-dominated landscapes where the MAP optimum is meaningless.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChinChangYang changed the title from "Add QRS-Tune PUCT hyperparameter tuning and match statistics" to "Add tune-params subcommand for PUCT hyperparameter tuning via QRS optimization" on Apr 7, 2026.