Add tune-params subcommand for PUCT hyperparameter tuning via QRS optimization #1178
Draft
ChinChangYang wants to merge 38 commits into lightvector:master from
Introduce tune-params subcommand for sequential optimization of KataGo PUCT parameters (cpuctExploration, cpuctExplorationLog, cpuctUtilityStdevPrior) using QRS-Tune, a quadratic response surface optimizer with logistic regression and confidence-based pruning. Add match statistics output with Bradley-Terry Elo ratings, Wilson confidence intervals, and pairwise win/loss/draw summaries.

New files:
- cpp/qrstune/QRSOptimizer.h: header-only QRS-Tune optimizer library
- cpp/command/tuneparams.cpp: tune-params subcommand implementation

Modified files:
- cpp/CMakeLists.txt: add tuneparams.cpp to build
- cpp/main.h, cpp/main.cpp: register tune-params subcommand
- cpp/command/match.cpp: add Elo/CI/p-value statistics after matches

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
The root-level build/ directory is used for out-of-source CMake builds. https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
- Add missing NeuralNet::globalCleanup() call before ScoreValue::freeTables() to properly clean up neural net backend state on exit
- Hoist bestWinRate computation out of per-dimension loop in printRegressionCurves() (value is invariant across dimensions)
- Remove unnecessary step-number comments that restated the code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename ALL_CAPS constants to camelCase (nDims, paramNames, plotW, plotH, qrsDefaultMins/Maxs, rangeMinKeys/MaxKeys, eloPerStrength)
- Change nullptr to NULL to match KataGo's dominant convention
- Change "// Comment" to "//Comment" (no space after //)
- Change "// --- Section ---" separators to "//Section" style
- Leave QRSOptimizer.h unchanged (standalone library, own namespace)

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
…stency Add graceful SIGINT/SIGTERM shutdown to tuneparams matching the pattern used by match.cpp and other long-running commands. Fix QRSBuffer::prune to retain highest-quality samples rather than oldest insertion-order ones when applying min_keep. Add missing inline on gaussianSolve in header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add convergence detection to computeBradleyTerryElo in match.cpp so that a warning is logged when the Newton-Raphson solver hits the 200 iteration limit without converging. Change QRSOptimizer.h free functions from static inline to inline for correct weak external linkage in the namespaced header-only library. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The header-only design violated KataGo's convention of separating declarations (.h) from implementations (.cpp). Move all non-trivial function bodies to QRSOptimizer.cpp, replace #pragma once with #ifndef guard, trim header includes, and have predict() delegate to score() to eliminate duplicated logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build directory moved under cpp/build which is already gitignored. https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Move wilsonCI95 and oneTailedPValue from static functions in match.cpp to FancyMath namespace in core/fancymath.h/.cpp, following KataGo's pattern of placing reusable math utilities in core namespaces. Move computeBradleyTerryElo from static function in match.cpp to ComputeElos namespace in core/elo.h/.cpp, alongside the existing Elo computation utilities. match.cpp now calls FancyMath::wilsonCI95(), FancyMath::oneTailedPValue(), and ComputeElos::computeBradleyTerryElo() instead of file-local statics. tuneparams.cpp static functions (qrsDimToReal, qrsToPUCT, printRegressionCurves) are kept as file-local statics since they are command-specific helpers, matching KataGo's pattern for command files. https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
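For readers unfamiliar with the Wilson interval that wilsonCI95 reports, here is a minimal sketch of the standard 95% Wilson score interval. The function name wilsonCI95Sketch, its signature, and the {0, 1} fallback for empty input are assumptions for illustration; the actual FancyMath signature is not shown in this PR text.

```cpp
#include <cmath>
#include <utility>

//Hypothetical stand-in for FancyMath::wilsonCI95 (real signature not shown in the PR text).
//Returns {lower, upper} of the 95% Wilson score interval for `wins` successes in `n` trials.
inline std::pair<double,double> wilsonCI95Sketch(double wins, double n) {
  //Guard against division by zero; returning the uninformative interval is an assumed fallback
  if(n <= 0.0)
    return {0.0, 1.0};
  const double z = 1.959963984540054; //Two-sided 95% standard normal quantile
  const double z2 = z * z;
  const double p = wins / n;          //Observed win rate
  const double denom = 1.0 + z2 / n;
  const double center = (p + z2 / (2.0 * n)) / denom;
  const double halfWidth = z * std::sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom;
  return {center - halfWidth, center + halfWidth};
}
```

Unlike the plain normal approximation, the interval stays inside [0, 1] and remains usable when the observed win rate is 0 or 1, which is presumably why it suits short matches.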
- Use existing ELO_PER_LOG_GAMMA constant instead of recomputing
- Hoist Newton-loop allocations in computeBradleyTerryElo (grad, H, aug, delta)
- Hoist Newton-loop allocations in QRSModel::fit (grad, negH)
- Remove dead sigReceived state in tuneparams.cpp
- Add n <= 0 guard to FancyMath::wilsonCI95 to prevent division by zero
- Use std::move in QRSBuffer::prune for kept sample vectors
- Fix cpp/README.md: qrstune is no longer header-only, fix algorithm name

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
Readability:
- Add file-level comment explaining the QRS-Tune algorithm
- Document feature layout with example (D=2: [1, a, b, a^2, b^2, a*b])
- Name magic numbers: SINGULAR_THRESHOLD, CONVERGENCE_THRESHOLD, SIGMOID_CLAMP
- Rename shadow variable 'f' to 'mult' in gaussianSolve
- Rename terse variables: z->logit, w->hessianWeight, resid->residual, maxd->maxStep, b_lin->linearCoeffs, b_quad->quadCoeffs, b_cross->crossCoeffs, p_best->bestPrediction, kv->entry, nx/ny->newXs/newYs
- Add phase comments in fit() documenting Newton-Raphson steps
- Add algorithm-level comment above fit() explaining the objective function

Tests (8 test cases):
- numFeatures: verify D=0,1,2,3
- computeFeatures: verify feature vector for D=2
- sigmoid: boundary, midpoint, and clamp behavior
- gaussianSolve: 2x2 system, 3x3 identity, singular detection
- QRSModel fit+predict: 1D and 2D separable data
- QRSModel mapOptimum: optimum better than anti-optimum
- QRSTuner end-to-end: 100 trials with deterministic seed
- QRSBuffer prune: verify buffer size reduction

https://claude.ai/code/session_01396bbJUdHCsiWRVPM58895
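The feature layout documented in this commit can be sketched as follows. This is an illustration only, not the code in QRSOptimizer.cpp: the names numFeaturesSketch and computeFeaturesSketch are hypothetical, and the ordering of cross terms for D > 2 (pairs i < j in lexicographic order) is an assumption, since the commit message only spells out the D=2 case.

```cpp
#include <cstddef>
#include <vector>

//Number of quadratic-surface features for D dimensions:
//1 intercept + D linear + D squared + D*(D-1)/2 cross terms.
inline int numFeaturesSketch(int D) {
  return 1 + 2 * D + D * (D - 1) / 2;
}

//Builds [1, x_0..x_{D-1}, x_0^2..x_{D-1}^2, x_i*x_j for i<j].
//For D=2 with x = (a, b) this yields [1, a, b, a^2, b^2, a*b].
inline std::vector<double> computeFeaturesSketch(const std::vector<double>& x) {
  const size_t D = x.size();
  std::vector<double> f;
  f.reserve((size_t)numFeaturesSketch((int)D));
  f.push_back(1.0);                   //Intercept
  for(size_t i = 0; i < D; i++)
    f.push_back(x[i]);                //Linear terms
  for(size_t i = 0; i < D; i++)
    f.push_back(x[i] * x[i]);         //Squared terms
  for(size_t i = 0; i < D; i++)
    for(size_t j = i + 1; j < D; j++)
      f.push_back(x[i] * x[j]);       //Cross terms (ordering assumed)
  return f;
}
```

The counts this gives (6 features for D=2, 10 for D=3) agree with the model sizes quoted in a later commit that drops a dimension.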
Replace the hardcoded every-100-trials progress log with reporting at every 10% of total trials. Each progress line now includes percentage, win/loss/draw counts, best parameters, and estimated time of arrival. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The ASCII regression charts and est.winrate displayed saturated sigmoid values (always 1.0 or 0.0) due to aggressive buffer pruning inflating model coefficients, confusing users who saw ~50% actual win rates. Replace charts with 95% confidence intervals computed via the delta method (computeOptimumSE), shown in both progress reports and final results. Extract shared helpers (buildNegHessian, buildQuadHessian, computeParamCIs) to eliminate duplicated matrix construction logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use paramShortNames consistently in both CI and fallback branches, eliminating hardcoded parameter names and unused qrsToPUCT call. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tune-params runs one game per trial sequentially because QRS-Tune needs each outcome before choosing the next sample point, so numGameThreads is never read and was misleading. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Fisher information matrix only depends on xs and current beta, not on the observed outcomes ys. Remove the misleading parameter and fix the doc comment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After printing optimal PUCT parameters, tune-params now outputs a ready-to-use match command so users can quickly verify the tuned parameters against defaults. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verifies QRS-Tune can locate a shifted peak in a stochastic 1D quadratic landscape (true optimum at x*=0.35), complementing the existing step-function end-to-end test which only checks x>0 separation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert QRSBuffer::prune to retain oldest min_keep_ samples (insertion order) instead of best-scoring ones. The sort-based pruning removed spatially diverse early exploration samples, enabling a feedback loop where the model drifts to boundary values on flat landscapes. Also tighten the clamped detection in computeOptimumSE to catch near-boundary optima (within 1e-9) and cache predictions in prune to avoid redundant predict() calls. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The charts were removed in 4cdf5bc because saturated sigmoid values made them misleading. Bring them back so users can visually inspect the fitted win-rate surface for each PUCT dimension after tuning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix computeOptimumSE producing wild CIs for boundary-clamped dims by returning early when convex and zeroing Jacobian rows for clamped dims. Skip pruning when model has convex dimensions to prevent discarding good samples based on unreliable predictions. Add optional verbose logging (config key 'verbose') that logs refit diagnostics, pruning events, and per-trial sample coordinates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cpuctExploration and cpuctExplorationLog are strongly correlated in the PUCT formula (both additively scale exploration), making the optimizer unable to distinguish their individual effects. Dropping cpuctExploration reduces dimensionality from 3 to 2, improving optimizer convergence. The experiment bot's cpuctExploration now stays fixed at the config value. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…faults in tune config The QRS tuner's Newton-Raphson can diverge when warm-starting from saturated beta values: the Hessian degenerates to just the L2 prior, causing oscillating overshoots that grow across refits. The new test reproduces this by running a flat 2D landscape and scanning seeds until one triggers |intercept| > 50 (observed values: 400-500 vs < 1 normal). Also update tune_params_example.cfg to use default parameter ranges and enable GTP-equivalent search features so tuning reflects actual play. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce QRS-Tune optimization from 2D to 1D, tuning only cpuctExplorationLog. Also fix intercept divergence in QRSModel::fit by resetting beta to prior mean before each IRLS iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Defer the QRS sample log message from nextSample() to addResult(), so the experiment outcome (exp wins / exp loses / draw) can be appended to the same line. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…arams Expand tune-params from 1 to 3 tuned dimensions using a data-driven TuneDimension struct with pointer-to-member fields, so adding future parameters requires only a single table entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
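A table-driven design with pointer-to-member fields might look roughly like the sketch below. Everything here is hypothetical: SearchParamsSketch stands in for KataGo's real parameter struct with placeholder default values, TuneDimensionSketch is not the PR's actual struct layout, and the linear mapping from the optimizer's [-1, +1] coordinate to the real range is an assumed form of what a helper like qrsDimToReal does.

```cpp
#include <cstddef>
#include <vector>

//Hypothetical stand-in for the search parameter struct; values are placeholders
struct SearchParamsSketch {
  double cpuctExploration = 1.0;
  double cpuctExplorationLog = 0.45;
  double cpuctUtilityStdevPrior = 0.40;
};

//One table entry per tuned parameter (assumed layout)
struct TuneDimensionSketch {
  const char* name;                    //Name used in logs and config keys
  double SearchParamsSketch::* field;  //Which member this dimension writes to
  double minValue;                     //Real value at normalized x = -1
  double maxValue;                     //Real value at normalized x = +1
};

//Assumed linear map from the normalized coordinate in [-1,+1] to [lo,hi]
inline double qrsDimToRealSketch(double x, double lo, double hi) {
  return lo + (x + 1.0) * 0.5 * (hi - lo);
}

//Writes each sampled coordinate into the member named by the table
inline void applySampleSketch(
  const std::vector<TuneDimensionSketch>& dims,
  const std::vector<double>& x,
  SearchParamsSketch& params
) {
  for(size_t i = 0; i < dims.size() && i < x.size(); i++)
    params.*(dims[i].field) = qrsDimToRealSketch(x[i], dims[i].minValue, dims[i].maxValue);
}
```

With this shape, tuning another parameter means appending one entry such as {"cpuctUtilityStdevPrior", &SearchParamsSketch::cpuctUtilityStdevPrior, 0.0, 1.0} to the table (range values illustrative); the loop that applies samples does not change.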
Y now indicates the normal case (proper concave maximum found), matching user expectation that Y = good and N = problematic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop cpuctUtilityStdevPriorWeight from the tuned dimensions — its peak effect on explore scaling is ~4% (at ~15-visit nodes), far too small to detect in 1000 binary-outcome games, and its persistently convex quadratic coefficient prevents QRS convergence. This reduces the model from 10 parameters (3D) to 6 (2D), stabilizing the quadratic fit. Also soften buffer pruning to never remove more than half the buffer in one pass, avoiding sudden data cliffs that destabilize the fit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes tune-params verbose output show whether the experiment bot played as black or white, aiding diagnosis of color-dependent effects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1000-trial tuning run showed cpuctUtilityStdevPrior has a convex (monotonically increasing) response with no optimum in range. A 200-game match confirmed stdev-scaling (0.85 vs 0.0) produces no significant difference (p=0.69) with the b6c64 model at 4kv. The parameter wastes half the tuning budget and causes the QRS optimizer to fall back to non-concave mode. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…000 trials Verify that QRSTuner converges to the true optimum of a shared 2D quadratic landscape with progressively tighter accuracy as trial budget increases. Asserts distance-to-optimum, win probability, monotonic SE shrinkage, and that the true optimum falls within each run's 95% confidence interval. Also lower cpuctExplorationLogMax from 10.0 to 5.0 in tune_params_example.cfg. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the fitted quadratic surface has convex dimensions (no reliable peak), sample uniformly across [-1,+1]^D instead of clustering around the origin via N(0, sigma_initial). This avoids center-biased exploration on flat or noise-dominated landscapes where the MAP optimum is meaningless. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
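The sampling rule described here can be sketched as below. This is a simplified illustration under stated assumptions, not QRSTuner::nextSample itself: the function name, the way sigma is passed in, and the clamping of Gaussian samples to [-1, +1] are all hypothetical, and the decay schedule for sigma is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

//Proposes the next point in normalized coordinates [-1,+1]^D.
//- If any dimension of the fitted surface is convex, the MAP optimum is not
//  trustworthy, so sample uniformly over the whole box.
//- Otherwise sample around the MAP optimum with Gaussian noise of scale sigma.
inline std::vector<double> nextSampleSketch(
  const std::vector<double>& mapOptimum,
  bool hasConvexDim,
  double sigma,
  std::mt19937_64& rng
) {
  std::uniform_real_distribution<double> uniform(-1.0, 1.0);
  std::normal_distribution<double> gauss(0.0, 1.0);
  std::vector<double> x(mapOptimum.size());
  for(size_t i = 0; i < x.size(); i++) {
    if(hasConvexDim) {
      x[i] = uniform(rng);
    }
    else {
      const double proposed = mapOptimum[i] + sigma * gauss(rng);
      x[i] = std::min(1.0, std::max(-1.0, proposed)); //Clamp to the box (assumed)
    }
  }
  return x;
}
```

The point of the uniform branch is coverage: on a flat or noise-dominated landscape every region of the box gets sampled, so a later refit is not biased toward wherever the earlier samples happened to cluster.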
Summary
Adds a tune-params subcommand for automated PUCT hyperparameter tuning using QRS (Quadratic Response Surface) optimization. The optimizer runs sequential head-to-head matches between a base bot and an experiment bot, fitting a quadratic logistic regression model to propose better parameter combinations over time.

Currently tunes cpuctExplorationLog by default, with the framework supporting additional PUCT dimensions.

Changes
New: tune-params subcommand (cpp/command/tuneparams.cpp)
- QRSTuner::addResult()
- QRSTuner::nextSample() (MAP optimum + decaying Gaussian noise)
- match command to verify results
- match.cpp)

New: cpp/qrstune/QRSOptimizer.h + QRSOptimizer.cpp
- [1, x_i, x_i², x_i·x_j]
- computeOptimumSE(): 95% CIs via delta method (implicit differentiation through MAP optimization)
- hasConvexDim(): Detects noisy/unreliable dimensions and falls back to uniform sampling

Enhanced: match command statistics (cpp/command/match.cpp)

New: cpp/core/elo.h + elo.cpp, cpp/core/fancymath.h + fancymath.cpp

New: cpp/configs/tune_params_example.cfg
Example config with 500 trials, maxVisits = 500, Tromp-Taylor rules with komi 7, and GTP-equivalent search defaults (useUncertainty, policyOptimism, useNoisePruning, useGraphSearch).

Documentation
- README.md: Added command example for tune-params
- cpp/README.md: Added qrstune/ to source folder summary and tuneparams.cpp to command list

Test plan
- ./katago runtests (15 QRSTune test suites)
- tune-params with example config on real model to verify end-to-end tuning loop
- match command to validate tuned parameters improve win rate

🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>