AllIn is a production-grade artificial intelligence for heads-up No-Limit Texas Hold'em, built on Monte Carlo CFR+ (Counterfactual Regret Minimization) — the same family of self-play, regret-minimization algorithms behind championship-level poker bots. It approximates game-theory-optimal (GTO) strategy through millions of iterations of self-play, serves that strategy through a Flask API, and exposes it in an interactive React platform.
- Monte Carlo CFR+ with external sampling: each iteration samples chance and opponent actions, walking one trajectory through the game tree instead of the full exponential tree — making millions of training iterations tractable.
- Discounted CFR+ (Linear-CFR-style): time-discounted regret updates (α = 1.5) for faster, more stable convergence toward a Nash equilibrium. (This is CFR+ with a `((t-1)/t)^α` discount on floored regrets, not the canonical DCFR α/β/γ scheme.)
- Self-play reinforcement learning: no human data and no hand-crafted heuristics; the strategy emerges purely from regret minimization.
- Multi-layer abstraction: a hierarchical state representation built from 15 equity-based preflop buckets + distribution-aware (potential-aware) postflop buckets (12 flop / 12 turn / 10 river) clustered by Earth Mover's Distance over equity distributions.
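The discounted update in the list above can be sketched for a single information set as follows. This is a minimal illustration, not AllIn's trainer code: the function names are hypothetical, and the order of operations (discount the floored cumulative regrets, add this iteration's regrets, floor again) is an assumption drawn from the one-line description. Pure Python is used to keep the sketch dependency-free; the project itself vectorizes regret matching with NumPy.

```python
def discounted_cfr_plus_update(cum_regret, instant_regret, t, alpha=1.5):
    """One regret update at a single information set (hypothetical helper).

    cum_regret     -- floored cumulative regrets after iteration t-1, one per action
    instant_regret -- this iteration's counterfactual regrets, one per action
    t              -- 1-based iteration counter
    """
    discount = ((t - 1) / t) ** alpha  # 0 at t=1, approaches 1 as t grows
    # Assumed order: discount old regrets, add new ones, clamp at zero (CFR+).
    return [max(discount * r + dr, 0.0) for r, dr in zip(cum_regret, instant_regret)]


def regret_matching(cum_regret):
    """Turn cumulative regrets into a strategy proportional to positive regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to uniform play.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


# Example: three actions at iteration 4.
regrets = discounted_cfr_plus_update([4.0, 0.0, 2.0], [-1.0, 3.0, -5.0], t=4)
strategy = regret_matching(regrets)
```

In this example the third action's regret would go negative and is clamped to zero, so it receives zero probability until it earns positive regret again.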
Active blueprint (analysis/blueprints/blueprint_*.db):
├── Algorithm: Monte Carlo CFR+ with external sampling + Linear-CFR-style discount (α=1.5)
├── Training iterations: 6,500,000
├── Information sets: 26,052 unique strategic situations
├── Game: Heads-up NLHE, 100 BB effective stacks (SB 1 / BB 2)
└── Storage: SQLite (incremental checkpoint + resume)
Training Pipeline:
Random self-play deal → Monte Carlo CFR+ traversal → regret/strategy update →
SQLite checkpoint → automatic active-blueprint selection → API inference
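The checkpoint step of this pipeline could look roughly like the sketch below, using only the standard-library `sqlite3` module. The table names (`meta`, `regrets`), their columns, and the function names are hypothetical; AllIn's real schema is not documented in this README.

```python
import sqlite3


def open_blueprint(path):
    """Open (or create) a blueprint database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep querying while a writer appends checkpoints.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value INTEGER)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS regrets ("
        "infoset TEXT, action TEXT, regret REAL, PRIMARY KEY (infoset, action))"
    )
    return conn


def save_checkpoint(conn, iteration, regrets):
    """Upsert regrets and the iteration counter in a single transaction."""
    rows = [(k, a, r) for k, actions in regrets.items() for a, r in actions.items()]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO regrets (infoset, action, regret) VALUES (?, ?, ?)",
            rows,
        )
        conn.execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES ('iterations', ?)",
            (iteration,),
        )


def load_checkpoint(conn):
    """Return (iterations_done, regrets) so training can resume where it stopped."""
    row = conn.execute("SELECT value FROM meta WHERE key = 'iterations'").fetchone()
    regrets = {}
    for infoset, action, regret in conn.execute("SELECT infoset, action, regret FROM regrets"):
        regrets.setdefault(infoset, {})[action] = regret
    return (row[0] if row else 0), regrets
```

Writing the regrets and the iteration counter in one transaction means an interrupted run never leaves a checkpoint whose counter disagrees with its data.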
Core technologies (actually used):
- NumPy for vectorized numerical computing (regret matching, the exploitability evaluator)
- phevaluator — high-performance C hand-strength library
- SQLite (WAL mode) for incremental, resumable strategy storage
- Flask REST API · React + Vite frontend
- Hypothesis property-based testing for engine correctness
- External sampling turns a full game-tree traversal into a single sampled path per iteration — the key to scaling to millions of iterations.
- CFR+ regret flooring (clamping cumulative regrets at 0) accelerates convergence over vanilla CFR.
- Position-aware information sets learn in-position and out-of-position play separately.
- Stack-aware game engine models real chip costs, all-ins, and side-stack constraints — not a toy abstraction.
- Exploitability evaluator measures how far the blueprint is from unexploitable (best-response, in milli-big-blinds/hand) so convergence is measured, not assumed.
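To make the external-sampling and regret-flooring points concrete, here is a toy sketch on Kuhn poker (three cards, one bet), not on NLHE and not AllIn's engine. It follows the textbook form of external sampling: chance and the opponent's actions are sampled, while the traversing player's own actions are all evaluated. Every name in it is illustrative, and the discount step is omitted for brevity.

```python
import random

ACTIONS = ("p", "b")  # p = check or fold, b = bet or call
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}


def payoff(cards, history, player):
    """Chips won by `player` at a terminal history (ante 1, bet size 1)."""
    if history == "bp":       # player 0 bet, player 1 folded
        winner, amount = 0, 1
    elif history == "pbp":    # player 1 bet, player 0 folded
        winner, amount = 1, 1
    else:                     # showdown: higher card wins
        winner = 0 if cards[0] > cards[1] else 1
        amount = 1 if history == "pp" else 2
    return amount if player == winner else -amount


def regret_matching(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [0.5, 0.5]


def traverse(cards, history, traverser, regrets, strategy_sum, rng):
    if history in TERMINALS:
        return payoff(cards, history, traverser)
    player = len(history) % 2
    key = f"{cards[player]}:{history}"  # own card + public betting line
    node_regrets = regrets.setdefault(key, [0.0, 0.0])
    sigma = regret_matching(node_regrets)
    if player != traverser:
        # Opponent node: record the current strategy, then sample ONE action.
        totals = strategy_sum.setdefault(key, [0.0, 0.0])
        for i, p in enumerate(sigma):
            totals[i] += p
        action = rng.choices(ACTIONS, weights=sigma)[0]
        return traverse(cards, history + action, traverser, regrets, strategy_sum, rng)
    # Traverser node: evaluate every action, then update floored regrets (CFR+).
    values = [traverse(cards, history + a, traverser, regrets, strategy_sum, rng)
              for a in ACTIONS]
    node_value = sum(p * v for p, v in zip(sigma, values))
    for i, v in enumerate(values):
        node_regrets[i] = max(node_regrets[i] + v - node_value, 0.0)
    return node_value


def train(iterations, seed=0):
    """Run self-play and return the average strategy per information set."""
    rng = random.Random(seed)
    regrets, strategy_sum = {}, {}
    for _ in range(iterations):
        for traverser in (0, 1):
            cards = rng.sample([0, 1, 2], 2)  # chance is sampled once per traversal
            traverse(cards, "", traverser, regrets, strategy_sum, rng)
    return {k: [x / sum(v) for x in v] for k, v in strategy_sum.items()}


average_strategy = train(20000)
```

After training, `average_strategy["2:b"]` (holding the best card and facing a bet) should put almost all of its weight on calling, and `average_strategy["0:b"]` (worst card facing a bet) on folding, since those are the dominated-action spots regret minimization resolves first.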
- Python 3.12 — core development language
- NumPy — vectorized regret matching and best-response evaluation
- phevaluator — O(1) hand evaluation via precomputed tables
- SQLite — blueprint persistence with checkpoint/resume + read-while-writing
- Monte Carlo CFR+ with external sampling and Linear-CFR-style discounting
- Nash-equilibrium approximation through iterative self-play
- Feature engineering: equity-based card bucketing, action abstraction, and position-aware information-set keys
- Flask API — strategy lookup + live game endpoints
- React + Vite frontend — strategy explorer and play-vs-bot table
- PyPokerEngine — used in the test harness for bot-vs-bot simulation
- phevaluator — fast showdown evaluation
- Fast inference: direct blueprint lookup from SQLite, no per-decision search.
- Distribution-aware abstractions: 30-fine/10-coarse decoupled preflop + 20/16/10 potential-aware postflop buckets (EMD-clustered equity distributions).
- Mixed-strategy output: probability distributions over fold / call / bet / raise / all-in, sampled at play time.
- Honest "unknown" handling: situations never reached in training report `found: false` rather than guessing.
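For readers unfamiliar with Earth Mover's Distance, the sketch below shows the one-dimensional case on equity histograms: for two distributions over the same ordered bins, EMD reduces to the summed absolute difference of their cumulative distributions. This only illustrates the distance itself. How AllIn bins equity and which clustering algorithm it runs on top are not stated in this README, and the function name is hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """EMD between two non-empty histograms over the same ordered equity bins.

    Each histogram is normalized to sum to 1; the result is measured in
    units of bin widths (mass moved times distance moved).
    """
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_a = accumulate(x / total_a for x in hist_a)
    cdf_b = accumulate(x / total_b for x in hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Two hands with the same average equity but very different futures:
made_hand = [0.0, 0.0, 1.0, 0.0, 0.0]  # always ends up around 50% equity
draw      = [0.5, 0.0, 0.0, 0.0, 0.5]  # misses or hits, nothing in between

distance = emd_1d(made_hand, draw)  # 2.0 bin widths
```

A bucketing based on mean equity alone would merge these two hands; a distribution-aware distance keeps them apart, which is the point of potential-aware abstraction.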
- Strategy Explorer — look up the blueprint's play for any spot:
  - Hand Explorer: enter real cards + a betting line, see the resulting info-set key and strategy.
  - Key Explorer: build an info-set key from abstraction dropdowns (or paste one) and see the strategy.
- Play vs the Bot — an interactive heads-up table against the trained AI, 100 BB deep, with full action and pot tracking.
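The lookup behaviour behind both the explorer and the bot (a mixed strategy when the spot is known, `found: false` when it is not) can be sketched as below. A plain dict stands in for the SQLite blueprint. The key format and every response field other than `found` are hypothetical.

```python
import random


def lookup_strategy(blueprint, infoset_key):
    """Return the stored action distribution, or an explicit 'not found'."""
    probs = blueprint.get(infoset_key)
    if probs is None:
        return {"found": False}  # never reached in training: do not guess
    return {"found": True, "strategy": probs}


def sample_action(result, rng=random):
    """Sample one action from a mixed strategy; None if the spot is unknown."""
    if not result["found"]:
        return None
    actions = list(result["strategy"])
    weights = list(result["strategy"].values())
    return rng.choices(actions, weights=weights)[0]


# Stand-in blueprint with one hypothetical info-set key.
blueprint = {"hypothetical|key": {"fold": 0.1, "call": 0.6, "raise": 0.3}}

known = lookup_strategy(blueprint, "hypothetical|key")
unknown = lookup_strategy(blueprint, "never|seen")
action = sample_action(known)  # "call" about 60% of the time
```

Sampling at play time, rather than always taking the most likely action, is what keeps a mixed strategy from becoming predictable.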
- Exploitability scoring via a vectorized best-response walk of the public game tree (`tests/run_evaluation.py`).
- Property-based testing (Hypothesis) over the engine's semantic invariants (chip conservation, call/all-in arithmetic, legal-action shape), backed by a documented bug log.
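As a reading aid for the exploitability numbers, the conversion from chips to milli-big-blinds per hand is simple arithmetic. The helper below is a hypothetical illustration, not part of `tests/run_evaluation.py`; it assumes the 1/2 blind structure stated for the active blueprint.

```python
def chips_to_mbb_per_hand(total_chips, hands, big_blind=2):
    """Convert a chip total won over `hands` hands into mbb/hand."""
    return 1000.0 * total_chips / (hands * big_blind)


# A best response that wins 150 chips over 1,000 sampled hands at BB = 2:
# 0.15 chips/hand = 0.075 BB/hand = 75 mbb/hand.
score = chips_to_mbb_per_hand(150, 1000)
```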
- Python 3.12
- Node.js 18+ (frontend)
- Git
git clone https://github.com/jianrontan/AllIn.git
cd AllIn

# Install Python dependencies
cd backend
pip install -r requirements.txt
# Start the inference API (must run from backend/api/)
cd api
python strategy_api.py # http://localhost:5000

cd frontend
npm install
npm run dev # http://localhost:5173

cd backend/bot
# Quick smoke run (seconds)
python -c "from tests.run_blueprint_trainer import run_training; run_training(100)"
# A real run — checkpoints as it goes; resume any time with resume='<db>.db'
python -c "from tests.run_blueprint_trainer import run_training; run_training(5000000)"

Training writes a timestamped `backend/bot/analysis/blueprints/blueprint_*.db`. The API and bot automatically use the blueprint with the most iterations; no manual promotion step is needed.
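The "most iterations wins" rule could be implemented along these lines. This is a sketch under stated assumptions: it presumes each blueprint stores its iteration count in a `meta` table under the key `iterations`, which is a hypothetical schema, and the function names are illustrative rather than AllIn's.

```python
import glob
import os
import sqlite3


def iterations_in(db_path):
    """Read the iteration count from a hypothetical `meta` table (0 if unreadable)."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT value FROM meta WHERE key = 'iterations'"
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return 0  # missing table or corrupt file: treat as untrained
    return int(row[0]) if row else 0


def select_active_blueprint(directory):
    """Return the blueprint_*.db with the most iterations, or None if there is none."""
    candidates = glob.glob(os.path.join(directory, "blueprint_*.db"))
    if not candidates:
        return None
    return max(candidates, key=iterations_in)
```

Treating unreadable files as zero iterations means a half-written or foreign database can never outrank a valid blueprint.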
- Open the frontend at http://localhost:5173.
- Strategy Explorer: enter a hand + betting line (or build an info-set key) and get the GTO strategy with probabilities.
- Play vs the Bot: play heads-up against the AI and watch how it responds.
cd backend/bot
python tests/run_evaluation.py --samples 1000 # exploitability in mbb/hand (lower = better)

- ✅ Blueprint training — Monte Carlo CFR+ with SQLite checkpoint/resume
- ✅ Serving + Play-vs-bot — Flask API + React platform
- ✅ Exploitability evaluation — best-response convergence scoreboard
- 🚧 Subgame solving — real-time re-solving with full pot/stack information (fixes the abstraction's stack-depth blind spot)
- 📅 Online 1v1 play on AWS — Redis/DynamoDB session store, WebSocket transport, unrestricted human bet sizing
See docs/ROADMAP.md for detail, and docs/DEVELOPER_GUIDE.md for the architecture.
| Doc | Purpose |
|---|---|
| USER_GUIDE.md | Install, train, run, play, evaluate |
| docs/DEVELOPER_GUIDE.md | Architecture and module reference |
| docs/ROADMAP.md | Phase status and what's next |
| docs/TRAININGFLOW.md | One CFR+ iteration, end to end |
| CLAUDE.md | Canonical short reference for contributors |
| backend/bot/docs/BUG_LOG.md | Correctness bug history |