
LOU Attitude Adjuster

This is my bot for CMU's AI Poker Contest. It's named after the Culture warship, and also because of what it does: blend a Deep CFR approach with a Bayesian posterior over the opponent's hole cards (i.e., an attitude adjuster).

Note

LLM Usage: I worked heavily with Claude Code while building this.

I didn't have time to finish the training run, so what I actually submitted was a simple Bayesian heuristic bot, submission/frank_exchange.py, named after the VFP Frank Exchange of Views. It keeps the Bayesian posterior logic, but instead of using Deep CFR it samples actions via Thompson sampling, with some manual heuristics added for strategic depth.
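
As a rough illustration of that sampling step (a sketch under assumptions, not the bot's actual code; action_ev is a hypothetical EV estimator):

    import numpy as np

    def thompson_action(posterior, candidate_pairs, actions, action_ev, rng=None):
        # Thompson sampling over the opponent-hand posterior: draw one
        # candidate hole pair from the belief, then play the action with the
        # best estimated EV against that draw.
        # `action_ev(action, pair)` is a hypothetical helper, not the repo's.
        if rng is None:
            rng = np.random.default_rng()
        idx = rng.choice(len(candidate_pairs), p=posterior)
        pair = candidate_pairs[idx]
        return max(actions, key=lambda a: action_ev(a, pair))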

This simpler bot family did quite well for a while (reaching a peak rank of 2), but as compute budgets scaled and people deployed more (and more sophisticated) approaches, it fell. It ultimately placed around 37th out of roughly 110 entrants.

The Game

The game we're learning is a modified version of HUNL ("Heads-Up No Limit") poker. In particular:

  1. The deck is 27 cards (3 suits, no Jacks, Queens, or Kings). Aces can be high or low.
  2. Each player (Small Blind or Big Blind) is dealt 5 cards preflop. After preflop betting, there is a discard phase.
    • BB discards 3 cards first, and these are revealed to SB.
    • Then SB discards 3 cards, and these are revealed to BB.
  3. Then normal HUNL poker is played on this reduced deck.

Bots are given stacks of 100 chips at the start of each of 1000 hands. The bot that is up in chips at the end of the 1000 hands wins.

Constraints

This bot had to be prepared under strict constraints. In particular:

  • 1GB upload limit.
  • 4GB RAM, 4 vCPUs, and 1500 seconds total to move on the server. (The competition scaled compute across phases; these were the limits in the highest/final phase.)
  • One week from the reveal of the variant to the competition.
  • I only had my laptop (16GB RAM, 8 virtual M4 cores) available for training.

    The RAM constraint is crucial, since the algorithm requires holding a buffer in memory, and hitting swap caused a noticeable slowdown.

Approach

Hart and Mas-Colell proposed the regret matching algorithm, which converges to the set of correlated equilibria. Zinkevich et al. operationalized regret minimization for incomplete-information extensive-form games in an algorithm called Counterfactual Regret Minimization (CFR), which Brown and Sandholm's poker agents later built on. CFR is generally considered the state of the art for incomplete-information games (and in particular poker), but the tabular lookup is expensive.

People have experimented with various abstractions to reduce the effective state-space size (e.g., applying a suit permutation to hands and the flop). Here that would reduce the state space for the Big Blind from C(27, 5) * C(22, 3) ≈ 124M to about 20M. Further reductions may be possible if you know something about the game (e.g., if you can do lossy compression on the 20M permuted keys).
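
As a concrete illustration of the suit-permutation idea (a sketch, not the repo's actual encoder; cards are assumed to be (rank, suit) tuples with suits 0-2):

    from itertools import permutations

    def canonical_key(hand, flop):
        # Map (hand, flop) to its lexicographically smallest image under the
        # 3! = 6 suit relabelings, so suit-isomorphic states share one key.
        best = None
        for perm in permutations(range(3)):
            relabeled = (
                tuple(sorted((r, perm[s]) for r, s in hand)),
                tuple(sorted((r, perm[s]) for r, s in flop)),
            )
            if best is None or relabeled < best:
                best = relabeled
        return best

    # Two suit-isomorphic deals collapse to the same key:
    a = canonical_key([(1, 0), (3, 0), (5, 1), (7, 2), (9, 1)], [(2, 2), (4, 0), (8, 1)])
    b = canonical_key([(1, 1), (3, 1), (5, 2), (7, 0), (9, 2)], [(2, 0), (4, 1), (8, 2)])
    assert a == b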

But this requires being a good poker player: the best learning algorithm cannot save you from a bad abstraction. So one approach is to use what is effectively an autoencoder to learn these abstractions nonparametrically, via an algorithm called Deep CFR. Under these constraints, however, Deep CFR is noisy (and even in general it throws out domain-specific knowledge).

Our approach is posterior blending. We maintain a Bayesian posterior over the opponent's hole cards, beginning with a uniform distribution over the C(16, 2) = 120 possible pairs once all discards are known (27 deck cards - 5 hand cards - 3 flop cards - 3 opponent discards = 16 unseen cards). This posterior is updated by a simple heuristic (though bootstrapping from the strategy network is possible). The strategy network outputs a distribution over actions, which is blended linearly with the distribution induced by the posterior, weighted by the effective range of the posterior (i.e., its entropy, or the number of pairs above some confidence threshold). That is:

# eff: effective range of the posterior (number of still-plausible opponent pairs).
# A narrow posterior (eff below POSTERIOR_THRESHOLD) shifts weight toward the
# posterior-induced EV distribution, capped at 0.2 (and clamped at 0 from below,
# so a wide posterior never produces a negative weight).
weight = max(0.0, min(0.2, (POSTERIOR_THRESHOLD - eff) / POSTERIOR_THRESHOLD))
return (1 - weight) * probs + weight * ev_probs

is the heart of the algorithm.
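
One concrete way to compute eff is the exponentiated entropy of the posterior, a perplexity-style count of live pairs. The sketch below is one such interpretation under assumptions; the function name and constants are illustrative, not the repo's:

    import numpy as np

    def effective_range(posterior):
        # Effective number of live opponent hole pairs: exp(entropy).
        # A uniform posterior over the C(16, 2) = 120 pairs gives eff = 120;
        # a point mass on a single pair gives eff = 1.
        p = np.asarray(posterior, dtype=float)
        p = p[p > 0]
        return float(np.exp(-(p * np.log(p)).sum()))

    # A narrow posterior pushes the blend weight toward its 0.2 cap:
    uniform = np.full(120, 1 / 120)
    print(effective_range(uniform))  # ≈ 120.0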

The rest is mechanics. In particular (all three steps are sketched after the list):

  1. We decide our discards by a "Level 2.5" rationality scheme. We assume that our opponent discards strategically (Level 1). However, we also partially internalize the effect of our discard decision (as BB, who discards first) on SB: we assume SB sees our discards and correctly realizes those cards cannot be drawn, but we do not model SB's inference about BB's hole cards (h1, h2) given that BB discarded (d1, d2, d3).

  2. Posteriors are updated using simple Monte Carlo likelihood estimates in response to raises and checks.

  3. Actions are sampled from a softmax over the final blended action distribution.
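
A minimal sketch of these three mechanics, under stated assumptions: estimate_equity and simulate_equity are hypothetical Monte Carlo helpers, and the exact likelihood and temperature rules are illustrative, not the repo's actual code.

    from itertools import combinations

    import numpy as np

    def choose_discard_bb(hand, estimate_equity):
        # Level-2.5 discard (illustrative): score each of the C(5, 3) = 10
        # discard choices, passing the discards to the equity estimator so
        # that "SB sees them and knows they can't be drawn" is internalized.
        # SB's inference about our kept cards is deliberately not modeled.
        def score(discard):
            keep = tuple(c for c in hand if c not in discard)
            return estimate_equity(keep, discard)
        return max(combinations(hand, 3), key=score)

    def mc_posterior_update(posterior, candidate_pairs, opp_raised, simulate_equity):
        # Reweight each candidate opponent hole pair after a raise or check:
        # stronger pairs are treated as more likely to raise, weaker ones
        # as more likely to check.
        equities = np.array([simulate_equity(pair) for pair in candidate_pairs])
        likelihood = equities if opp_raised else 1.0 - equities
        posterior = posterior * likelihood
        return posterior / posterior.sum()

    def sample_action(action_probs, temperature=1.0, rng=None):
        # Softmax over the final blended action distribution, then sample.
        if rng is None:
            rng = np.random.default_rng()
        z = np.log(np.asarray(action_probs) + 1e-12) / temperature
        z -= z.max()  # numerical stability
        p = np.exp(z)
        p /= p.sum()
        return int(rng.choice(len(p), p=p))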

Code Structure

All new code lives in submission/.

  1. The network architecture for the DCFR approach is in submission/network.py.
  2. The traverser is in submission/traversal.py.
  3. The player agent is in submission/player.py.
  4. The training loop is in submission/train.py.
  5. The observation encoder is in submission/encoder.py.
  6. Checkpoints from a longer run (on a 12-core M2 Max) are in submission/checkpoints.

How to Train the Bot

Caution

Training is very compute and memory intensive.

Run python submission/train.py from the top level, after activating the venv.

How to Run the Engine

  1. Create a virtual environment:

    python3.12 -m venv .venv
  2. Activate the virtual environment:

    • On Windows:

      .venv\Scripts\activate
    • On macOS and Linux:

      source .venv/bin/activate
  3. Install the required packages:

    pip install -r requirements.txt

Testing

  1. Basic coverage test:

    pytest --cov=gym_env --cov-report=term-missing --cov-report=html --cov-branch
  2. To test the Attitude Adjuster against ProbabilityAgent, AllInAgent, FoldAgent, CallingStationAgent, RandomAgent:

    python agent_test.py
  3. To run a full match (1000 hands) of your agent against a specific agent:

    python run.py

You can change which bots play by editing the agent config file: set each bot's entry to the file path of the agent you want it to run.
