This project implements a vectorized Dueling Deep Q-Network (DQN) to train an AI agent to play Snake. It was developed for a 3rd-year university project (SAE) at UPJV (Université de Picardie Jules Verne).
Authors: Stéphane TALAB (@ProGen18) & Mathis PIAULT (@Dova3kin)
- Context
- Features
- Tech Stack
- Architecture
- Neural Network
- State Vector (26 Features)
- Action Selection
- Memory System
- Dashboard
- Configuration
- Getting Started
- Scripts
- Tests
- Troubleshooting
## Context

This project is a SAE (Situation d'Apprentissage et d'Évaluation) for the 3rd year of Computer Science at UPJV. We built a reinforcement learning pipeline from scratch: vectorized engine, neural network, training loop, and a real-time dashboard.
## Features

- Vectorized Environments: Runs `N` Snake instances in parallel using NumPy, making experience collection much faster.
- Dueling DQN: Separates state value `V(s)` and action advantages `A(s,a)` to improve convergence.
- Double DQN: Reduces Q-value overestimation by using a target network for evaluation.
- N-step Returns: Multi-step returns (default `n=3`) for better credit assignment.
- Epsilon-Greedy: Linear decay of exploration from 1.0 to 0.0.
- Strategy Blending: Combines model output, a greedy heuristic, and safe-random moves based on epsilon.
- Safe-Random Fallback: Avoids immediate wall/body collisions during random exploration.
- Greedy Heuristic: Hand-coded logic to move toward food, used as a reference early on.
- Experience Replay: 200k-capacity FIFO buffer using NumPy arrays.
- Apple Oversampling: A dedicated 20k buffer for food-gathering transitions to prevent the agent from "forgetting" how to eat.
- Soft Updates: Gradual target network updates (`tau=0.005`) for stability.
- LR Scheduler: `ReduceLROnPlateau` halves the learning rate when performance stalls (active below 0.2 epsilon).
- Gradient Clipping: Limited to `max_norm=1.0`.
- Auto-Saving: Models save automatically when a new record is reached.
- Evaluations: Periodic greedy tests (100 episodes) to track true performance.
- Starvation Timeout: Kills the snake if it doesn't eat within a calculated timeframe.
- Flood Fill: BFS-based dead-end detection included in the state vector.
- Body Density: Monitors crowding around the head.
- Lookahead: Includes 2-step danger detection.
- Tail Sensor: Heads-to-tail vector helps the agent sense its own body shape.
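The n-step return mechanism can be sketched as follows (an illustrative NumPy version using the defaults `n=3`, `gamma=0.97`; the function and variable names are ours, not the project's):

```python
import numpy as np

def n_step_return(rewards, gamma=0.97, n=3):
    """Fold the next n rewards into a single discounted return.

    Illustrative sketch: R = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1}.
    The bootstrapped term gamma^n * Q(s_{t+n}, a) is added later, during
    the training step.
    """
    rewards = np.asarray(rewards[:n], dtype=np.float64)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.dot(discounts, rewards))

# Example: two step penalties followed by an apple reward
# = -0.001 + (-0.001 * 0.97) + (1.0 * 0.97**2)
r = n_step_return([-0.001, -0.001, 1.0])
```

With `n=3`, a reward earned three steps after a decision still reaches it with weight `gamma**2`, which is what "better credit assignment" refers to above.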
- Fully Vectorized: N environments advance simultaneously using NumPy (no Python loops).
- Actions: Straight, Turn Right, Turn Left (relative to current direction).
- Body Arrays: Positions stored in pre-allocated arrays to avoid dynamic allocation.
- Collisions: Calculated via boolean masks and indexing over the full batch.
- Partial Resets: Resets only the terminated environment.
- Apple Spawning: Automatic placement on empty tiles.
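The boolean-mask approach above can be illustrated with a minimal sketch (array and function names are assumptions, not the engine's actual code):

```python
import numpy as np

def wall_collisions(heads, grid_w=32, grid_h=24):
    """Return a boolean mask of environments whose head left the grid.

    Sketch only: `heads` is assumed to be an (N, 2) array of grid
    coordinates, one row per parallel environment. The whole batch is
    checked in one vectorized expression, with no Python loop.
    """
    x, y = heads[:, 0], heads[:, 1]
    return (x < 0) | (x >= grid_w) | (y < 0) | (y >= grid_h)

# Four environments: only the second (x=32) and third (y=-1) collide.
heads = np.array([[0, 0], [32, 5], [10, -1], [31, 23]])
mask = wall_collisions(heads)
```

The same mask can then drive partial resets: `np.where(mask)[0]` gives exactly the environments to reinitialize.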
- 4-Quadrant Flexible Layout: The dashboard is divided into 4 panels that can each display `JEU` (game view), `COURBE` (live score graph), `HISTORIQUE` (score scatter), or `VISION` (state feature bars).
- Panel Cycling: Clicking a panel's title cycles through the 4 available views.
- Left Menu: 3 sections (STATS, CONTROLS, and OPTIONS).
- 6 Live Stat Cards: Parties (games), Record, Moy100 (100-game moving average), TPS (steps/second), Temps (elapsed time), Epsilon.
- 5 Control Buttons: Save, Load, Pause/Resume, Screenshot, Export Excel.
- Auto-Screenshot: Configurable interval (in seconds) to capture the dashboard automatically.
- Pygame-native Score Graph: Rolling line chart (last 2000 scores). Rendered in pure Pygame (no Matplotlib overhead).
- Matplotlib History Scatter: Session-wide scatter plot of all scores, updated every 5 seconds.
- State Feature Visualizer: Bar chart of all 26 input features fed to the network.
- Save/Load Modal: File browser modal to choose save files from the `model/` directory.
- Excel Export: Full training history (timestamp, games, epsilon, record, average, TPS) exported to an `.xlsx` file in `logs/`.
- Console Logging: Timestamped log lines printed to stdout at a configurable interval.
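The Excel export can be sketched with pandas, which writes `.xlsx` through openpyxl (the column names mirror the fields listed above, but the project's actual `JournalDeBord` schema is an assumption):

```python
import pandas as pd

def export_history(rows, path):
    """Write training history rows to an .xlsx file.

    Sketch only: column names follow the README's list (timestamp,
    games, epsilon, record, average, TPS) and are not guaranteed to
    match the project's JournalDeBord exactly.
    """
    df = pd.DataFrame(rows, columns=["timestamp", "games", "epsilon",
                                     "record", "average", "TPS"])
    df.to_excel(path, index=False)  # uses the openpyxl engine for .xlsx
    return df
```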
## Tech Stack

| Library | Version | Purpose |
|---|---|---|
| Python | 3.10+ | Language |
| PyTorch | 2.0+ | Neural network, training |
| Pygame | 2.5+ | Game rendering & dashboard UI |
| NumPy | 1.24+ | Vectorized environment & memory |
| Pandas | 2.0+ | Data handling for logging |
| Matplotlib | 3.7+ | History scatter plot |
| openpyxl | 3.1+ | Excel export |
| IPython | 8.10+ | Interactive console support |
## Architecture

```
Snake_AI/
├── agent.py           # Main training loop, Agent class, replay buffer, rendering
├── config.py          # All hyperparameters (ConfigEntrainement + ConfigAffichage)
├── dashboard.py       # Pygame dashboard — 4-panel layout, menu, controls
├── game.py            # Vectorized Snake engine (N parallel environments)
├── logger.py          # Console logging + Excel export
├── model.py           # Dueling DQN network + Trainer (optimizer, target net)
├── widgets.py         # Reusable UI components (Button, InputBox, StatCard, Graph)
├── requirements.txt   # Python dependencies
├── tests/
│   ├── test_agent.py      # Tests for MemoireEfficace (replay buffer)
│   ├── test_game.py       # Tests for JeuVectorise (game engine)
│   ├── test_model.py      # Tests for ReseauNeurones + Entraineur
│   ├── test_widgets.py    # Tests for all UI widget classes
│   └── test_dashboard.py
├── model/             # Saved model checkpoints (.pth)
├── logs/              # Excel training logs (.xlsx)
├── screenshots/       # Auto-captured screenshots (.png)
└── src/
    └── demoSnake.gif  # Demo animation (used in README)
```
| Module | Class(es) | Responsibility |
|---|---|---|
| `agent.py` | `MemoireEfficace`, `RenduPygame`, `AgentIA` | Training loop, replay buffer, Pygame render |
| `game.py` | `JeuVectorise` | N-environment Snake engine |
| `model.py` | `ReseauNeurones`, `Entraineur` | Network definition, optimizer, target net |
| `dashboard.py` | `Dashboard` | Full Pygame UI orchestration |
| `widgets.py` | `Bouton`, `BoiteSaisie`, `StatCard`, `GraphiquePygame` | Reusable UI components |
| `logger.py` | `JournalDeBord` | Timestamped logs + Excel export |
| `config.py` | `ConfigEntrainement`, `ConfigAffichage` | All constants and hyperparameters |
## Neural Network

The model is a Dueling DQN (`ReseauNeurones` in `model.py`).

```
Input: 26 features
        │
Linear(26 → 256) + LayerNorm + ReLU
        │
Linear(256 → 256) + LayerNorm + ReLU
        │
   ┌────┴────┐
   │         │
 Value    Advantage
 stream    stream
   │         │
Lin(256→128)  Lin(256→128)
 + ReLU        + ReLU
   │         │
Lin(128→1)    Lin(128→3)
   │         │
   └────┬────┘
        │
Q(s,a) = V(s) + (A(s,a) − mean_a(A(s,a)))

Output: 3 Q-values → argmax = chosen action
```
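The diagram translates into a compact PyTorch module along these lines (a sketch using the documented layer sizes; the project's actual class is `ReseauNeurones` and may differ in details):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling DQN sketch: shared trunk, then value and advantage streams."""

    def __init__(self, n_in=26, hidden=256, stream=128, n_actions=3):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_in, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
        )
        self.value = nn.Sequential(
            nn.Linear(hidden, stream), nn.ReLU(), nn.Linear(stream, 1))
        self.advantage = nn.Sequential(
            nn.Linear(hidden, stream), nn.ReLU(), nn.Linear(stream, n_actions))

    def forward(self, x):
        h = self.shared(x)
        v = self.value(h)       # (B, 1): how good is this state overall
        a = self.advantage(h)   # (B, 3): relative merit of each action
        # Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)); subtracting the mean
        # keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingDQN()(torch.zeros(4, 26))  # batch of 4 states -> (4, 3) Q-values
```

Subtracting the per-state mean advantage is the standard dueling trick: it forces the value stream to carry the state's overall quality while the advantage stream only ranks actions.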
Training algorithm:
- Double DQN: `action* = argmax_a Q_main(s', a)`, evaluated as `Q_target(s', action*)`
- Bellman target: `Q_target = R + γ^n × Q_next × (1 − done)`
- Loss: SmoothL1Loss (Huber)
- Optimizer: Adam
- Soft update: `θ_target ← τ × θ_main + (1 − τ) × θ_target` after every step
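Put together, the Double DQN target and the soft update can be sketched like this (illustrative tensor code with placeholder networks, not the project's `Entraineur`):

```python
import torch

gamma, n, tau = 0.97, 3, 0.005  # README defaults

def double_dqn_target(q_main, q_target, next_states, rewards, dones):
    """Bellman target with Double DQN action selection."""
    with torch.no_grad():
        # 1) the main net picks the action, 2) the target net evaluates it
        best = q_main(next_states).argmax(dim=1, keepdim=True)
        q_next = q_target(next_states).gather(1, best).squeeze(1)
        # gamma^n because rewards were already folded over n steps
        return rewards + (gamma ** n) * q_next * (1.0 - dones)

def soft_update(target_net, main_net, tau=tau):
    # theta_target <- tau * theta_main + (1 - tau) * theta_target
    for t, m in zip(target_net.parameters(), main_net.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * m.data)

# Tiny stand-ins for the real networks, just to exercise the functions:
net_main, net_target = torch.nn.Linear(26, 3), torch.nn.Linear(26, 3)
y = double_dqn_target(net_main, net_target,
                      torch.zeros(5, 26), torch.ones(5), torch.zeros(5))
soft_update(net_target, net_main)
```

Decoupling selection (main net) from evaluation (target net) is what curbs the overestimation bias mentioned in the feature list.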
Checkpoint format (`.pth` files):

```python
{
    "model_state": ...,
    "optimizer_state": ...,
    "nb_parties": int,
    "temps_total": float,
    "epsilon": float,
    "record": int
}
```

## State Vector (26 Features)

The network receives a vector of 26 normalized floats:
| Index | Feature(s) | Description |
|---|---|---|
| 0 | `dist_pomme` | Manhattan distance to food, normalized by grid diagonal |
| 1–2 | `dir_x`, `dir_y` | Signed direction to food (x and y), normalized |
| 3–5 | `danger_straight`, `danger_right`, `danger_left` | 1.0 if moving in that direction causes immediate collision with wall or body, else 0.0 |
| 6 | `faim` | Steps since last food / famine timeout (hunger level, 0.0 to 1.0) |
| 7–8 | `pos_x`, `pos_y` | Head position on the grid, normalized by grid width/height |
| 9–11 | `flood_straight`, `flood_right`, `flood_left` | BFS-accessible cells for each action, normalized by total grid size |
| 12–15 | `dir_right`, `dir_down`, `dir_left`, `dir_up` | One-hot encoding of the snake's current absolute direction |
| 16 | `longueur` | Snake length / max possible length |
| 17–19 | `mur_straight`, `mur_right`, `mur_left` | Normalized distance to wall in each relative direction |
| 20–21 | `queue_x`, `queue_y` | Normalized vector from head to tail |
| 22–24 | `danger2_straight`, `danger2_right`, `danger2_left` | 1.0 if a collision would occur 2 steps ahead |
| 25 | `densite_corps` | Ratio of body cells within a `densite_rayon`-cell radius of the head |
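Features 9–11 rely on a depth-limited BFS; here is a minimal sketch (our own names, assuming body cells are stored as a set of grid coordinates):

```python
from collections import deque

def flood_fill_count(blocked, start, grid_w, grid_h, max_depth=50):
    """Count reachable free cells from `start` with a depth-limited BFS.

    Illustrative version of the flood_* features: `blocked` is a set of
    (x, y) body cells; max_depth mirrors flood_depth_max. The result is
    normalized by grid size before entering the state vector.
    """
    if start in blocked or not (0 <= start[0] < grid_w and 0 <= start[1] < grid_h):
        return 0  # stepping there dies immediately: nothing is reachable
    seen, queue, count = {start}, deque([(start, 0)]), 0
    while queue:
        (x, y), d = queue.popleft()
        count += 1
        if d == max_depth:
            continue  # depth cap keeps the per-step cost bounded
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < grid_w and 0 <= ny < grid_h
                    and (nx, ny) not in blocked and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append(((nx, ny), d + 1))
    return count
```

A low count relative to grid size signals a dead end, which is exactly the information the agent needs to avoid trapping itself.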
## Action Selection

At each step, we use a blended strategy:

```
With probability (1 − epsilon):
    → Model-based: argmax Q(state)
With probability epsilon:
    → With decreasing probability: Greedy Heuristic (move toward food, safe fallback)
    → Otherwise: Safe Random (random action among collision-free moves)
```

The greedy heuristic and safe-random strategies both avoid moves that result in immediate death whenever a safe option exists.
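The blending logic can be sketched as follows (an illustrative signature; the project's actual blending code and decay schedule may differ):

```python
import random

def select_action(q_values, heuristic_action, safe_actions,
                  epsilon, heuristic_share):
    """Blended action selection sketch.

    heuristic_share is the (decaying) fraction of exploratory moves
    handed to the greedy heuristic; all names here are illustrative.
    """
    if random.random() >= epsilon:
        # Exploit: pick the action with the highest Q-value
        return max(range(len(q_values)), key=q_values.__getitem__)
    if random.random() < heuristic_share:
        return heuristic_action              # greedy: head toward food
    return random.choice(safe_actions or [0])  # safe random fallback
```

Early in training the heuristic share is high, so exploration is productive; as it decays, exploration becomes safe-random, and as epsilon itself decays, the model takes over entirely.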
## Memory System

**Main replay buffer:**

| Property | Value |
|---|---|
| Capacity | 200,000 transitions |
| Structure | Pre-allocated NumPy ring buffer (FIFO) |
| Stores | (state, action, reward, next_state, done) |
| Sampling | Uniform random |
| Batch size | 256 |
**Apple memory (oversampling buffer):**

| Property | Value |
|---|---|
| Capacity | 20,000 transitions |
| Stores | Only transitions where food was collected |
| Purpose | Oversample rare reward events |
| Threshold | Activated when apple memory has ≥ 128 entries |
| Samples per batch | 128 additional samples mixed with main batch |
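A pre-allocated NumPy ring buffer of this kind can be sketched as follows (one array shown for brevity; `MemoireEfficace` stores all five transition fields):

```python
import numpy as np

class RingBuffer:
    """Minimal FIFO ring buffer sketch with pre-allocated storage."""

    def __init__(self, capacity, state_dim):
        # Allocated once up front: no per-transition Python allocation
        self.states = np.zeros((capacity, state_dim), dtype=np.float32)
        self.capacity, self.index, self.size = capacity, 0, 0

    def push(self, state):
        self.states[self.index] = state  # overwrite the oldest slot
        self.index = (self.index + 1) % self.capacity  # circular wrap
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        # Uniform random sampling over the filled portion only
        idx = np.random.randint(0, self.size, size=batch_size)
        return self.states[idx]

buf = RingBuffer(capacity=3, state_dim=2)
for i in range(4):               # the 4th push wraps around
    buf.push(np.full(2, i))
```

Once the main buffer supplies 256 samples and the apple buffer adds 128 more, each gradient step sees a batch where food-gathering transitions are heavily over-represented relative to their natural frequency.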
## Dashboard

Views: `JEU`, `COURBE`, `HISTORIQUE`, `VISION`. Switch panels by clicking their titles.
| Card | Description |
|---|---|
| Parties | Total games |
| Record | Best score |
| Moy100 | 100-game average |
| TPS | Steps per second |
| Temps | Duration |
| Epsilon | Exploration rate |
| Button | Key | Action |
|---|---|---|
| Sauvegarder | `S` | Save model |
| Charger | `L` | Load model |
| Pause / Resume | `Space` | Toggle training |
| Capture | `C` | Dashboard screenshot |
| Export Excel | `E` | Export training history log |
| Option | Default | Description |
|---|---|---|
| Auto-screenshots toggle | Off | Automatically captures screenshots at a fixed interval |
| Interval input box | 60 seconds | Time between auto-screenshots when enabled |
| Button | Action |
|---|---|
| Quitter | Closes the dashboard and stops training |
| Panel | Content | Update rate |
|---|---|---|
| JEU | Pygame render of environment 0 — snake with color gradient (head brighter), red apple | Every frame |
| COURBE | Pygame line chart — raw scores (blue), moving average (orange), record (gold dashed) | Every frame |
| HISTORIQUE | Matplotlib scatter of all session scores | Every ~5 seconds |
| VISION | Vertical bar chart of all 26 input features | Every frame |
**Bouton**

- Rectangular button with hover color transition.
- Click detected via rect collision.
- Supports dynamic text via setter.
- Callback function triggered on click.
**BoiteSaisie**

- Text input field with focus management.
- Click to activate, click away to deactivate.
- Keyboard input with backspace support.
- Returns text on Enter key press.
**StatCard**

- Fixed height: 52px.
- Displays a label and a large value.
- Optional bar mode: shows a progress bar with gradient from red to green.
- Render is safe: defaults to "N/A" if the value is missing.
**GraphiquePygame**

- Rolling buffer: last 2000 scores (deque with maxlen).
- Draws axes, grid lines, and a legend.
- Auto-scales Y axis to data range.
- Record line drawn as a dashed gold horizontal line.
## Configuration

Defined in `config.py`.
| Parameter | Default | Type | Description |
|---|---|---|---|
| `graine` | 42 | int | Random seed. Change to reproduce different training runs. |
| `nb_environnements` | 20 | int | Number of parallel game instances. More = faster experience collection, higher RAM/CPU usage. |
| `taux_apprentissage` | 0.0003 | float | Adam optimizer learning rate. Try 0.001 for faster (but less stable) learning. |
| `gamma` | 0.97 | float | Future reward discount factor. Lower = more short-sighted agent. |
| `epsilon_depart` | 1.0 | float | Initial exploration rate. Usually keep at 1.0 for a fresh start. |
| `epsilon_fin` | 0.0 | float | Final exploration rate. Can be set to 0.05 to always keep some exploration. |
| `epsilon_frames` | 1000 | int | Number of training steps to decay epsilon from start to end. Increase for longer exploration. |
| `transitions_min_debut` | 5000 | int | Minimum replay buffer size before training starts. Reduce to 1000 for quick tests. |
| `eval_intervalle` | 5000 | int | Steps between greedy evaluation runs. Lower to see test scores more frequently. |
| `nb_episodes_test` | 100 | int | Number of greedy episodes per evaluation. Reduce for speed. |
| `recompense_pomme` | 1.0 | float | Reward for eating food. Increase to make food more important. |
| `recompense_mort` | -1.0 | float | Penalty for dying. Increase magnitude to punish deaths more. |
| `recompense_step` | -0.001 | float | Small penalty per step (encourages efficiency). Set to 0 to remove time pressure. |
| `famine_base` | 100 | int | Base number of steps before starvation (without food). |
| `famine_par_case` | 3 | int | Additional steps per snake body block before starvation. |
| `taille_batch` | 256 | int | Batch size for main training. Larger = smoother gradients, more memory. |
| `memoire_max` | 200000 | int | Replay buffer capacity. Reduce to 50000 for lower memory usage. |
| `log_intervalle_sec` | 1.0 | float | Console log interval in seconds. Set to 5.0 for less noise. |
| `graph_update_intervalle` | 100 | int | Dashboard graph update frequency (every N steps). |
| Parameter | Default | Type | Description |
|---|---|---|---|
| `tau` | 0.005 | float | Target network soft update rate. Too high = unstable, too low = slow to learn. |
| `n_step` | 3 | int | N-step return window. Affects how far back rewards are propagated. |
| `blend_frames` | 150 | int | Steps for blending between heuristic and model-based exploration. |
| `freq_entrainement` | 8 | int | Training frequency: one gradient step every N environment steps. |
| `taille_batch_pqn` | 512 | int | Batch size for PQN-style training step. |
| `memoire_pommes_capacite` | 20000 | int | Apple memory buffer capacity. |
| `memoire_pommes_seuil` | 128 | int | Minimum apple memory size before oversampling is active. |
| `memoire_pommes_echantillons` | 128 | int | Apple samples added per training batch. |
| `flood_depth_facteur` | 0.5 | float | Flood fill BFS depth limit = `flood_depth_facteur` × snake length. |
| `flood_depth_max` | 50 | int | Hard cap on flood fill depth. |
| `densite_rayon` | 3 | int | Radius (in cells) for body density computation. |
| `scores_historique_maxlen` | 500 | int | Max history length for score tracking. |
| `scores_test_maxlen` | 100 | int | Max history length for test-set scores. |
| `lr_scheduler_epsilon_seuil` | 0.2 | float | Epsilon threshold below which the LR scheduler activates. |
| `lr_scheduler_patience` | 100 | int | Patience for the LR scheduler (in log intervals). |
| `lr_scheduler_factor` | 0.5 | float | LR reduction factor on plateau. |
| `lr_min` | 1e-6 | float | Minimum learning rate (floor for the scheduler). |
| Parameter | Default | Type | Description |
|---|---|---|---|
| `input_size` | 26 | int | Network input size: must match the 26-feature state vector exactly. |
| `output_size` | 3 | int | Number of actions (Straight, Right, Left). |
| `largeur` | 640 | int | Game area width in pixels. Changing this changes the grid size and breaks saved models. |
| `hauteur` | 480 | int | Game area height in pixels. Same caveat. |
| `taille_bloc` | 20 | int | Cell size in pixels. Grid = (largeur/taille_bloc) × (hauteur/taille_bloc) = 32×24. |
| `taille_couche_1` | 256 | int | First shared hidden layer size. Changing it breaks saved models. |
| `taille_couche_2` | 256 | int | Second shared hidden layer size. |
| `taille_couche_v` | 128 | int | Value stream layer size. |
| `taille_couche_a` | 128 | int | Advantage stream layer size. |
| Parameter | Default | Description |
|---|---|---|
| `largeur_fenetre` | 1920 | Window width |
| `hauteur_fenetre` | 1080 | Window height |
| `largeur_menu` | 250 | Menu width |
| `hauteur_barre_bas` | 40 | Status bar height |
| `intervalle_screenshot_defaut` | 60 | Auto-screenshot interval (seconds) |
## Getting Started

```bash
git clone https://github.com/ProGen18/Snake_AI.git
cd Snake_AI
```

```bash
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
```

```bash
pip install -r requirements.txt
```

```bash
python agent.py
```

This launches the full training loop with the Pygame dashboard open. The agent starts exploring randomly and gradually shifts toward learned behavior as epsilon decays.
## Scripts

| Command | Description |
|---|---|
| `python agent.py` | Start training with live dashboard |
| `pytest` | Run the full test suite |
| `pytest -v` | Verbose test output |
| `pytest tests/test_model.py` | Run model tests only |
| `pytest tests/test_game.py` | Run game engine tests only |
| `pytest tests/test_agent.py` | Run replay buffer tests only |
| `pytest tests/test_widgets.py` | Run UI widget tests only |
## Tests

The project uses `pytest`. Tests cover all major components.
| File | What it tests |
|---|---|
| `test_agent.py` | `MemoireEfficace`: storage, ring-buffer overflow, circular wrapping, random sampling |
| `test_game.py` | `JeuVectorise`: initialization, partial reset, state computation, flood fill, danger flags, tail direction, body density |
| `test_model.py` | `ReseauNeurones`: forward shapes, NaN/Inf check, save/load roundtrip. `Entraineur`: learning step, soft update, loss decrease |
| `test_widgets.py` | `Bouton`: click/miss callbacks; `BoiteSaisie`: focus, input, Enter; `StatCard`: value set, bar mode render; `GraphiquePygame`: add points, draw empty/full, record line |
## Troubleshooting

**Dashboard lag.** The Pygame event loop runs inside the training loop. If batch sizes are very large or the CPU is saturated, the UI may lag. Reduce `nb_environnements` or `taille_batch` if this happens.
**No GPU detected.** The agent automatically falls back to CPU if no compatible GPU is found. To use a GPU, install the CUDA-enabled build of PyTorch matching your system's driver version.
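The standard PyTorch device-selection pattern looks like this (the checkpoint path in the comment is hypothetical, not a file the project guarantees):

```python
import torch

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on: {device}")

# When loading a checkpoint saved on GPU onto a CPU-only machine,
# remap the tensors explicitly, e.g.:
#   checkpoint = torch.load("model/some_checkpoint.pth", map_location=device)
```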
**Missing dependencies.** Activate your virtual environment first, then run `pip install -r requirements.txt`.
**Checkpoint fails to load.** Ensure the `.pth` file was saved by the same version of `ReseauNeurones`. Legacy saves (without optimizer state) are partially supported via a fallback loader.
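A tolerant loader can look like this (a sketch assuming the key names from the checkpoint format shown earlier, not the project's actual fallback code):

```python
import torch

def load_checkpoint(path, model, optimizer=None):
    """Load a checkpoint, tolerating legacy saves without optimizer state.

    Sketch only: key names ("model_state", "optimizer_state", ...) follow
    the checkpoint format documented above.
    """
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    if optimizer is not None and "optimizer_state" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer_state"])
    # Missing metadata falls back to fresh-run defaults.
    return {
        "nb_parties": ckpt.get("nb_parties", 0),
        "epsilon": ckpt.get("epsilon", 1.0),
        "record": ckpt.get("record", 0),
    }

# Exercise with a tiny stand-in model and a legacy-style save
# (no optimizer state, partial metadata):
m = torch.nn.Linear(2, 2)
torch.save({"model_state": m.state_dict(), "record": 5}, "ckpt_test.pth")
meta = load_checkpoint("ckpt_test.pth", torch.nn.Linear(2, 2))
```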
**Training doesn't improve.**
- Check that `transitions_min_debut` is not set too high: training won't start until the buffer is filled.
- Increase `epsilon_frames` if the agent doesn't explore enough before exploitation begins.
- If scores plateau early, the scheduler may reduce the learning rate too aggressively: check `lr_scheduler_patience`.
