
Snake AI - Deep Q-Learning

🇫🇷 Version Française

This project implements a vectorized Dueling Deep Q-Network (DQN) to train an AI agent to play Snake. It was developed for a 3rd-year university project (SAE) at UPJV (Université de Picardie Jules Verne).

Authors: Stéphane TALAB (@ProGen18) & Mathis PIAULT (@Dova3kin)


Dashboard Interface


Table of Contents

  1. Context
  2. Features
  3. Tech Stack
  4. Architecture
  5. Neural Network
  6. State Vector (26 Features)
  7. Action Selection
  8. Memory System
  9. Dashboard
  10. Configuration
  11. Getting Started
  12. Scripts
  13. Tests
  14. Troubleshooting

Context

This project is a SAE (Situation d'Apprentissage et d'Évaluation) for the 3rd year of Computer Science at UPJV. We built a reinforcement learning pipeline from scratch: vectorized engine, neural network, training loop, and a real-time dashboard.


Features

AI & Training

  • Vectorized Environments: Runs N Snake instances in parallel using NumPy, making experience collection much faster.
  • Dueling DQN: Separates state value V(s) and action advantages A(s,a) to improve convergence.
  • Double DQN: Reduces Q-value overestimation by using a target network for evaluation.
  • N-step Returns: Multi-step returns (default n=3) for better credit assignment.
  • Epsilon-Greedy: Linear decay of exploration from 1.0 to 0.0.
  • Strategy Blending: Combines model output, a greedy heuristic, and safe-random moves based on epsilon.
  • Safe-Random Fallback: Avoids immediate wall/body collisions during random exploration.
  • Greedy Heuristic: Hand-coded logic to move toward food, used as reference early on.
  • Experience Replay: 200k-capacity FIFO buffer using NumPy arrays.
  • Apple Oversampling: A dedicated 20k buffer for food-gathering transitions to prevent the agent from "forgetting" how to eat.
  • Soft Updates: Gradual target network updates (tau=0.005) for stability.
  • LR Scheduler: ReduceLROnPlateau halves learning rate when performance stalls (active below 0.2 epsilon).
  • Gradient Clipping: Limited to max_norm=1.0.
  • Auto-Saving: Models save automatically when a new record is reached.
  • Evaluations: Periodic greedy tests (100 episodes) to track true performance.
  • Starvation Timeout: Ends the episode if the snake goes too long without eating (famine_base + famine_par_case × length steps).
  • Flood Fill: BFS-based dead-end detection included in the state vector.
  • Body Density: Monitors crowding around the head.
  • Lookahead: Includes 2-step danger detection.
  • Tail Sensor: Head-to-tail vector helps the agent sense its own body shape.
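The linear epsilon schedule described above (epsilon_depart to epsilon_fin over epsilon_frames steps) can be sketched in a few lines. The function name and signature are illustrative, not the repo's actual API:

```python
def epsilon_lineaire(step, eps_depart=1.0, eps_fin=0.0, eps_frames=1000):
    """Linear decay from eps_depart to eps_fin over eps_frames steps,
    then held constant at eps_fin."""
    frac = min(step / eps_frames, 1.0)
    return eps_depart + frac * (eps_fin - eps_depart)
```

With the defaults from config.py, `epsilon_lineaire(0)` is 1.0, `epsilon_lineaire(500)` is 0.5, and any step past 1000 stays at 0.0.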

Game Engine

  • Fully Vectorized: N environments advance simultaneously using NumPy (no Python loops).
  • Actions: Straight, Turn Right, Turn Left (relative to current direction).
  • Body Arrays: Positions stored in pre-allocated arrays to avoid dynamic allocation.
  • Collisions: Calculated via boolean masks and indexing over the full batch.
  • Partial Resets: Resets only the terminated environment.
  • Apple Spawning: Automatic placement on empty tiles.
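Batch-wide collision checks reduce to boolean masks over coordinate arrays. A minimal standalone sketch of the idea (not the repo's JeuVectorise code; the function name is illustrative):

```python
import numpy as np

def collisions_mur(heads, grid_w=32, grid_h=24):
    """Wall-collision flags for N head positions at once, no Python loop.
    heads: (N, 2) int array of (x, y) grid coordinates."""
    x, y = heads[:, 0], heads[:, 1]
    return (x < 0) | (x >= grid_w) | (y < 0) | (y >= grid_h)

heads = np.array([[0, 0], [-1, 5], [31, 23], [32, 0]])
collisions_mur(heads)  # → [False, True, False, True]
```

Body collisions follow the same pattern: compare each head against the pre-allocated body arrays and reduce with boolean operations, so all N environments advance in one vectorized pass.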

Dashboard & Monitoring

  • 4-Quadrant Flexible Layout: The dashboard is divided into 4 panels that can each display: JEU (game view), COURBE (live score graph), HISTORIQUE (score scatter), or VISION (state feature bars).
  • Panel Cycling: Clicking on a panel's title cycles through the 4 available views.
  • Left Menu: 3 sections (STATS, CONTROLS, and OPTIONS).
  • 6 Live Stat Cards: Parties (games), Record, Moy100 (100-game moving average), TPS (steps/second), Temps (elapsed time), Epsilon.
  • 5 Control Buttons: Save, Load, Pause/Resume, Screenshot, Export Excel.
  • Auto-Screenshot: Configurable interval (in seconds) to capture the dashboard automatically.
  • Pygame-native Score Graph: Rolling line chart (last 2000 scores). Rendered in pure Pygame (no Matplotlib overhead).
  • Matplotlib History Scatter: Session-wide scatter plot of all scores, updated every 5 seconds.
  • State Feature Visualizer: Bar chart of all 26 input features fed to the network.
  • Save/Load Modal: File browser modal to choose save files from the model/ directory.
  • Excel Export: Full training history (timestamp, games, epsilon, record, average, TPS) exported to an .xlsx file in logs/.
  • Console Logging: Timestamped log lines printed to stdout at a configurable interval.

Tech Stack


Library Version Purpose
Python 3.10+ Language
PyTorch 2.0+ Neural network, training
Pygame 2.5+ Game rendering & dashboard UI
NumPy 1.24+ Vectorized environment & memory
Pandas 2.0+ Data handling for logging
Matplotlib 3.7+ History scatter plot
openpyxl 3.1+ Excel export
IPython 8.10+ Interactive console support

Architecture

Directory Structure

Snake_AI/
├── agent.py            # Main training loop, Agent class, replay buffer, rendering
├── config.py           # All hyperparameters (ConfigEntrainement + ConfigAffichage)
├── dashboard.py        # Pygame dashboard — 4-panel layout, menu, controls
├── game.py             # Vectorized Snake engine (N parallel environments)
├── logger.py           # Console logging + Excel export
├── model.py            # Dueling DQN network + Trainer (optimizer, target net)
├── widgets.py          # Reusable UI components (Button, InputBox, StatCard, Graph)
├── requirements.txt    # Python dependencies
├── tests/
│   ├── test_agent.py   # Tests for MemoireEfficace (replay buffer)
│   ├── test_game.py    # Tests for JeuVectorise (game engine)
│   ├── test_model.py   # Tests for ReseauNeurones + Entraineur
│   ├── test_widgets.py # Tests for all UI widget classes
│   └── test_dashboard.py
├── model/              # Saved model checkpoints (.pth)
├── logs/               # Excel training logs (.xlsx)
├── screenshots/        # Auto-captured screenshots (.png)
└── src/
    └── demoSnake.gif   # Demo animation (used in README)

Module Responsibilities

Module Class(es) Responsibility
agent.py MemoireEfficace, RenduPygame, AgentIA Training loop, replay buffer, Pygame render
game.py JeuVectorise N-environment Snake engine
model.py ReseauNeurones, Entraineur Network definition, optimizer, target net
dashboard.py Dashboard Full Pygame UI orchestration
widgets.py Bouton, BoiteSaisie, StatCard, GraphiquePygame Reusable UI components
logger.py JournalDeBord Timestamped logs + Excel export
config.py ConfigEntrainement, ConfigAffichage All constants and hyperparameters

Neural Network

The model is a Dueling DQN (ReseauNeurones in model.py).

Input: 26 features
        │
   Linear(26 → 256) + LayerNorm + ReLU
        │
   Linear(256 → 256) + LayerNorm + ReLU
        │
   ┌────┴────┐
   │         │
Value      Advantage
stream     stream
   │         │
Lin(256→128) Lin(256→128)
   + ReLU      + ReLU
   │         │
Lin(128→1)  Lin(128→3)
   │         │
   └────┬────┘
        │
  Q(s,a) = V(s) + (A(s,a) − mean_a(A(s,a)))

Output: 3 Q-values → argmax = chosen action
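A minimal PyTorch sketch of this dueling head. The repo's actual implementation is ReseauNeurones in model.py; the class and attribute names below are illustrative:

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling head: shared trunk, then separate value and advantage streams,
    recombined as Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a))."""
    def __init__(self, n_in=26, n_hidden=256, n_stream=128, n_actions=3):
        super().__init__()
        self.tronc = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.LayerNorm(n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.LayerNorm(n_hidden), nn.ReLU(),
        )
        self.valeur = nn.Sequential(nn.Linear(n_hidden, n_stream), nn.ReLU(),
                                    nn.Linear(n_stream, 1))
        self.avantage = nn.Sequential(nn.Linear(n_hidden, n_stream), nn.ReLU(),
                                      nn.Linear(n_stream, n_actions))

    def forward(self, x):
        h = self.tronc(x)
        v = self.valeur(h)                           # (B, 1)
        a = self.avantage(h)                         # (B, 3)
        return v + a - a.mean(dim=1, keepdim=True)   # (B, 3) Q-values
```

Subtracting the mean advantage makes the V/A decomposition identifiable; without it, any constant could shift between the two streams.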

Training algorithm:

  • Double DQN: action* = argmax_a Q_main(s', a) → evaluated as Q_target(s', action*)
  • Bellman target: Q_target = R + γ^n × Q_next × (1 − done)
  • Loss: SmoothL1Loss (Huber)
  • Optimizer: Adam
  • Soft update: θ_target ← τ × θ_main + (1−τ) × θ_target after every step
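The Double-DQN target and the soft update can be sketched as follows. Helper names are hypothetical; the repo's Entraineur implements the real versions:

```python
import torch

@torch.no_grad()
def cible_double_dqn(q_main, q_target, next_states, rewards, dones,
                     gamma=0.97, n=3):
    """Double DQN: select the action with the online net, evaluate it with
    the target net. `rewards` are assumed already n-step accumulated."""
    actions = q_main(next_states).argmax(dim=1, keepdim=True)      # selection
    q_next = q_target(next_states).gather(1, actions).squeeze(1)   # evaluation
    return rewards + (gamma ** n) * q_next * (1.0 - dones)

@torch.no_grad()
def soft_update(target, main, tau=0.005):
    """theta_target <- tau * theta_main + (1 - tau) * theta_target."""
    for pt, pm in zip(target.parameters(), main.parameters()):
        pt.mul_(1.0 - tau).add_(tau * pm)
```

The loss is then SmoothL1Loss between Q_main(s, a) for the stored actions and this target.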

Checkpoint format (.pth files):

{
    "model_state": ...,
    "optimizer_state": ...,
    "nb_parties": int,
    "temps_total": float,
    "epsilon": float,
    "record": int
}
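Saving and restoring this dictionary is a plain torch.save / torch.load roundtrip. A sketch with hypothetical function names (the repo's actual save/load lives in agent.py):

```python
import torch

def sauvegarder(path, model, optimizer, nb_parties, temps_total, epsilon, record):
    """Write a checkpoint in the format shown above."""
    torch.save({
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "nb_parties": nb_parties,
        "temps_total": temps_total,
        "epsilon": epsilon,
        "record": record,
    }, path)

def charger(path, model, optimizer=None):
    """Restore model (and optionally optimizer) state; return the full dict
    so the caller can resume nb_parties / epsilon / record."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    if optimizer is not None and "optimizer_state" in ckpt:
        optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt
```

The `if "optimizer_state" in ckpt` guard mirrors the fallback loader mentioned in Troubleshooting for legacy saves.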

State Vector

The network receives a vector of 26 normalized floats:

Index Feature(s) Description
0 dist_pomme Manhattan distance to food, normalized by grid diagonal
1–2 dir_x, dir_y Signed direction to food (x and y), normalized
3–5 danger_straight, danger_right, danger_left 1.0 if moving in that direction causes immediate collision with wall or body, else 0.0
6 faim Steps since last food / famine timeout — hunger level (0.0 to 1.0)
7–8 pos_x, pos_y Head position on the grid, normalized by grid width/height
9–11 flood_straight, flood_right, flood_left BFS-accessible cells for each action, normalized by total grid size
12–15 dir_right, dir_down, dir_left, dir_up One-hot encoding of the snake's current absolute direction
16 longueur Snake length / max possible length
17–19 mur_straight, mur_right, mur_left Normalized distance to wall in each relative direction
20–21 queue_x, queue_y Normalized vector from head to tail
22–24 danger2_straight, danger2_right, danger2_left 1.0 if a collision would occur 2 steps ahead
25 densite_corps Ratio of body cells within a densite_rayon-cell radius of the head
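The flood-fill features (indices 9–11) come from a capped BFS over free cells. A standalone sketch of the idea, assuming a set of occupied body cells; the repo's vectorized version caps BFS depth via flood_depth_facteur / flood_depth_max, whereas this simplification caps the visited-cell count:

```python
from collections import deque

def flood_fill(depart, occupe, grid_w=32, grid_h=24, max_cellules=50):
    """Count free cells reachable from `depart` via 4-neighbour BFS,
    capped at max_cellules. `occupe` is a set of (x, y) body cells."""
    x0, y0 = depart
    if depart in occupe or not (0 <= x0 < grid_w and 0 <= y0 < grid_h):
        return 0
    vus, file = {depart}, deque([depart])
    while file and len(vus) < max_cellules:
        x, y = file.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < grid_w and 0 <= ny < grid_h
                    and (nx, ny) not in occupe and (nx, ny) not in vus):
                vus.add((nx, ny))
                file.append((nx, ny))
                if len(vus) >= max_cellules:
                    break
    return len(vus)
```

Running this once per candidate action (on the cell the head would move into) yields the three flood features, which are then normalized by total grid size.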

Action Selection

At each step, we use a blended strategy:

With probability (1 − epsilon):
    → Model-based:  argmax Q(state)

With probability epsilon:
    → With decreasing probability: Greedy Heuristic (move toward food, safe fallback)
    → Otherwise: Safe Random (random action among collision-free moves)

The greedy heuristic and safe-random strategies both avoid moves that result in immediate death whenever a safe option exists.
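The blending logic can be sketched as follows. All names are illustrative, and the real strategy also decays the heuristic's weight over blend_frames:

```python
import random

def choisir_action(q_values, epsilon, poids_heuristique,
                   action_gloutonne, actions_sures):
    """Exploit Q-values with prob 1 - epsilon; otherwise use the greedy
    heuristic with prob poids_heuristique, else a safe random move."""
    if random.random() >= epsilon:
        # Model-based: argmax over the 3 Q-values
        return int(max(range(len(q_values)), key=q_values.__getitem__))
    if random.random() < poids_heuristique and action_gloutonne in actions_sures:
        return action_gloutonne
    # Safe random: only among moves that do not die immediately
    return random.choice(actions_sures) if actions_sures else 0
```

When no safe action exists (the snake is boxed in), the fallback simply returns an arbitrary action; death is unavoidable at that point.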


Memory System

Main Replay Buffer (MemoireEfficace)

Property Value
Capacity 200,000 transitions
Structure Pre-allocated NumPy ring buffer (FIFO)
Stores (state, action, reward, next_state, done)
Sampling Uniform random
Batch size 256

Apple Memory (memoire_pommes)

Property Value
Capacity 20,000 transitions
Stores Only transitions where food was collected
Purpose Oversample rare reward events
Threshold Activated when apple memory has ≥ 128 entries
Samples per batch 128 additional samples mixed with main batch
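The pre-allocated ring-buffer idea behind MemoireEfficace can be sketched as below (capacity, field names, and shapes are illustrative):

```python
import numpy as np

class RingBuffer:
    """FIFO replay buffer over pre-allocated NumPy arrays: writes wrap
    around circularly, sampling is uniform over the filled region."""
    def __init__(self, capacite=200_000, dim_etat=26):
        self.etats = np.zeros((capacite, dim_etat), dtype=np.float32)
        self.actions = np.zeros(capacite, dtype=np.int64)
        self.recompenses = np.zeros(capacite, dtype=np.float32)
        self.etats_suivants = np.zeros((capacite, dim_etat), dtype=np.float32)
        self.termines = np.zeros(capacite, dtype=np.float32)
        self.capacite, self.idx, self.taille = capacite, 0, 0

    def ajouter(self, s, a, r, s2, done):
        i = self.idx
        self.etats[i], self.actions[i], self.recompenses[i] = s, a, r
        self.etats_suivants[i], self.termines[i] = s2, done
        self.idx = (i + 1) % self.capacite              # circular wrap
        self.taille = min(self.taille + 1, self.capacite)

    def echantillon(self, batch=256):
        j = np.random.randint(0, self.taille, size=batch)
        return (self.etats[j], self.actions[j], self.recompenses[j],
                self.etats_suivants[j], self.termines[j])
```

The apple memory is the same structure at 20k capacity, fed only with food-collecting transitions; each training batch then concatenates 256 uniform samples with up to 128 apple samples.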

Dashboard

Views: JEU, COURBE, HISTORIQUE, VISION. Switch panels by clicking their titles.

Left Menu

STATS (Live Cards)

Card Description
Parties Total games
Record Best score
Moy100 100-game average
TPS Steps per second
Temps Duration
Epsilon Exploration rate

CONTROLS

Button Key Action
Sauvegarder S Save model
Charger L Load model
Pause / Resume Space Toggle training
Capture C Dashboard screenshot
Export Excel E Training history log

OPTIONS

Option Default Description
Auto-screenshots toggle Off Automatically captures screenshots at fixed interval
Interval input box 60 seconds Time between auto-screenshots when enabled

Footer

Button Action
Quitter Closes the dashboard and stops training

Panels

Panel Content Update rate
JEU Pygame render of environment 0 — snake with color gradient (head brighter), red apple Every frame
COURBE Pygame line chart — raw scores (blue), moving average (orange), record (gold dashed) Every frame
HISTORIQUE Matplotlib scatter of all session scores Every ~5 seconds
VISION Vertical bar chart of all 26 input features Every frame

Widgets (widgets.py)

Bouton

  • Rectangular button with hover color transition.
  • Click detected via rect collision.
  • Supports dynamic text via setter.
  • Callback function triggered on click.

BoiteSaisie

  • Text input field with focus management.
  • Click to activate, click away to deactivate.
  • Keyboard input with backspace support.
  • Returns text on Enter key press.

StatCard

  • Fixed height: 52px.
  • Displays a label and a large value.
  • Optional bar mode: shows a progress bar with gradient from red to green.
  • Render is safe: defaults to "N/A" if the value is missing.

GraphiquePygame

  • Rolling buffer: last 2000 scores (deque with maxlen).
  • Draws axes, grid lines, and a legend.
  • Auto-scales Y axis to data range.
  • Record line drawn as a dashed gold horizontal line.

Configuration

Defined in config.py.

Training Settings (ConfigEntrainement)


Recommended to change for experiments

Parameter Default Type Description
graine 42 int Random seed. Change to reproduce different training runs.
nb_environnements 20 int Number of parallel game instances. More = faster experience collection, higher RAM/CPU usage.
taux_apprentissage 0.0003 float Adam optimizer learning rate. Try 0.001 for faster (but less stable) learning.
gamma 0.97 float Future reward discount factor. Lower = more short-sighted agent.
epsilon_depart 1.0 float Initial exploration rate. Usually keep at 1.0 for a fresh start.
epsilon_fin 0.0 float Final exploration rate. Can be set to 0.05 to always keep some exploration.
epsilon_frames 1000 int Number of training steps to decay epsilon from start to end. Increase for longer exploration.
transitions_min_debut 5000 int Minimum replay buffer size before training starts. Reduce to 1000 for quick tests.
eval_intervalle 5000 int Steps between greedy evaluation runs. Lower to see test scores more frequently.
nb_episodes_test 100 int Number of greedy episodes per evaluation. Reduce for speed.
recompense_pomme 1.0 float Reward for eating food. Increase to make food more important.
recompense_mort -1.0 float Penalty for dying. Increase magnitude to punish deaths more.
recompense_step -0.001 float Small penalty per step (encourages efficiency). Set to 0 to remove time pressure.
famine_base 100 int Base number of steps before starvation (without food).
famine_par_case 3 int Additional steps per snake body block before starvation.
taille_batch 256 int Batch size for main training. Larger = smoother gradients, more memory.
memoire_max 200000 int Replay buffer capacity. Reduce to 50000 for lower memory usage.
log_intervalle_sec 1.0 float Console log interval in seconds. Set to 5.0 for less noise.
graph_update_intervalle 100 int Dashboard graph update frequency (every N steps).

Change only if you understand the impact

Parameter Default Type Description
tau 0.005 float Target network soft update rate. Too high = unstable, too low = slow to learn.
n_step 3 int N-step return window. Affects how far back rewards are propagated.
blend_frames 150 int Steps for blending between heuristic and model-based exploration.
freq_entrainement 8 int Training frequency: one gradient step every N environment steps.
taille_batch_pqn 512 int Batch size for PQN-style training step.
memoire_pommes_capacite 20000 int Apple memory buffer capacity.
memoire_pommes_seuil 128 int Minimum apple memory size before oversampling is active.
memoire_pommes_echantillons 128 int Apple samples added per training batch.
flood_depth_facteur 0.5 float Flood fill BFS depth limit = facteur × snake_length.
flood_depth_max 50 int Hard cap on flood fill depth.
densite_rayon 3 int Radius (in cells) for body density computation.
scores_historique_maxlen 500 int Max history length for score tracking.
scores_test_maxlen 100 int Max history length for test set scores.
lr_scheduler_epsilon_seuil 0.2 float Epsilon threshold below which LR scheduler activates.
lr_scheduler_patience 100 int Patience for LR scheduler (in log intervals).
lr_scheduler_factor 0.5 float LR reduction factor on plateau.
lr_min 1e-6 float Minimum learning rate (floor for scheduler).
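As an illustration of n_step: the n-step window accumulates discounted rewards before bootstrapping from the target network. A minimal sketch (the repo accumulates these incrementally during rollout):

```python
def retour_n_pas(recompenses, gamma=0.97):
    """n-step return R = sum_k gamma^k * r_k over a short reward window."""
    R = 0.0
    for r in reversed(recompenses):
        R = r + gamma * R
    return R

retour_n_pas([0.0, 0.0, 1.0])  # 0.97**2 * 1.0 = 0.9409
```

This R is what appears as the reward term in the Bellman target, which then discounts the bootstrapped value by gamma**n.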

Do NOT change (architecture constants)

Parameter Default Type Description
input_size 26 int Network input size: must match the 26-feature state vector exactly.
output_size 3 int Number of actions (Straight, Right, Left).
largeur 640 int Game area width in pixels. Changing this changes the grid size and breaks saved models.
hauteur 480 int Game area height in pixels. Same caveat.
taille_bloc 20 int Cell size in pixels. Grid = (largeur/taille_bloc) × (hauteur/taille_bloc) = 32×24.
taille_couche_1 256 int First shared hidden layer size. Changing breaks saved models.
taille_couche_2 256 int Second shared hidden layer size.
taille_couche_v 128 int Value stream layer size.
taille_couche_a 128 int Advantage stream layer size.

UI Settings (ConfigAffichage)

Parameter Default Description
largeur_fenetre 1920 Window width
hauteur_fenetre 1080 Window height
largeur_menu 250 Menu width
hauteur_barre_bas 40 Status bar height
intervalle_screenshot_defaut 60 Auto-screenshot timer

Getting Started

1. Clone the Repository

git clone https://github.com/ProGen18/Snake_AI.git
cd Snake_AI

2. Set Up a Virtual Environment (Recommended)

python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Start Training

python agent.py

This launches the full training loop with the Pygame dashboard open. The agent will start exploring randomly and gradually shift toward learned behavior as epsilon decays.


Available Scripts

Command Description
python agent.py Start training with live dashboard
pytest Run the full test suite
pytest -v Verbose test output
pytest tests/test_model.py Run model tests only
pytest tests/test_game.py Run game engine tests only
pytest tests/test_agent.py Run replay buffer tests only
pytest tests/test_widgets.py Run UI widget tests only

Tests

The project uses pytest. Tests cover all major components.

Test Files

File What it tests
test_agent.py MemoireEfficace: storage, ring-buffer overflow, circular wrapping, random sampling
test_game.py JeuVectorise: initialization, partial reset, state computation, flood fill, danger flags, tail direction, body density
test_model.py ReseauNeurones: forward shapes, NaN/Inf check, save/load roundtrip. Entraineur: learning step, soft update, loss decrease
test_widgets.py Bouton: click/miss callbacks; BoiteSaisie: focus, input, Enter; StatCard: value set, bar mode render; GraphiquePygame: add points, draw empty/full, record line

Troubleshooting

Dashboard Freezes

The Pygame event loop runs inside the training loop. If batch sizes are very large or the CPU is saturated, the UI may lag. Reduce nb_environnements or taille_batch if this happens.

CUDA Warnings (UserWarning: CUDA initialization...)

The agent automatically falls back to CPU if no compatible GPU is found. To use a GPU, install the CUDA-enabled version of PyTorch matching your system's driver version.

Missing Module (ModuleNotFoundError)

Activate your virtual environment first, then run pip install -r requirements.txt.

Model Won't Load

Ensure the .pth file was saved by the same version of ReseauNeurones. Legacy saves (without optimizer state) are partially supported via a fallback loader.

Snake Doesn't Learn

  • Check that transitions_min_debut is not set too high: training won't start until the buffer is filled.
  • Increase epsilon_frames if the agent doesn't explore enough before exploitation begins.
  • If scores plateau early, the scheduler may reduce the learning rate too aggressively: check lr_scheduler_patience.

About

A high-performance, vectorized Snake AI using Deep Q-Learning (DQN) with a real-time Pygame training dashboard and PyTorch.
