This project explores the intersection of Reinforcement Learning (RL) and Large Language Models (LLMs) in complex, imperfect-information environments (Gin Rummy). It addresses the challenge of training RL agents without reliable opponents or expensive human feedback by establishing an adversarial co-evolutionary loop.
We utilize LLMs (Llama 3, Gemma, GPT) as zero-shot strategic opponents to guide the training of efficient PPO agents. The system employs a 3-phase curriculum learning approach to distill the broad, "common-sense" strategic knowledge of LLMs into a fast, compact RL policy.
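One way to picture the distillation step: the LLM's suggested action distribution can act as a prior that regularizes the PPO policy via a KL penalty added to the loss. The sketch below is a hypothetical, NumPy-only illustration of that idea; the function name, `beta` weight, and interfaces are assumptions, not the project's actual implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_prior_penalty(agent_logits, llm_prior_probs, beta=0.1):
    """Hypothetical policy-prior term: beta * KL(agent || LLM prior).

    Added to the PPO loss, this nudges the agent's action
    distribution toward the LLM's zero-shot suggestions without
    forcing it to copy them outright.
    """
    p = softmax(np.asarray(agent_logits, dtype=float))
    q = np.asarray(llm_prior_probs, dtype=float)
    q = np.clip(q, 1e-8, None)
    q = q / q.sum()
    kl = float(np.sum(p * (np.log(p + 1e-12) - np.log(q))))
    return beta * kl
```

When the agent already matches the prior, the penalty is zero; the further it drifts, the larger the (always non-negative) term becomes.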
- High-Performance RL Pipeline: Engineered a high-throughput, 64-96 core, multi-process PPO training pipeline with a custom action-masked policy using Stable Baselines 3 and PyTorch.
- Curriculum Learning System: Built a robust 3-phase curriculum learning system (Random → Self-Play → Adversarial) with a fully cached RAM model-pool API, achieving a 99.12% win rate vs. baseline agents.
- LLM Knowledge Distillation: Architected a scalable API framework to integrate LLM strategic insights as a policy prior, enabling agents to learn from models like Llama 3 and GPT-OSS via Ollama/HuggingFace.
- Interactive Evaluation Suite: Designed and built a custom evaluation environment (PettingZoo) and Web UI for critical live human-vs-agent testing and qualitative validation of learned strategies.
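The action masking mentioned above boils down to a simple trick: before sampling, invalid actions' logits are set to negative infinity so the softmax assigns them zero probability. A minimal NumPy sketch (not the Stable Baselines 3 internals, just the core idea):

```python
import numpy as np

def masked_softmax(logits, valid_mask):
    """Softmax restricted to valid actions.

    Invalid actions get logit -inf, so exp(-inf) == 0 and they
    receive exactly zero probability mass after normalization.
    """
    logits = np.asarray(logits, dtype=float)
    masked = np.where(np.asarray(valid_mask, dtype=bool), logits, -np.inf)
    z = masked - masked[np.isfinite(masked)].max()  # stable shift over finite entries
    e = np.exp(z)
    return e / e.sum()
```

In Gin Rummy, the valid mask changes every turn (e.g. you can only knock below a deadwood threshold), so masking is essential to keep PPO from wasting probability mass on illegal moves.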
The project consists of three main components:
- The RL Agent (PPO): A custom implementation of Proximal Policy Optimization with valid action masking, trained to handle the partial observability of Gin Rummy.
- The LLM Agent: A wrapper that renders game states into text prompts (with chain-of-thought reasoning) and parses LLM responses back into valid game actions.
- The Orchestrator: Manages the training curriculum, switching opponents between random agents, prior model checkpoints, and live LLM inferences based on training progress.
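The LLM Agent's prompt/parse round trip can be sketched as follows. This is a simplified, hypothetical version: the prompt wording, the `ACTION:` answer convention, and the fallback-to-first-legal-action behavior are illustrative assumptions, not the project's exact `prompts.yaml` format.

```python
import re

def build_prompt(hand, discard_top, legal_actions):
    """Render a Gin Rummy state as a chain-of-thought prompt (hypothetical format)."""
    return (
        "You are playing Gin Rummy.\n"
        f"Your hand: {', '.join(hand)}\n"
        f"Top of discard pile: {discard_top}\n"
        f"Legal actions: {', '.join(legal_actions)}\n"
        "Think step by step, then answer with: ACTION: <one legal action>"
    )

def parse_action(response, legal_actions):
    """Extract the chosen action from free-form LLM text.

    Falls back to the first legal action if the reply is malformed
    or names an illegal move, so the game loop never stalls.
    """
    m = re.search(r"ACTION:\s*([\w-]+)", response)
    if m and m.group(1) in legal_actions:
        return m.group(1)
    return legal_actions[0]
```

The fallback matters in practice: even strong LLMs occasionally produce unparseable or illegal replies, and a deterministic default keeps training and evaluation loops robust.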
.
```
├── agents/ # Agent implementations (PPO, Random, LLM, Human)
├── artifacts/ # Trained models and checkpoints
├── config/ # Configuration files (paths, prompts.yaml)
├── controller/ # Game logic and orchestration
├── game/ # Gin Rummy environment wrappers and assets
├── llm/ # API handlers for Ollama/HuggingFace interaction
├── src/ # Utilities, logging, and UI components
├── templates/ # HTML templates for the Web UI
├── app.py # Flask application for web-based play
├── eval.py # Evaluation scripts
├── main.py # Main entry point
├── ppo_train.py # PPO training pipeline script
├── environment.yml # Conda environment definition
└── requirements.txt # Python dependencies
```
- Python 3.10+
- Conda (recommended)
- Ollama (for local LLM inference)
Clone the repository:
```bash
git clone https://github.com/nikelroid/adversarial-coevolution.git
cd adversarial-coevolution
```
Create the environment:
```bash
conda env create -f environment.yml
conda activate rl-llm-env
```
Alternatively, using pip:
```bash
pip install -r requirements.txt
pip install -e .
```
To start the PPO training pipeline with the default configuration (Curriculum Phase 1 & 2):
```bash
python ppo_train.py
```
Check `config/` to adjust hyperparameters or curriculum stages.
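As an illustration of what such a configuration might look like, here is a hypothetical curriculum snippet. The file name, keys, and values below are assumptions for the sake of example, not the repository's actual `config/` contents:

```yaml
# config/curriculum.yaml (hypothetical example)
phases:
  - name: random          # Phase 1: warm up against a random opponent
    opponent: random
    timesteps: 1_000_000
  - name: self_play       # Phase 2: play against cached prior checkpoints
    opponent: model_pool
    timesteps: 2_000_000
  - name: adversarial     # Phase 3: face live LLM opponents
    opponent: llm
    llm_backend: ollama
    timesteps: 500_000
```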
Ensure your Ollama server is running (default port 11434). To test an LLM agent:
```bash
python llm_test.py
```
Launch the web application to play against the trained models:
```bash
python app.py
```
Open your browser at http://localhost:5000.
To benchmark the current model against a random agent or an LLM:
```bash
python eval.py --model artifacts/models/ppo_gin_rummy/ppo_gin_rummy_final.zip
```

| Agent Type | Opponent | Win Rate | Notes |
|---|---|---|---|
| PPO (Baseline) | Random | 98.9% | High win rate, but prone to local optima (Gin-biased). |
| PPO (Curriculum) | Random | 99.1% | Balanced strategy (Knock vs. Gin). |
| GPT-OSS (20B) | Random | 100% | Zero-shot performance (5-0 match). |
| GPT-OSS (20B) | PPO (Knock) | 60% | Competitive match (3-2 score). |
- Nima Kelidari - Lead Engineer & RL Architecture - kelidari@usc.edu
- Mahdi Salmani - LLM Integration & Evaluation - salmanis@usc.edu
- Mohammadsaeed Haghi - Game Environment & API - haghim@usc.edu
This project is licensed under the MIT License - see the LICENSE file for details.
- PettingZoo for the Multi-Agent RL environments.
- Stable-Baselines3 for reliable PPO implementations.
- RLCard for game logic inspiration.
