A curated collection of 100+ multimodal large language models, organized by generation paradigm (auto-regressive, diffusion, or a combination of the two), with standardized fields covering architecture, tasks, datasets, metrics, and more.
🌐 Live demo: step-out.github.io/Multimodal-Model-Zoo
| Category | Count | Examples |
|---|---|---|
| Auto-Regressive | 65 | GPT-4o, Emu3, Janus, Qwen2.5-VL, LLaMA-Omni |
| Diffusion | 12 | OmniFlow, D-DiT, PackDiT, CELL-Diff, TokenFlow |
| AR & Diffusion | 36 | NExT-GPT, Show-o, BAGEL, MMaDA, MetaMorph |
- 🔍 Full-text search — search by model name, backbone, tasks, area, etc.
- 🏷️ Category filtering — Auto-Regressive / Diffusion / AR & Diffusion (hybrid)
- 📅 Multi-sort — by date or name
- 📋 Detailed model cards — tokenizer, tasks, datasets, metrics, and more
- 🔗 Quick links — one-click access to paper & code
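
The search, filter, and sort features above can be pictured with a small sketch. This is not the demo site's actual code: the `ModelCard` class, the `search` function, the attribute names, and the `YYYY-MM` date format are all assumptions made for illustration, and the three cards are placeholders rather than real models.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    # Hypothetical, reduced view of a model entry (names are assumptions).
    name: str
    category: str  # "Auto-Regressive", "Diffusion", or "AR & Diffusion"
    backbone: str
    tasks: str
    area: str
    time: str      # assumed "YYYY-MM", so string order matches date order


def search(cards, query="", category=None, sort_by="time"):
    """Case-insensitive substring search with optional category filter."""
    q = query.lower()
    hits = []
    for card in cards:
        if category is not None and card.category != category:
            continue
        haystack = " ".join([card.name, card.backbone, card.tasks, card.area])
        if q in haystack.lower():
            hits.append(card)
    # Sort newest first when sorting by date, alphabetically otherwise.
    newest_first = sort_by == "time"
    return sorted(hits, key=lambda c: getattr(c, sort_by).lower(),
                  reverse=newest_first)


# Placeholder cards, not real models.
cards = [
    ModelCard("Alpha", "Auto-Regressive", "backbone-x",
              "image understanding", "general", "2024-03"),
    ModelCard("Beta", "Diffusion", "backbone-y",
              "image generation", "general", "2025-01"),
    ModelCard("Gamma", "AR & Diffusion", "backbone-x",
              "image generation", "medical", "2024-11"),
]

for card in search(cards, "generation"):
    print(card.name, card.time)
```

Passing `category="Diffusion"` narrows the results to one paradigm, and `sort_by="name"` switches to alphabetical order.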
Each model entry includes:
| Field | Description |
|---|---|
| Model | Model name |
| IO | Input/output modalities |
| Tokenizer | Tokenization strategy |
| Backbone | Base architecture |
| Parameter | Model size |
| Task | Supported tasks |
| Dataset & Benchmark | Training/evaluation data |
| Metric | Evaluation metrics |
| Time | Publication date |
| Paper | Paper link |
| Code | Code repository |
| Conference/Journal | Venue |
| Area | Research area |
| Highlight | Key contribution |
Contributions are welcome! To add a new model, add an entry directly to `models.json` following the existing format, then open a pull request.
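
Before opening a pull request, it may help to check that a new entry fills in every field from the table above. The sketch below is one possible check, not part of the repository. It assumes the JSON keys match the field names in the table exactly, which may not hold, so compare against an existing entry in `models.json` first. The helper `missing_fields` and the entry values are hypothetical placeholders.

```python
import json

# Field names copied from the table above. Assumption: models.json uses
# these exact strings as keys; verify against an existing entry.
EXPECTED_FIELDS = [
    "Model", "IO", "Tokenizer", "Backbone", "Parameter", "Task",
    "Dataset & Benchmark", "Metric", "Time", "Paper", "Code",
    "Conference/Journal", "Area", "Highlight",
]


def missing_fields(entry):
    """Return the expected fields that are absent or blank in an entry."""
    missing = []
    for field in EXPECTED_FIELDS:
        value = entry.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Hypothetical entry with placeholder values, not a real model.
new_entry = {
    "Model": "ExampleModel",
    "IO": "text, image -> text",
    "Tokenizer": "placeholder tokenizer",
    "Backbone": "placeholder backbone",
    "Parameter": "7B",
    "Task": "image understanding",
    "Dataset & Benchmark": "placeholder dataset",
    "Metric": "accuracy",
    "Time": "2025-01",  # date format is an assumption
    "Paper": "https://example.com/paper",
    "Code": "",  # left blank on purpose to show the check
    "Conference/Journal": "arXiv",
    "Area": "general",
    "Highlight": "placeholder summary",
}

print("Missing fields:", missing_fields(new_entry))
print(json.dumps(new_entry, indent=2, ensure_ascii=False))
```

Here the blank `Code` value is reported as missing. Once the list is empty, the printed JSON can be adapted to the format used in `models.json`.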
This project is for research and educational purposes. All model information and links belong to their respective authors.
Compiled for research reference · Star ⭐ if you find this useful!