step-out/Multimodal-Model-Zoo

🧠 Multimodal Model Zoo

A curated collection of 100+ multimodal large language models, organized by generation paradigm — with standardized fields covering architecture, tasks, datasets, metrics, and more.

🌐 Live demo: step-out.github.io/Multimodal-Model-Zoo

Auto-Regressive · Diffusion · AR & Diffusion · Benchmarks


📊 Overview

| Category | Count | Examples |
| --- | --- | --- |
| Auto-Regressive | 65 | GPT-4o, Emu3, Janus, Qwen2.5-VL, LLaMA-Omni |
| Diffusion | 12 | OmniFlow, D-DiT, PackDiT, CELL-Diff, TokenFlow |
| AR & Diffusion | 36 | NExT-GPT, Show-o, BAGEL, MMaDA, MetaMorph |

✨ Features

  • 🔍 Full-text search — search by model name, backbone, tasks, area, etc.
  • 🏷️ Category filtering — Auto-Regressive / Diffusion / Hybrid
  • 📅 Multi-sort — by date or name
  • 📋 Detailed model cards — tokenizer, tasks, datasets, metrics, and more
  • 🔗 Quick links — one-click access to paper & code
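The search, filter, and sort features listed above could be approximated offline along the following lines. This is only a rough sketch, not the site's actual implementation: the `Category` key, the ISO-style date strings in `Time`, and the three placeholder entries are all assumptions made for illustration.

```python
# Placeholder entries: names and values are invented for illustration and do
# not describe real models in the zoo. The "Category" key is an assumption.
MODELS = [
    {"Model": "Beta-AR", "Category": "Auto-Regressive", "Backbone": "ExampleLM",
     "Task": "captioning", "Time": "2024-05-01"},
    {"Model": "Alpha-Diff", "Category": "Diffusion", "Backbone": "ExampleDiT",
     "Task": "text-to-image", "Time": "2025-01-15"},
    {"Model": "Gamma-Mix", "Category": "AR & Diffusion", "Backbone": "ExampleLM",
     "Task": "captioning, text-to-image", "Time": "2024-11-30"},
]


def search(models, query):
    """Case-insensitive substring match across every field of each entry."""
    q = query.lower()
    return [m for m in models if any(q in str(v).lower() for v in m.values())]


def filter_by_category(models, category):
    """Keep only entries whose category matches exactly."""
    return [m for m in models if m.get("Category") == category]


def sort_models(models, by="date", newest_first=True):
    """Sort by date (assumes ISO 'YYYY-MM-DD' strings) or alphabetically by name."""
    if by == "date":
        return sorted(models, key=lambda m: m.get("Time", ""), reverse=newest_first)
    return sorted(models, key=lambda m: m.get("Model", "").lower())


if __name__ == "__main__":
    for m in sort_models(search(MODELS, "examplelm"), by="name"):
        print(m["Model"], m["Time"])
```

Matching against every field keeps the search simple at the cost of precision. A real implementation might restrict it to the name, backbone, task, and area fields.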

📝 Data Fields

Each model entry includes:

| Field | Description |
| --- | --- |
| Model | Model name |
| IO | Input/output modalities |
| Tokenizer | Tokenization strategy |
| Backbone | Base architecture |
| Parameter | Model size |
| Task | Supported tasks |
| Dataset & Benchmark | Training/evaluation data |
| Metric | Evaluation metrics |
| Time | Publication date |
| Paper | Paper link |
| Code | Code repository |
| Conference/Journal | Venue |
| Area | Research area |
| Highlight | Key contribution |
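To make the field list concrete, here is a hypothetical entry built as a Python dictionary and serialized to JSON. The key names are assumed to mirror the field names above, and every value is a placeholder. The real schema and value formats in models.json may differ, so check an existing entry before copying this shape.

```python
import json

# Hypothetical entry: keys are assumed to match the documented field names,
# and all values are placeholders rather than data about a real model.
example_entry = {
    "Model": "ExampleModel",
    "IO": "text + image -> text",
    "Tokenizer": "placeholder tokenizer description",
    "Backbone": "placeholder backbone",
    "Parameter": "placeholder size",
    "Task": "placeholder task list",
    "Dataset & Benchmark": "placeholder datasets",
    "Metric": "placeholder metrics",
    "Time": "2025-01-01",
    "Paper": "https://example.org/paper",
    "Code": "https://example.org/code",
    "Conference/Journal": "placeholder venue",
    "Area": "placeholder research area",
    "Highlight": "placeholder one-line contribution",
}

# ensure_ascii=False keeps any non-ASCII characters readable in the output.
print(json.dumps(example_entry, indent=2, ensure_ascii=False))
```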

🤝 Contributing

Contributions are welcome! To add a new model, add an entry directly to models.json following the existing format, then open a pull request.
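Before opening a pull request, a quick local check can catch missing fields or duplicated names. The sketch below assumes that models.json is a top-level JSON array of objects and that its keys match the field names listed under Data Fields. Neither assumption is confirmed by this README, so adjust `REQUIRED_FIELDS` and the loading logic to the actual file.

```python
import json

# Assumed key names, taken from the Data Fields list; the real file may differ.
REQUIRED_FIELDS = [
    "Model", "IO", "Tokenizer", "Backbone", "Parameter", "Task",
    "Dataset & Benchmark", "Metric", "Time", "Paper", "Code",
    "Conference/Journal", "Area", "Highlight",
]


def validate_entries(entries):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    seen = set()
    for index, entry in enumerate(entries):
        name = entry.get("Model", f"<entry {index}>")
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"{name}: missing fields {missing}")
        if name in seen:
            problems.append(f"{name}: duplicate model name")
        seen.add(name)
    return problems


def validate_file(path="models.json"):
    """Load the file (assumed to hold a JSON array) and validate its entries."""
    with open(path, encoding="utf-8") as handle:
        return validate_entries(json.load(handle))
```

Running `validate_file()` from the repository root and confirming that it returns an empty list is one way to sanity-check a new entry before submitting it.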

📄 License

This project is for research and educational purposes. All model information and links belong to their respective authors.


Compiled for research reference · Star ⭐ if you find this useful!
