Vayu (وایو) is a high-performance Whisper speech-to-text implementation for Apple Silicon Macs. It combines MLX Whisper with Lightning Whisper MLX batched decoding to deliver 3-5x faster transcription than standard Whisper on M1/M2/M3/M4 chips.
Named after the ancient Persian god of wind - the swiftest force in nature.
## Features

- 3-5x faster than standard Whisper via batched decoding on Apple Silicon
- One-line install - `pip install vayu-whisper` and you're transcribing
- All Whisper models - tiny through large-v3, plus turbo and distil variants
- Multiple output formats - txt, vtt, srt, tsv, json
- Word-level timestamps - precise word timings for subtitles and alignment
- Low memory options - 4-bit and 8-bit quantization for constrained environments
- Simple Python API + CLI - use from code or the command line
## Installation

```bash
pip install vayu-whisper
```

Or install from source:

```bash
git clone https://github.com/CodeWithBehnam/vayu.git
cd vayu
pip install -e .
```

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- MLX 0.11+
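A quick sanity check that your machine meets these requirements (a convenience snippet, not part of Vayu; it assumes MLX was installed via pip under the package name `mlx`):

```python
import platform
import sys
from importlib.metadata import version

# Apple Silicon Macs report Darwin / arm64
assert platform.system() == "Darwin" and platform.machine() == "arm64", "Apple Silicon Mac required"
assert sys.version_info >= (3, 10), "Python 3.10+ required"
print("MLX version:", version("mlx"))  # Vayu expects 0.11 or newer
```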
## Quick Start

```python
from whisper_mlx import LightningWhisperMLX

# Initialize with batched decoding for maximum speed
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=12)

# Transcribe audio
result = whisper.transcribe("audio.mp3")
print(result["text"])

# With word-level timestamps
result = whisper.transcribe("audio.mp3", language="en", word_timestamps=True)
```

For lower-level control, call the `transcribe` function directly:

```python
from whisper_mlx import transcribe

result = transcribe(
    "audio.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    batch_size=6,
    language="en",
    word_timestamps=True,
)
print(result["text"])

for segment in result["segments"]:
    print(f"[{segment['start']:.2f} -> {segment['end']:.2f}] {segment['text']}")
```
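With `word_timestamps=True`, per-word timings are included in the result. A minimal sketch of reading them, assuming the OpenAI-Whisper-style layout where each segment carries a `words` list whose entries have `word`, `start`, and `end` keys:

```python
from whisper_mlx import transcribe

result = transcribe("audio.mp3", word_timestamps=True)

# Print each word with its start/end time in seconds
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f"{word['start']:7.2f} -> {word['end']:7.2f}  {word['word']}")
```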
## CLI Usage

```bash
# Basic transcription
vayu audio.mp3

# Batched decoding (3-5x faster)
vayu audio.mp3 --batch-size 12

# Specify model and output format
vayu audio.mp3 --model mlx-community/distil-whisper-large-v3 --output-format srt

# Multiple files
vayu audio1.mp3 audio2.mp3 --output-dir ./transcripts

# Word-level timestamps
vayu audio.mp3 --word-timestamps True

# Translate to English
vayu audio.mp3 --task translate
```

## Models

| Model | HuggingFace Repo | Size | Speed |
|---|---|---|---|
| tiny | mlx-community/whisper-tiny-mlx | 39M | Fastest |
| base | mlx-community/whisper-base-mlx | 74M | Fast |
| small | mlx-community/whisper-small-mlx | 244M | Medium |
| medium | mlx-community/whisper-medium-mlx | 769M | Slow |
| large-v3 | mlx-community/whisper-large-v3-mlx | 1.5B | Slowest |
| turbo | mlx-community/whisper-turbo | 809M | Fast |
| distil-large-v3 | mlx-community/distil-whisper-large-v3 | 756M | Fast |
### Quantization

For reduced memory usage, use quantized models:

```python
whisper = LightningWhisperMLX(model="distil-large-v3", quant="4bit")  # also accepts "8bit"
```

## Batch Size Recommendations

| Model | Recommended batch_size | Memory Usage |
|---|---|---|
| tiny/base | 24-32 | Low |
| small | 16-24 | Medium |
| medium | 8-12 | High |
| large/turbo | 4-8 | High |
| distil-large-v3 | 12-16 | Medium |
Higher batch sizes improve throughput but require more memory. Start with the recommended values and adjust based on your hardware.
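If you are unsure what fits, one pragmatic approach is to retry with smaller batches on failure. A minimal sketch (`transcribe_with_fallback` is a hypothetical helper, not part of Vayu's API; it assumes an out-of-memory condition surfaces as `MemoryError` or `RuntimeError`):

```python
from whisper_mlx import LightningWhisperMLX

def transcribe_with_fallback(audio_path, batch_sizes=(12, 8, 4, 1)):
    """Retry with progressively smaller batch sizes until one fits in memory."""
    last_error = None
    for batch_size in batch_sizes:
        try:
            whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=batch_size)
            return whisper.transcribe(audio_path)
        except (MemoryError, RuntimeError) as exc:  # assumption: OOM raises one of these
            last_error = exc
    raise last_error

result = transcribe_with_fallback("audio.mp3")
print(result["text"])
```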
## API Reference

### `transcribe()`

```python
def transcribe(
    audio: Union[str, np.ndarray, mx.array],
    *,
    path_or_hf_repo: str = "mlx-community/whisper-turbo",
    batch_size: int = 1,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    **decode_options,
) -> dict
```
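As an illustration of these decoding controls (the file name and prompt text are invented; semantics follow standard Whisper behavior, where the `temperature` tuple is a fallback schedule):

```python
from whisper_mlx import transcribe

result = transcribe(
    "meeting.mp3",
    path_or_hf_repo="mlx-community/whisper-turbo",
    temperature=0.0,                   # a single value disables the fallback schedule
    initial_prompt="Acme Corp quarterly planning meeting.",  # vocabulary/style hint
    condition_on_previous_text=False,  # can reduce repetition loops on long audio
    word_timestamps=True,
)
print(result["text"])
```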
### `LightningWhisperMLX`

```python
class LightningWhisperMLX:
    def __init__(
        self,
        model: str = "distil-large-v3",
        batch_size: int = 12,
        quant: str = None,
    )

    def transcribe(
        self,
        audio_path: str,
        language: str = None,
        task: str = "transcribe",
        verbose: bool = False,
        word_timestamps: bool = False,
        **kwargs,
    ) -> dict
```

## Contributing

Contributions are welcome! Please open an issue or submit a pull request.
```bash
# Development setup
git clone https://github.com/CodeWithBehnam/vayu.git
cd vayu
pip install -e ".[dev]"
```

## License

MIT License - see LICENSE for details.
## Acknowledgments

This project builds on the excellent work of:
| Project | Author(s) | Contribution |
|---|---|---|
| mlx-examples/whisper | Apple Inc. | MLX framework, Whisper port, CLI, output writers |
| lightning-whisper-mlx | Mustafa Aljadery, Siddharth Sharma | Batched decoding for 3-5x speedup |
| Whisper | OpenAI | Original model architecture and weights |
Behnam Ebrahimi - Unified implementation, security improvements, and maintenance
