Multilingual Guide
Vayu supports 99 languages via the Whisper multilingual models.
By default, Vayu auto-detects the language from the first 30 seconds of audio:
```python
result = whisper.transcribe("audio.mp3")
print(result["language"])  # e.g., "en"
```

For better accuracy, specify the language explicitly:
```python
result = whisper.transcribe("audio.mp3", language="fa")
```

| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| `en` | English | `ja` | Japanese | `uk` | Ukrainian |
| `zh` | Chinese | `pt` | Portuguese | `el` | Greek |
| `de` | German | `tr` | Turkish | `ms` | Malay |
| `es` | Spanish | `pl` | Polish | `cs` | Czech |
| `ru` | Russian | `ca` | Catalan | `ro` | Romanian |
| `ko` | Korean | `nl` | Dutch | `da` | Danish |
| `fr` | French | `ar` | Arabic | `hu` | Hungarian |
| `it` | Italian | `sv` | Swedish | `ta` | Tamil |
| `hi` | Hindi | `id` | Indonesian | `no` | Norwegian |
| `fa` | Persian | `fi` | Finnish | `th` | Thai |
| `vi` | Vietnamese | `he` | Hebrew | `ur` | Urdu |
Full list of all 99 languages
| Code | Language | Code | Language |
|---|---|---|---|
| `hr` | Croatian | `bg` | Bulgarian |
| `lt` | Lithuanian | `la` | Latin |
| `mi` | Maori | `ml` | Malayalam |
| `cy` | Welsh | `sk` | Slovak |
| `te` | Telugu | `lv` | Latvian |
| `bn` | Bengali | `sr` | Serbian |
| `az` | Azerbaijani | `sl` | Slovenian |
| `kn` | Kannada | `et` | Estonian |
| `mk` | Macedonian | `br` | Breton |
| `eu` | Basque | `is` | Icelandic |
| `hy` | Armenian | `ne` | Nepali |
| `mn` | Mongolian | `bs` | Bosnian |
| `kk` | Kazakh | `sq` | Albanian |
| `sw` | Swahili | `gl` | Galician |
| `mr` | Marathi | `pa` | Punjabi |
| `si` | Sinhala | `km` | Khmer |
| `sn` | Shona | `yo` | Yoruba |
| `so` | Somali | `af` | Afrikaans |
| `oc` | Occitan | `ka` | Georgian |
| `be` | Belarusian | `tg` | Tajik |
| `sd` | Sindhi | `gu` | Gujarati |
| `am` | Amharic | `yi` | Yiddish |
| `lo` | Lao | `uz` | Uzbek |
| `fo` | Faroese | `ht` | Haitian Creole |
| `ps` | Pashto | `tk` | Turkmen |
| `nn` | Nynorsk | `mt` | Maltese |
| `sa` | Sanskrit | `lb` | Luxembourgish |
| `my` | Myanmar | `bo` | Tibetan |
| `tl` | Tagalog | `mg` | Malagasy |
| `as` | Assamese | `tt` | Tatar |
| `haw` | Hawaiian | `ln` | Lingala |
| `ha` | Hausa | `ba` | Bashkir |
| `jw` | Javanese | `su` | Sundanese |
| `yue` | Cantonese | | |
Vayu accepts both language codes and common name variations:
```python
# All equivalent
whisper.transcribe("audio.mp3", language="zh")
whisper.transcribe("audio.mp3", language="chinese")
whisper.transcribe("audio.mp3", language="mandarin")

# More aliases
# "burmese" → my, "valencian" → ca, "flemish" → nl
# "haitian" → ht, "castilian" → es, "panjabi" → pa
# "moldavian" → ro, "sinhalese" → si, "pushto" → ps
```

Vayu transcribes right-to-left (RTL) languages correctly. The text output is in logical order (not visual order), which is how most text systems expect it:
```python
# LightningWhisperMLX must already be imported; the import path depends on
# how Vayu is installed and is not shown in this guide.
whisper = LightningWhisperMLX(model="large-v3", batch_size=6)

# Persian
result = whisper.transcribe("persian_audio.mp3", language="fa")
print(result["text"])  # سلام، حال شما چطور است؟

# Arabic
result = whisper.transcribe("arabic_audio.mp3", language="ar")
```

Tips for RTL:
- Use `large-v3` for best accuracy on RTL languages
- Specify the language explicitly; auto-detection sometimes confuses Arabic, Persian, and Urdu
- SRT/VTT output may need RTL markers depending on your video player
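
The last tip can be handled in a post-processing step. The sketch below is one possible approach, not part of Vayu: it wraps each subtitle line in the Unicode right-to-left isolate pair (U+2067 and U+2069). The function names `wrap_rtl` and `mark_rtl_cue` are hypothetical. Whether a given player honors these characters is an assumption to verify; some players respond better to a leading right-to-left mark (U+200F) instead.

```python
# Unicode bidirectional control characters.
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def wrap_rtl(line: str) -> str:
    """Wrap one subtitle line in a right-to-left isolate.

    Blank lines are returned unchanged so cue separators stay intact.
    """
    if not line.strip():
        return line
    return f"{RLI}{line}{PDI}"


def mark_rtl_cue(cue_text: str) -> str:
    """Apply wrap_rtl to every line of a (possibly multi-line) cue."""
    return "\n".join(wrap_rtl(line) for line in cue_text.split("\n"))


if __name__ == "__main__":
    marked = mark_rtl_cue("سلام، حال شما چطور است؟")
    # The visible text is unchanged; only invisible control characters are added.
    print(len(marked) - len("سلام، حال شما چطور است؟"))  # 2
```

Apply this only to the cue text, never to the timestamp lines of an SRT or VTT file, since the control characters would make the timings unparseable.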
CJK languages have special word splitting behavior for word timestamps:
```python
result = whisper.transcribe("chinese_audio.mp3", language="zh", word_timestamps=True)

# Word timestamps use character-level splitting for CJK
for word in result["segments"][0]["words"]:
    print(f"[{word['start']:.2f}] {word['word']}")
```

CJK, Thai, Lao, and Myanmar use Unicode-boundary splitting instead of whitespace splitting, since these languages don't use spaces between words.
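
To make the difference between the two strategies concrete, here is a simplified string-level sketch. It is not Vayu's implementation, which presumably works on model tokens rather than finished text. The function `split_words` and the language set are hypothetical, with the set taken from the sentence above; the exact set Vayu uses is an assumption.

```python
# Languages split per character in this sketch. Taken from the prose above
# (CJK, Thai, Lao, Myanmar); the set Vayu actually uses may differ.
CHAR_SPLIT_LANGUAGES = {"zh", "ja", "ko", "th", "lo", "my"}


def split_words(text: str, language: str) -> list[str]:
    """Split text into word-like units for timestamp alignment.

    Character-split languages yield one unit per Unicode code point
    (whitespace dropped); all other languages are split on whitespace.
    """
    if language in CHAR_SPLIT_LANGUAGES:
        return [ch for ch in text if not ch.isspace()]
    return text.split()


if __name__ == "__main__":
    print(split_words("你好世界", "zh"))     # ['你', '好', '世', '界']
    print(split_words("hello world", "en"))  # ['hello', 'world']
```

Note that iterating a Python string yields code points, not grapheme clusters, so combining marks in Thai or Myanmar text would come out as separate units here. A production splitter would need to keep them attached to their base character.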
Translate any of the 99 languages to English:
```python
# French → English
result = whisper.transcribe("french.mp3", language="fr", task="translate")

# Japanese → English
result = whisper.transcribe("japanese.mp3", language="ja", task="translate")
```

```bash
vayu french.mp3 --language fr --task translate
```

Translation uses the same model; no separate translation model is needed.
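
For batch jobs, the CLI form above can be generated per file. This is a minimal sketch that only builds the argument list, using just the `--language` and `--task` flags shown in this guide. The helper name `build_translate_command` is hypothetical, and it assumes the `vayu` executable is on your `PATH`.

```python
import shlex


def build_translate_command(audio_path: str, language: str) -> list[str]:
    """Return the argv list for translating one file to English."""
    return ["vayu", audio_path, "--language", language, "--task", "translate"]


if __name__ == "__main__":
    jobs = [("french.mp3", "fr"), ("japanese.mp3", "ja")]
    for path, lang in jobs:
        # shlex.join renders the list as a shell-safe command string.
        print(shlex.join(build_translate_command(path, lang)))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with file names that contain spaces.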
| Scenario | Recommended Model |
|---|---|
| English only | `distil-large-v3` or `.en` variants |
| Common languages (top 20) | `distil-large-v3` |
| Low-resource languages | `large-v3` (best multilingual accuracy) |
| Translation to English | `large-v3` |
| Language detection | `large-v3` (most reliable detection) |
The `.en` model variants (e.g., `tiny.en`, `base.en`) are English-only and cannot transcribe other languages.
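
The recommendations in the table can be encoded as a small decision function. This is only an illustrative sketch: `recommend_model` and its parameters are hypothetical, and because the guide does not define which languages count as low-resource, the caller supplies that as a flag.

```python
from typing import Optional


def recommend_model(
    language: Optional[str] = None,
    task: str = "transcribe",
    low_resource: bool = False,
) -> str:
    """Pick a model name following the recommendation table.

    language=None means the language must be auto-detected.
    """
    if language is None:
        return "large-v3"  # most reliable detection
    if task == "translate":
        return "large-v3"  # translation to English
    if low_resource:
        return "large-v3"  # best multilingual accuracy
    # English only, or a common language
    return "distil-large-v3"


if __name__ == "__main__":
    print(recommend_model("en"))                    # distil-large-v3
    print(recommend_model("fr", task="translate"))  # large-v3
    print(recommend_model())                        # large-v3
```

The checks run from most to least demanding, so any scenario that the table assigns to `large-v3` wins over the lighter default.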
Vayu processes audio in 30-second chunks. If your audio contains multiple languages (e.g., code-switching) and `language=None`, the language of each chunk is detected independently:
```python
# Auto-detect language per chunk
result = whisper.transcribe("multilingual_audio.mp3")
```

For better results with mixed-language audio, use `large-v3` and let auto-detection handle each segment.
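
To see where chunk boundaries fall relative to a language switch, the following sketch computes 30-second windows for a given duration. It assumes fixed, non-overlapping windows starting at zero, which is a simplification; Vayu's actual windowing may differ. The helper `chunk_bounds` is hypothetical.

```python
def chunk_bounds(duration_s: float, chunk_s: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive fixed-size chunks.

    The final chunk is shortened to end at duration_s.
    """
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    print(chunk_bounds(75))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Under this assumption, a speaker who switches language at 0:45 does so in the middle of the second window, so that chunk contains both languages but receives a single detected language.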