Multilingual Guide

Behnam Ebrahimi edited this page Mar 29, 2026 · 1 revision

Vayu supports 99 languages via the Whisper multilingual models.

Language Detection

By default, Vayu auto-detects the language from the first 30 seconds of audio:

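# `whisper` is a LightningWhisperMLX instance, e.g.
# LightningWhisperMLX(model="large-v3", batch_size=6)
result = whisper.transcribe("audio.mp3")
print(result["language"])  # e.g., "en"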

For better accuracy, specify the language explicitly:

result = whisper.transcribe("audio.mp3", language="fa")
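When the set of plausible languages is known in advance, the two calls above can be combined: run auto-detection first, and re-run with an explicit language only if the detected code is unexpected. The sketch below is a hypothetical helper, not part of Vayu. It takes the transcription function as an argument (for example `whisper.transcribe`) and assumes only the `language` keyword and the `result["language"]` key shown above.

```python
from typing import Any, Callable, Dict, Iterable


def transcribe_with_fallback(
    transcribe: Callable[..., Dict[str, Any]],
    audio_path: str,
    expected: Iterable[str],
    fallback: str,
) -> Dict[str, Any]:
    """Auto-detect first; retry with an explicit language if detection looks wrong.

    `transcribe` is any callable with the shape used in this guide,
    e.g. `whisper.transcribe`. This helper is illustrative, not a Vayu API.
    """
    result = transcribe(audio_path)
    if result.get("language") in set(expected):
        return result
    # The detected language is outside the expected set: force the fallback code.
    return transcribe(audio_path, language=fallback)
```

A call such as `transcribe_with_fallback(whisper.transcribe, "audio.mp3", ["fa", "ar"], "fa")` transcribes twice in the worst case, so it trades speed for robustness.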

Supported Languages

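| Code | Language   | Code | Language   | Code | Language  |
|------|------------|------|------------|------|-----------|
| en   | English    | ja   | Japanese   | uk   | Ukrainian |
| zh   | Chinese    | pt   | Portuguese | el   | Greek     |
| de   | German     | tr   | Turkish    | ms   | Malay     |
| es   | Spanish    | pl   | Polish     | cs   | Czech     |
| ru   | Russian    | ca   | Catalan    | ro   | Romanian  |
| ko   | Korean     | nl   | Dutch      | da   | Danish    |
| fr   | French     | ar   | Arabic     | hu   | Hungarian |
| it   | Italian    | sv   | Swedish    | ta   | Tamil     |
| hi   | Hindi      | id   | Indonesian | no   | Norwegian |
| fa   | Persian    | fi   | Finnish    | th   | Thai      |
| vi   | Vietnamese | he   | Hebrew     | ur   | Urdu      |
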
Full list of all 99 languages
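
| Code | Language    | Code | Language       |
|------|-------------|------|----------------|
| hr   | Croatian    | bg   | Bulgarian      |
| lt   | Lithuanian  | la   | Latin          |
| mi   | Maori       | ml   | Malayalam      |
| cy   | Welsh       | sk   | Slovak         |
| te   | Telugu      | lv   | Latvian        |
| bn   | Bengali     | sr   | Serbian        |
| az   | Azerbaijani | sl   | Slovenian      |
| kn   | Kannada     | et   | Estonian       |
| mk   | Macedonian  | br   | Breton         |
| eu   | Basque      | is   | Icelandic      |
| hy   | Armenian    | ne   | Nepali         |
| mn   | Mongolian   | bs   | Bosnian        |
| kk   | Kazakh      | sq   | Albanian       |
| sw   | Swahili     | gl   | Galician       |
| mr   | Marathi     | pa   | Punjabi        |
| si   | Sinhala     | km   | Khmer          |
| sn   | Shona       | yo   | Yoruba         |
| so   | Somali      | af   | Afrikaans      |
| oc   | Occitan     | ka   | Georgian       |
| be   | Belarusian  | tg   | Tajik          |
| sd   | Sindhi      | gu   | Gujarati       |
| am   | Amharic     | yi   | Yiddish        |
| lo   | Lao         | uz   | Uzbek          |
| fo   | Faroese     | ht   | Haitian Creole |
| ps   | Pashto      | tk   | Turkmen        |
| nn   | Nynorsk     | mt   | Maltese        |
| sa   | Sanskrit    | lb   | Luxembourgish  |
| my   | Myanmar     | bo   | Tibetan        |
| tl   | Tagalog     | mg   | Malagasy       |
| as   | Assamese    | tt   | Tatar          |
| haw  | Hawaiian    | ln   | Lingala        |
| ha   | Hausa       | ba   | Bashkir        |
| jw   | Javanese    | su   | Sundanese      |
| yue  | Cantonese   |      |                |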

Language Name Aliases

Vayu accepts both language codes and common name variations:

# All equivalent
whisper.transcribe("audio.mp3", language="zh")
whisper.transcribe("audio.mp3", language="chinese")
whisper.transcribe("audio.mp3", language="mandarin")

# More aliases
# "burmese" → my, "valencian" → ca, "flemish" → nl
# "haitian" → ht, "castilian" → es, "panjabi" → pa
# "moldavian" → ro, "sinhalese" → si, "pushto" → ps
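To make the alias behavior concrete, here is a minimal sketch of how such a lookup could work. The table contains only the aliases named in this guide, and `normalize_language` is a hypothetical name. Vayu's own alias table and matching rules may differ.

```python
# Alias table built only from the examples in this guide; the real table
# used by Vayu is presumably larger and may be implemented differently.
ALIASES = {
    "chinese": "zh",
    "mandarin": "zh",
    "burmese": "my",
    "valencian": "ca",
    "flemish": "nl",
    "haitian": "ht",
    "castilian": "es",
    "panjabi": "pa",
    "moldavian": "ro",
    "sinhalese": "si",
    "pushto": "ps",
}


def normalize_language(name: str) -> str:
    """Map a language name or alias to its code (hypothetical helper).

    Input is trimmed and lower-cased. Anything not in ALIASES is returned
    unchanged, on the assumption that it is already a language code.
    """
    key = name.strip().lower()
    return ALIASES.get(key, key)
```

Normalizing in application code before calling `transcribe` keeps configuration files and logs consistent, even though Vayu accepts the aliases directly.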

RTL Languages (Persian, Arabic, Hebrew, Urdu)

Vayu transcribes right-to-left (RTL) languages. The text is returned in logical order (the order in which it is read), not visual order, which is what most text-handling systems expect:

whisper = LightningWhisperMLX(model="large-v3", batch_size=6)

# Persian
result = whisper.transcribe("persian_audio.mp3", language="fa")
print(result["text"])  # سلام، حال شما چطور است؟

# Arabic
result = whisper.transcribe("arabic_audio.mp3", language="ar")

Tips for RTL:

  • Use large-v3 for best accuracy on RTL languages
  • Specify the language explicitly, because auto-detection sometimes confuses Arabic, Persian, and Urdu
  • SRT/VTT output may need RTL markers depending on your video player
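If a player renders RTL subtitle lines with misplaced punctuation, one common workaround is to wrap each line in Unicode directional formatting characters. The sketch below is a post-processing step applied to text you have already obtained. `wrap_rtl` is a hypothetical helper, not a Vayu function, and whether a given player needs or honors these characters is an assumption to verify.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def wrap_rtl(text: str) -> str:
    """Wrap every non-blank line in RLE ... PDF so it is laid out right-to-left.

    Blank lines are left untouched so subtitle cue separators stay intact.
    """
    return "\n".join(
        f"{RLE}{line}{PDF}" if line.strip() else line
        for line in text.split("\n")
    )
```

Apply it only to the subtitle text lines, not to SRT/VTT index or timestamp lines.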

CJK Languages (Chinese, Japanese, Korean, Cantonese)

Word timestamps for CJK languages use a different word-splitting rule:

result = whisper.transcribe("chinese_audio.mp3", language="zh", word_timestamps=True)

# Word timestamps use character-level splitting for CJK
for word in result["segments"][0]["words"]:
    print(f"[{word['start']:.2f}] {word['word']}")

CJK languages, Thai, Lao, and Myanmar (Burmese) use Unicode-boundary splitting instead of whitespace splitting, since text in most of these languages is written without spaces between words.
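Because the resulting "words" can be as short as a single character, it is often convenient to merge them back into short phrases, for example for subtitles. The following sketch relies only on the `start` and `word` keys shown above. `group_words` and its `max_chars` parameter are hypothetical, not part of Vayu.

```python
from typing import Any, Dict, List, Tuple


def group_words(words: List[Dict[str, Any]], max_chars: int = 12) -> List[Tuple[float, str]]:
    """Merge consecutive word entries into chunks of at most `max_chars` characters.

    Returns (start_time, text) pairs, where the start time is taken from the
    first entry of each chunk. A single entry longer than `max_chars` is kept
    whole as its own chunk.
    """
    groups: List[Tuple[float, str]] = []
    text = ""
    start = 0.0
    for entry in words:
        piece = entry["word"]
        # Close the current chunk if adding this piece would exceed the limit.
        if text and len(text) + len(piece) > max_chars:
            groups.append((start, text))
            text = ""
        if not text:
            start = entry["start"]
        text += piece
    if text:
        groups.append((start, text))
    return groups
```

For example, `group_words(result["segments"][0]["words"], max_chars=10)` yields phrase-level entries that keep the timestamp of their first character.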

Translation to English

Translate any of the 99 languages to English:

# French → English
result = whisper.transcribe("french.mp3", language="fr", task="translate")

# Japanese → English
result = whisper.transcribe("japanese.mp3", language="ja", task="translate")

The same translation from the command line:

vayu french.mp3 --language fr --task translate

Translation uses the same Whisper model as transcription; no separate translation model is needed.

Model Selection for Multilingual

| Scenario                  | Recommended Model                      |
|---------------------------|----------------------------------------|
| English only              | distil-large-v3 or .en variants        |
| Common languages (top 20) | distil-large-v3                        |
| Low-resource languages    | large-v3 (best multilingual accuracy)  |
| Translation to English    | large-v3                               |
| Language detection        | large-v3 (most reliable detection)     |

The .en model variants (e.g., tiny.en, base.en) are English-only and cannot transcribe other languages.
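The recommendations above can be encoded as a small lookup, which is handy when a pipeline handles many languages. This is a sketch under stated assumptions: `recommend_model` is a hypothetical helper, and because this guide does not say which languages make up the "top 20", the caller supplies that set through `common_languages`.

```python
from typing import Collection, Optional


def recommend_model(
    language: Optional[str],
    task: str = "transcribe",
    common_languages: Collection[str] = ("en",),
) -> str:
    """Pick a model name following the recommendation table in this guide.

    `language=None` means the language will be auto-detected.
    `common_languages` is caller-defined; the default contains only English.
    """
    # Detection and translation are both listed with large-v3.
    if language is None or task == "translate":
        return "large-v3"
    # English and other common languages are listed with distil-large-v3.
    if language in common_languages:
        return "distil-large-v3"
    # Everything else is treated as low-resource.
    return "large-v3"
```

The returned name can then be passed as the `model` argument, as in `LightningWhisperMLX(model=recommend_model("fa"), batch_size=6)`.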

Multi-Language Audio

If your audio contains multiple languages (e.g., code-switching), Vayu processes it in 30-second chunks. When language=None, the language of each chunk is detected independently:

# Auto-detect language per chunk
result = whisper.transcribe("multilingual_audio.mp3")

For better results with mixed-language audio, use large-v3 and leave the language unset so that each chunk is auto-detected.
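To estimate where a language switch falls relative to chunk boundaries, it can help to compute the chunk windows for a file of known duration. The sketch below assumes fixed, non-overlapping 30-second windows starting at zero. That is a simplification for illustration, and Vayu's actual chunking may differ. `chunk_bounds` is a hypothetical helper.

```python
from typing import List, Tuple


def chunk_bounds(duration: float, chunk_seconds: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive fixed-size chunks.

    The last chunk is shortened to end at `duration`.
    """
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_seconds, duration)
        bounds.append((start, end))
        start = end
    return bounds
```

If a speaker changes language partway through a window, that chunk contains two languages but gets a single detected language, so short switches inside a chunk may be transcribed less accurately.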