Voice dictation for Linux. Record, transcribe, and paste text into any application – all from a single keyboard shortcut.
talk-rs captures audio from your microphone, sends it to a
transcription API (Mistral Voxtral or OpenAI Whisper), and types the
result into the focused window. A small X11 overlay badge shows the
current state (recording / transcribing) so you always know what is
happening.

- Dictation workflow – press a key to start recording, press again to stop; transcribed text is pasted automatically.
- Multiple providers – Mistral (Voxtral) and OpenAI (Whisper / GPT-4o) for batch transcription; Mistral and OpenAI realtime streaming via WebSocket.
- Speaker diarization – identify who is speaking (`--diarize`); output is tagged with speaker labels. Currently supported with Mistral V2 models in batch mode.
- Multi-candidate picker – with `--pick`, run several providers in parallel and choose the best transcription from a GTK picker window. A waterfall spectrogram of the recording is shown above the candidate list; it loads asynchronously and adapts to window width.
- Visual overlay – non-intrusive X11 badge at the top of the screen (works without a compositor).
- Dead audio detection – overlay badge shows a red prohibit icon and “NO SOUND” warning when no real microphone is detected (e.g. headset unplugged); a notification also appears on the text panel.
- Auto-pause – automatically pauses audio forwarding during silence, trimming dead air from transcription input. Resumes instantly with a 300 ms lookback buffer to preserve speech onset. The badge shows yellow pause bars and “LISTENING” during pauses. Disable with `--no-auto-pause`.
- Audio visualizers – optional in-badge visualization during recording (`--viz waterfall`, `--viz amplitude`, `--viz spectrum`); monochrome mode with `--mono`.
- Audio feedback – start/stop tones and periodic boop during recording; fully configurable or disabled.
- Context bias – supply domain-specific vocabulary to improve transcription accuracy.
- Daemon toggle mode – first invocation starts recording, second stops and transcribes; ideal for global shortcuts.
- Retry last – re-transcribe the last cached recording without speaking again (`--retry-last`).
- Recordings browser – `record --ui` opens a GTK4 window listing all recordings (OGG from `output_dir`) and dictation cache (WAV). Play, delete, or open in the file manager; sections auto-refresh via inotify when files change externally.
- Standalone commands – `record` and `transcribe` can be used independently for scripting.
- Environment overrides – every config value can be set via `TALK_RS_*` environment variables.

Install the system build dependencies:

    # Debian / Ubuntu
    sudo apt install build-essential pkg-config libasound2-dev libopus-dev \
        libgtk-4-dev libpipewire-0.3-dev libspa-0.2-dev libclang-dev

    # Fedora
    sudo dnf install alsa-lib-devel opus-devel pkg-config \
        gtk4-devel pipewire-devel spa-devel clang-devel

A working Rust toolchain is required (1.70+). Install via rustup if needed.

PipeWire must be running (used for audio capture). Most modern Linux
desktops ship with PipeWire by default.

An API key for at least one transcription provider (Mistral or OpenAI) is required.
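
The requirements above (Rust 1.70+, a running PipeWire, a provider key) can be checked before building. The following is a minimal sketch and not part of talk-rs: `version_ge` and `preflight` are hypothetical helper names, the version comparison relies on GNU `sort -V`, and the check assumes the PipeWire daemon appears as a process named `pipewire`.

```shell
#!/bin/sh
# Hypothetical pre-build check; none of these helpers ship with talk-rs.

# Succeeds when version $1 is at least version $2 (needs GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reports missing prerequisites on stderr; returns non-zero if any hard
# requirement (toolchain, PipeWire) is not met.
preflight() {
    status=0
    if command -v rustc >/dev/null 2>&1; then
        # `rustc --version` prints e.g. "rustc 1.75.0 (...)"; field 2 is the version.
        rust_version=$(rustc --version | awk '{print $2}')
        if ! version_ge "$rust_version" 1.70; then
            echo "Rust $rust_version is older than 1.70" >&2
            status=1
        fi
    else
        echo "rustc not found; install a toolchain via rustup" >&2
        status=1
    fi
    # Assumption: the daemon process is named exactly "pipewire".
    if ! pgrep -x pipewire >/dev/null 2>&1; then
        echo "PipeWire does not appear to be running" >&2
        status=1
    fi
    # Only a warning: the key may live in config.yaml instead of the environment.
    if [ -z "${TALK_RS_PROVIDERS_MISTRAL_API_KEY:-}" ] &&
       [ -z "${TALK_RS_PROVIDERS_OPENAI_API_KEY:-}" ]; then
        echo "no provider API key in the environment (check config.yaml)" >&2
    fi
    return "$status"
}
```

Calling `preflight && cargo build --release` would then build only when the hard requirements look satisfied.
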

If building from a git clone, run `./autogen.sh` first to resolve
version placeholders in `Cargo.toml`:

    ./autogen.sh
    cargo build --release

The binary is at `target/release/talk-rs`. Copy it somewhere in your
`$PATH`:

    cp target/release/talk-rs ~/.local/bin/

talk-rs reads `$XDG_CONFIG_HOME/talk-rs/config.yaml` (typically
`~/.config/talk-rs/config.yaml`).

Copy the example and fill in your values:

    mkdir -p ~/.config/talk-rs
    cp config.example.yaml ~/.config/talk-rs/config.yaml

Minimal working configuration:

    output_dir: ~/talk-rs-output
    providers:
      mistral:
        api_key: YOUR_MISTRAL_API_KEY

| Field | Description |
|---|---|
| `output_dir` | Absolute path to a writable directory for recordings |
| `providers.mistral.api_key` | Mistral API key (if using Mistral) |
| `providers.openai.api_key` | OpenAI API key (if using OpenAI) |

| Field | Default | Description |
|---|---|---|
| `providers.mistral.url` | https://api.mistral.ai | Mistral API base URL |
| `providers.mistral.model` | voxtral-mini-2507 | Mistral transcription model |
| `providers.mistral.context_bias` | none | Comma-separated words for accuracy |
| `providers.openai.url` | https://api.openai.com | OpenAI API base URL |
| `providers.openai.model` | whisper-1 | OpenAI batch model |
| `providers.openai.realtime_model` | gpt-4o-mini-transcribe | OpenAI realtime model |
| `transcription.default_provider` | mistral | Default provider when unspecified |
| `indicators.boop_interval_ms` | 5000 | Periodic boop interval in ms (0 disables boops; also `--no-boop`) |
| `indicators.visual_overlay` | true | Show X11 overlay badge |
| `indicators.viz` | none | In-badge visualizer: `waterfall`, `amplitude`, or `spectrum` (also `--viz`; env `TALK_RS_INDICATORS_VIZ`) |
| `indicators.mono` | false | Monochrome visualizer (also `--mono`) |
| `paste.chunk_chars` | 150 | Max chars per paste chunk (0 disables chunking; also `--no-chunk-paste`) |

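
To illustrate what `paste.chunk_chars` controls, here is a rough sketch of splitting text into pieces of at most N characters, with 0 meaning "no chunking". `chunk_text` is a hypothetical helper written for this example; talk-rs may split differently (for instance on word boundaries), and since some `fold` implementations count bytes rather than characters, the sketch assumes ASCII input.

```shell
#!/bin/sh
# Hypothetical illustration of fixed-size chunking; not talk-rs's own code.

# chunk_text TEXT MAX: print TEXT as lines of at most MAX characters.
# MAX=0 prints the text unchanged, mirroring "0 disables chunking".
chunk_text() {
    text=$1
    max=$2
    if [ "$max" -eq 0 ]; then
        printf '%s\n' "$text"
    else
        printf '%s\n' "$text" | fold -w "$max"
    fi
}

chunk_text "abcdefg" 3   # prints "abc", "def", "g" on separate lines
chunk_text "abcdefg" 0   # prints "abcdefg"
```
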
Every config value can be overridden via environment variables:

    export TALK_RS_PROVIDERS_MISTRAL_API_KEY="sk-..."
    export TALK_RS_PROVIDERS_OPENAI_API_KEY="sk-..."

See `config.example.yaml` for the full list.

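
The variable names in this README suggest a simple naming rule: take the dotted config key, upper-case it, replace dots with underscores, and prefix `TALK_RS_`. The sketch below encodes that rule. It is inferred from the examples here (`TALK_RS_PROVIDERS_MISTRAL_API_KEY`, `TALK_RS_INDICATORS_VIZ`) rather than from a documented specification, and `config_key_to_env` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: derive the environment variable name for a config key,
# assuming the pattern seen in this README holds for every key.
config_key_to_env() {
    printf 'TALK_RS_%s\n' "$(printf '%s' "$1" | tr 'a-z.' 'A-Z_')"
}

config_key_to_env providers.mistral.api_key   # TALK_RS_PROVIDERS_MISTRAL_API_KEY
config_key_to_env indicators.viz              # TALK_RS_INDICATORS_VIZ
```
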

| Flag | Effect |
|---|---|
| `-v` | Increase logging verbosity (`-vv` debug, `-vvv` trace) |

Record, transcribe, and paste into the focused application:

    talk-rs dictate

Toggle mode (ideal for keyboard shortcuts):

    talk-rs dictate --toggle

First call starts a background daemon that records. Second call stops recording, transcribes, and pastes the result.

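
The toggle behaviour can be pictured as a two-state switch keyed on whether a recording is already in progress. The sketch below models this with a marker file. The `toggle` function and the file path are hypothetical, and how talk-rs actually tracks its daemon is not described here, so treat this as an illustration of the pattern only.

```shell
#!/bin/sh
# Hypothetical model of a start/stop toggle using a marker file.

# toggle STATE_FILE: print "start" if no recording is marked as running
# (and mark one), otherwise clear the marker and print "stop".
toggle() {
    state_file=$1
    if [ -e "$state_file" ]; then
        rm -f "$state_file"
        echo "stop"
    else
        : > "$state_file"
        echo "start"
    fi
}

state="${TMPDIR:-/tmp}/toggle-demo.$$"
toggle "$state"   # first call: prints "start"
toggle "$state"   # second call: prints "stop"
```
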
Options:

| Flag | Effect |
|---|---|
| `--toggle` | Daemon toggle mode |
| `--provider` | Choose `mistral` or `openai` |
| `--model` | Override model for this invocation |
| `--diarize` | Enable speaker diarization (batch mode only) |
| `--realtime` | Stream audio via WebSocket (incremental text) |
| `--pick` | Show multi-candidate picker (GTK window) |
| `--retry-last` | Re-transcribe the last cached recording |
| `--replace-last-paste` | Delete previous paste before inserting new text |
| `--save <PATH>` | Save audio recording to a file |
| `--output-yaml <FILE>` | Write transcription metadata YAML |
| `--input-audio-file <FILE>` | Feed a pre-recorded WAV instead of live mic |
| `--monitor` | Mix system audio (monitor) with mic input |
| `--no-sounds` | Disable audio indicators |
| `--no-boop` | Disable periodic boop sounds (keep start/stop) |
| `--no-chunk-paste` | Paste all text in one shot (disable chunking) |
| `--no-overlay` | Disable visual overlay |
| `--no-auto-pause` | Disable auto-pause during silence (forward all audio) |
| `--viz <MODE>` | In-badge visualizer: `waterfall`, `amplitude`, or `spectrum` |
| `--mono` | Monochrome visualizer (theme-aware) |

Capture audio to an OGG/Opus file:

    talk-rs record                     # auto-named memo-YYYY-MM-DD-HH-MM-SS.ogg
    talk-rs record meeting-notes.ogg   # custom filename

Options:

| Flag | Effect |
|---|---|
| `--monitor` | Mix system audio (monitor) with microphone input |
| `--ui` | Open GTK4 recordings browser (play, delete, open folder) |

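
The automatic `memo-YYYY-MM-DD-HH-MM-SS.ogg` naming pattern can be reproduced in a script, for example to build a custom filename with the same timestamp layout. This is a sketch under assumptions: `memo_name` is a hypothetical helper, and it uses local time because the README does not say which time zone talk-rs applies.

```shell
#!/bin/sh
# Hypothetical helper: build a timestamped filename in the same layout as
# the auto-generated memo names. The prefix defaults to "memo" and must not
# contain '%' characters, since it is embedded in the date format string.
memo_name() {
    date "+${1:-memo}-%Y-%m-%d-%H-%M-%S.ogg"
}

memo_name            # e.g. memo-<date>-<time>.ogg
memo_name standup    # same layout with a "standup" prefix
```

A call such as `talk-rs record "$(memo_name standup)"` would then pass the generated name as the custom filename.
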
Transcribe an existing audio file:

    talk-rs transcribe recording.ogg              # print to stdout
    talk-rs transcribe recording.ogg output.txt   # write to file
    talk-rs transcribe recording.ogg --provider openai

Options:

| Flag | Effect |
|---|---|
| `--provider` | Choose `mistral` or `openai` |
| `--model` | Override model for this invocation |
| `--diarize` | Enable speaker diarization (tag by speaker) |

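
Because `transcribe` takes an input file and an optional output file, it lends itself to batch scripts. The sketch below transcribes every `.ogg` file in a directory that does not yet have a matching `.txt`. `transcribe_all` and the `TALK_BIN` variable are hypothetical names introduced for illustration; the only talk-rs behaviour assumed is the `talk-rs transcribe <input> <output>` form shown above.

```shell
#!/bin/sh
# Hypothetical batch wrapper around `talk-rs transcribe`.
# TALK_BIN lets the command be swapped out (e.g. for a dry run); it is not
# a talk-rs setting.
transcribe_all() {
    dir=$1
    for f in "$dir"/*.ogg; do
        [ -e "$f" ] || continue        # the glob matched nothing
        out="${f%.ogg}.txt"
        [ -e "$out" ] && continue      # skip files that already have a transcript
        "${TALK_BIN:-talk-rs}" transcribe "$f" "$out" ||
            echo "transcription failed: $f" >&2
    done
}
```

For example, `transcribe_all ~/talk-rs-output` would process the recordings directory from the minimal configuration.
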
On GNOME, bind `talk-rs dictate --toggle` to a key (e.g. Super+/):

    BASE="org.gnome.settings-daemon.plugins.media-keys"
    BPATH="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"
    CURRENT=$(gsettings get "$BASE" custom-keybindings)

    # Find the first unused customN slot.
    N=0
    while echo "$CURRENT" | grep -q "custom${N}/"; do
        N=$((N + 1))
    done
    SLOT="${BPATH}/custom${N}/"
    SCHEMA="${BASE}.custom-keybinding:${SLOT}"

    # Append the slot to the list; an empty list is reported as "@as []".
    if [ "$CURRENT" = "@as []" ]; then
        gsettings set "$BASE" custom-keybindings "['${SLOT}']"
    else
        gsettings set "$BASE" custom-keybindings "$(echo "$CURRENT" | sed "s|]$|, '${SLOT}']|")"
    fi

    # Configure the new keybinding.
    gsettings set "$SCHEMA" name 'talk-rs dictate'
    gsettings set "$SCHEMA" command 'talk-rs dictate --toggle --viz waterfall'
    gsettings set "$SCHEMA" binding '<Super>slash'

First press starts recording, second press stops, transcribes, and pastes into the focused application.

To change the key, replace `<Super>slash` with the desired binding
(e.g. `<Super>semicolon`, `<Super>d`). Add `--realtime` to use
streaming transcription instead of batch mode.

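
The list handling in the keybinding script can be factored into two small functions that operate on plain strings, which makes the logic easy to try without touching gsettings. This is a sketch only: `next_free_index` and `append_slot` are hypothetical names, and they mirror the `grep` and `sed` steps of the script rather than any official GNOME tooling.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the slot logic of the keybinding script.

# next_free_index LIST: print the smallest N for which "customN/" is absent.
next_free_index() {
    n=0
    while printf '%s\n' "$1" | grep -q "custom${n}/"; do
        n=$((n + 1))
    done
    echo "$n"
}

# append_slot LIST SLOT: print LIST with SLOT added as a quoted entry.
# gsettings prints an empty string array as "@as []".
append_slot() {
    if [ "$1" = "@as []" ]; then
        printf "['%s']\n" "$2"
    else
        printf '%s\n' "$1" | sed "s|]$|, '$2']|"
    fi
}

append_slot "@as []" "/demo/custom0/"             # ['/demo/custom0/']
append_slot "['/demo/custom0/']" "/demo/custom1/" # ['/demo/custom0/', '/demo/custom1/']
```
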

    cargo fmt                    # format
    cargo clippy --all-targets   # lint
    cargo test                   # test
    cargo build                  # build

MIT – see LICENSE.