talk-rs

Voice dictation for Linux. Record, transcribe, and paste text into any application – all from a single keyboard shortcut.

talk-rs captures audio from your microphone, sends it to a transcription API (Mistral Voxtral or OpenAI Whisper), and types the result into the focused window. A small X11 overlay badge shows the current state (recording / transcribing) so you always know what is happening.

Features

Dictation workflow – press a key to start recording, press again to stop; transcribed text is pasted automatically.
Multiple providers – Mistral (Voxtral) and OpenAI (Whisper / GPT-4o) for batch transcription; Mistral and OpenAI realtime streaming via WebSocket.
Speaker diarization – identify who is speaking (--diarize); output is tagged with speaker labels. Currently supported with Mistral V2 models in batch mode.
Multi-candidate picker – with --pick, run several providers in parallel and choose the best transcription from a GTK picker window. A waterfall spectrogram of the recording is shown above the candidate list; it loads asynchronously and adapts to window width.
Visual overlay – non-intrusive X11 badge at the top of the screen (works without a compositor).
Dead audio detection – overlay badge shows a red prohibit icon and “NO SOUND” warning when no real microphone is detected (e.g. headset unplugged); a notification also appears on the text panel.
Auto-pause – automatically pauses audio forwarding during silence, trimming dead air from transcription input. Resumes instantly with a 300 ms lookback buffer to preserve speech onset. The badge shows yellow pause bars and “LISTENING” during pauses. Disable with --no-auto-pause.
Audio visualizers – optional in-badge visualization during recording (--viz waterfall, --viz amplitude, --viz spectrum); monochrome mode with --mono.
Audio feedback – start/stop tones and periodic boop during recording; fully configurable or disabled.
Context bias – supply domain-specific vocabulary to improve transcription accuracy.
Daemon toggle mode – first invocation starts recording, second stops and transcribes; ideal for global shortcuts.
Retry last – re-transcribe the last cached recording without speaking again (--retry-last).
Recordings browser – record --ui opens a GTK4 window listing all recordings (OGG from output_dir) and dictation cache (WAV). Play, delete, or open in the file manager; sections auto-refresh via inotify when files change externally.
Standalone commands – record and transcribe can be used independently for scripting.
Environment overrides – every config value can be set via TALK_RS_* environment variables.

Prerequisites

Build dependencies

# Debian / Ubuntu
sudo apt install build-essential pkg-config libasound2-dev libopus-dev \
  libgtk-4-dev libpipewire-0.3-dev libspa-0.2-dev libclang-dev

# Fedora
sudo dnf install alsa-lib-devel opus-devel pkg-config \
  gtk4-devel pipewire-devel spa-devel clang-devel

A working Rust toolchain is required (1.70+). Install via rustup if needed.
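
To confirm that the installed toolchain meets the 1.70 minimum before building, a check along these lines may help. It is a minimal sketch: `version_ge` and `check_rustc` are hypothetical helper names, not part of talk-rs, and the comparison assumes GNU `sort -V` is available.

```shell
# Succeeds when version $1 is at least version $2 (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints a short verdict for the installed rustc, if there is one.
check_rustc() {
  if ! command -v rustc >/dev/null 2>&1; then
    echo "rustc not found; install it via rustup"
    return 1
  fi
  # `rustc --version` prints e.g. "rustc 1.75.0 (hash date)"; keep field 2.
  installed=$(rustc --version | awk '{print $2}')
  if version_ge "$installed" "1.70.0"; then
    echo "rustc $installed is new enough"
  else
    echo "rustc $installed is older than 1.70"
    return 1
  fi
}
```

Run `check_rustc` once before `cargo build --release`; a non-zero exit status means the toolchain needs installing or updating.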

Runtime dependencies

PipeWire must be running (used for audio capture). Most modern Linux desktops ship with PipeWire by default.

An API key for at least one transcription provider is required:

Mistral (default)
OpenAI

Installation

If building from a git clone, run ./autogen.sh first to resolve version placeholders in Cargo.toml:

./autogen.sh
cargo build --release

The binary is at target/release/talk-rs. Copy it somewhere in your $PATH:

mkdir -p ~/.local/bin          # create the directory if it does not exist yet
cp target/release/talk-rs ~/.local/bin/

Configuration

talk-rs reads $XDG_CONFIG_HOME/talk-rs/config.yaml (typically ~/.config/talk-rs/config.yaml).

Copy the example and fill in your values:

mkdir -p ~/.config/talk-rs
cp config.example.yaml ~/.config/talk-rs/config.yaml

Minimal working configuration:

output_dir: ~/talk-rs-output

providers:
  mistral:
    api_key: YOUR_MISTRAL_API_KEY

Required fields

Field	Description
`output_dir`	Absolute path to a writable directory for recordings
`providers.mistral.api_key`	Mistral API key (if using Mistral)
`providers.openai.api_key`	OpenAI API key (if using OpenAI)

Optional fields

Field	Default	Description
`providers.mistral.url`	`https://api.mistral.ai`	Mistral API base URL
`providers.mistral.model`	`voxtral-mini-2507`	Mistral transcription model
`providers.mistral.context_bias`	none	Comma-separated words for accuracy
`providers.openai.url`	`https://api.openai.com`	OpenAI API base URL
`providers.openai.model`	`whisper-1`	OpenAI batch model
`providers.openai.realtime_model`	`gpt-4o-mini-transcribe`	OpenAI realtime model
`transcription.default_provider`	`mistral`	Default provider when unspecified
`indicators.boop_interval_ms`	`5000`	Periodic boop interval in ms (`0` disables boops; also `--no-boop`)
`indicators.visual_overlay`	`true`	Show X11 overlay badge
`indicators.viz`	none	In-badge visualizer: `waterfall`, `amplitude`, or `spectrum` (also `--viz`; env `TALK_RS_INDICATORS_VIZ`)
`indicators.mono`	`false`	Monochrome visualizer (also `--mono`)
`paste.chunk_chars`	`150`	Max chars per paste chunk (`0` disables chunking; also `--no-chunk-paste`)
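
To illustrate what `paste.chunk_chars` controls, the sketch below splits a string into pieces of at most N characters, with `0` meaning no chunking. It only shows the idea: `chunk_text` is a hypothetical helper, the way talk-rs actually splits text (for example, whether it respects word boundaries) is not documented here, and ASCII input is assumed because some `fold` implementations count bytes.

```shell
# Hypothetical helper: print $1 in pieces of at most $2 characters,
# one piece per line; a limit of 0 prints the text in one shot.
chunk_text() {
  if [ "$2" -eq 0 ]; then
    printf '%s\n' "$1"
  else
    printf '%s\n' "$1" | fold -w "$2"
  fi
}
```

With the default limit of 150, a 400-character transcription would go out as three pieces of 150, 150, and 100 characters.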

Environment overrides

Every config value can be overridden via environment variables:

export TALK_RS_PROVIDERS_MISTRAL_API_KEY="sk-..."
export TALK_RS_PROVIDERS_OPENAI_API_KEY="sk-..."

See config.example.yaml for the full list.
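
The variable names above suggest a simple mapping: take the config key, uppercase it, replace dots with underscores, and prefix it with `TALK_RS_`. The sketch below encodes that rule. It is inferred from the documented examples (`TALK_RS_PROVIDERS_MISTRAL_API_KEY`, `TALK_RS_INDICATORS_VIZ`) rather than confirmed for every key, and `config_key_to_env` is a hypothetical helper name.

```shell
# Hypothetical helper: derive the environment variable name for a config key,
# assuming the TALK_RS_ prefix, uppercase letters, and "." replaced by "_".
config_key_to_env() {
  printf 'TALK_RS_%s\n' "$(printf '%s' "$1" | tr 'a-z.' 'A-Z_')"
}
```

For example, `config_key_to_env providers.mistral.api_key` prints `TALK_RS_PROVIDERS_MISTRAL_API_KEY`. Check the result against config.example.yaml before relying on it.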

Usage

Global options

Flag	Effect
`-v`	Increase logging verbosity (`-vv` debug, `-vvv` trace)

Dictate (main workflow)

Record, transcribe, and paste into the focused application:

talk-rs dictate

Toggle mode (ideal for keyboard shortcuts):

talk-rs dictate --toggle

First call starts a background daemon that records. Second call stops recording, transcribes, and pastes the result.
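
The toggle pattern can be pictured with a small shell sketch in which a marker file stands in for the running daemon. This is an illustration of the general idea only, not how talk-rs implements it; `STATE_FILE`, `TOGGLE_STATE_FILE`, and `toggle` are hypothetical names.

```shell
# Illustrative only: a marker file plays the role of "a daemon is recording".
STATE_FILE="${TOGGLE_STATE_FILE:-${TMPDIR:-/tmp}/toggle-demo.state}"

toggle() {
  if [ -e "$STATE_FILE" ]; then
    # Second invocation: the marker exists, so stop and hand off the audio.
    rm -f "$STATE_FILE"
    echo "stop: transcribe and paste"
  else
    # First invocation: nothing is running yet, so start recording.
    : > "$STATE_FILE"
    echo "start: recording"
  fi
}
```

Because each invocation decides what to do from shared state, the same command can be bound to a single key.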

Options:

Flag	Effect
`--toggle`	Daemon toggle mode
`--provider`	Choose `mistral` or `openai`
`--model`	Override model for this invocation
`--diarize`	Enable speaker diarization (batch mode only)
`--realtime`	Stream audio via WebSocket (incremental text)
`--pick`	Show multi-candidate picker (GTK window)
`--retry-last`	Re-transcribe the last cached recording
`--replace-last-paste`	Delete previous paste before inserting new text
`--save <PATH>`	Save audio recording to a file
`--output-yaml <FILE>`	Write transcription metadata YAML
`--input-audio-file <FILE>`	Feed a pre-recorded WAV instead of live mic
`--monitor`	Mix system audio (monitor) with mic input
`--no-sounds`	Disable audio indicators
`--no-boop`	Disable periodic boop sounds (keep start/stop)
`--no-chunk-paste`	Paste all text in one shot (disable chunking)
`--no-overlay`	Disable visual overlay
`--no-auto-pause`	Disable auto-pause during silence (forward all audio)
`--viz <MODE>`	In-badge visualizer: `waterfall`, `amplitude`, or `spectrum`
`--mono`	Monochrome visualizer (theme-aware)

Record

Capture audio to an OGG/Opus file:

talk-rs record                       # auto-named memo-YYYY-MM-DD-HH-MM-SS.ogg
talk-rs record meeting-notes.ogg     # custom filename

Options:

Flag	Effect
`--monitor`	Mix system audio (monitor) with microphone input
`--ui`	Open GTK4 recordings browser (play, delete, open folder)

Transcribe

Transcribe an existing audio file:

talk-rs transcribe recording.ogg                # print to stdout
talk-rs transcribe recording.ogg output.txt     # write to file
talk-rs transcribe recording.ogg --provider openai

Options:

Flag	Effect
`--provider`	Choose `mistral` or `openai`
`--model`	Override model for this invocation
`--diarize`	Enable speaker diarization (tag by speaker)
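
Because transcribe accepts an input file and an optional output file, it lends itself to batch scripting. The sketch below transcribes every .ogg file in a directory and skips files that already have a .txt next to them. `transcribe_dir` and the `TALK_BIN` variable are hypothetical names for this example; talk-rs itself does not read `TALK_BIN`.

```shell
# TALK_BIN is a convenience for this sketch only (handy for testing with a stub).
TALK_BIN="${TALK_BIN:-talk-rs}"

# Transcribe each *.ogg in directory $1 to a matching *.txt file.
transcribe_dir() {
  dir=$1
  for audio in "$dir"/*.ogg; do
    [ -e "$audio" ] || continue        # the glob matched nothing
    text="${audio%.ogg}.txt"
    [ -e "$text" ] && continue         # already transcribed, leave it alone
    "$TALK_BIN" transcribe "$audio" "$text" || echo "failed: $audio" >&2
  done
}
```

Calling `transcribe_dir ~/talk-rs-output` would process the recordings from the minimal configuration shown earlier; add `--provider` or `--model` to the command line inside the loop if needed.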

GNOME keyboard shortcut

Bind talk-rs dictate --toggle to a key (e.g. Super+/):

BASE="org.gnome.settings-daemon.plugins.media-keys"
BPATH="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"

CURRENT=$(gsettings get "$BASE" custom-keybindings)
N=0
while echo "$CURRENT" | grep -q "custom${N}/"; do
  N=$((N + 1))
done
SLOT="${BPATH}/custom${N}/"
SCHEMA="${BASE}.custom-keybinding:${SLOT}"

if [ "$CURRENT" = "@as []" ]; then
  gsettings set "$BASE" custom-keybindings "['${SLOT}']"
else
  gsettings set "$BASE" custom-keybindings "$(echo "$CURRENT" | sed "s|]$|, '${SLOT}']|")"
fi

gsettings set "$SCHEMA" name    'talk-rs dictate'
gsettings set "$SCHEMA" command 'talk-rs dictate --toggle --viz waterfall'
gsettings set "$SCHEMA" binding '<Super>slash'

First press starts recording, second press stops, transcribes, and pastes into the focused application.

To change the key, replace <Super>slash with the desired binding (e.g. <Super>semicolon, <Super>d). Add --realtime to use streaming transcription instead of batch mode.

Development

cargo fmt                     # format
cargo clippy --all-targets    # lint
cargo test                    # test
cargo build                   # build

License

MIT – see LICENSE.
