talk-rs

Voice dictation for Linux. Record, transcribe, and paste text into any application – all from a single keyboard shortcut.

talk-rs captures audio from your microphone, sends it to a transcription API (Mistral Voxtral or OpenAI Whisper), and types the result into the focused window. A small X11 overlay badge shows the current state (recording / transcribing) so you always know what is happening.

Features

  • Dictation workflow – press a key to start recording, press again to stop; transcribed text is pasted automatically.
  • Multiple providers – Mistral (Voxtral) and OpenAI (Whisper / GPT-4o) for batch transcription; Mistral and OpenAI realtime streaming via WebSocket.
  • Speaker diarization – identify who is speaking (--diarize); output is tagged with speaker labels. Currently supported with Mistral V2 models in batch mode.
  • Multi-candidate picker – with --pick, run several providers in parallel and choose the best transcription from a GTK picker window. A waterfall spectrogram of the recording is shown above the candidate list; it loads asynchronously and adapts to window width.
  • Visual overlay – non-intrusive X11 badge at the top of the screen (works without a compositor).
  • Dead audio detection – overlay badge shows a red prohibit icon and “NO SOUND” warning when no real microphone is detected (e.g. headset unplugged); a notification also appears on the text panel.
  • Auto-pause – automatically pauses audio forwarding during silence, trimming dead air from transcription input. Resumes instantly with a 300 ms lookback buffer to preserve speech onset. The badge shows yellow pause bars and “LISTENING” during pauses. Disable with --no-auto-pause.
  • Audio visualizers – optional in-badge visualization during recording (--viz waterfall, --viz amplitude, --viz spectrum); monochrome mode with --mono.
  • Audio feedback – start/stop tones and periodic boop during recording; fully configurable or disabled.
  • Context bias – supply domain-specific vocabulary to improve transcription accuracy.
  • Daemon toggle mode – first invocation starts recording, second stops and transcribes; ideal for global shortcuts.
  • Retry last – re-transcribe the last cached recording without speaking again (--retry-last).
  • Recordings browser – record --ui opens a GTK4 window listing all recordings (OGG from output_dir) and the dictation cache (WAV). Play, delete, or open in the file manager; sections auto-refresh via inotify when files change externally.
  • Standalone commands – record and transcribe can be used independently for scripting.
  • Environment overrides – every config value can be set via TALK_RS_* environment variables.

Prerequisites

Build dependencies

# Debian / Ubuntu
sudo apt install build-essential pkg-config libasound2-dev libopus-dev \
  libgtk-4-dev libpipewire-0.3-dev libspa-0.2-dev libclang-dev

# Fedora
sudo dnf install alsa-lib-devel opus-devel pkg-config \
  gtk4-devel pipewire-devel spa-devel clang-devel

A working Rust toolchain is required (1.70+). Install via rustup if needed.

Runtime dependencies

PipeWire must be running (used for audio capture). Most modern Linux desktops ship with PipeWire by default.

An API key for at least one transcription provider (Mistral or OpenAI) is required.

Installation

If building from a git clone, run ./autogen.sh first to resolve version placeholders in Cargo.toml:

./autogen.sh
cargo build --release

The binary is at target/release/talk-rs. Copy it somewhere in your $PATH:

cp target/release/talk-rs ~/.local/bin/
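If the shell cannot find talk-rs after copying, the target directory may not be on $PATH. The following is a small sketch of a check; dir_in_path is a hypothetical helper name, not part of talk-rs.

```shell
# Return success if the directory given as $1 is listed in $PATH.
# dir_in_path is a hypothetical helper for this example only.
dir_in_path() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if dir_in_path "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is on PATH"
else
  echo "$HOME/.local/bin is not on PATH; add it in your shell profile"
fi
```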

Configuration

talk-rs reads $XDG_CONFIG_HOME/talk-rs/config.yaml (typically ~/.config/talk-rs/config.yaml).

Copy the example and fill in your values:

mkdir -p ~/.config/talk-rs
cp config.example.yaml ~/.config/talk-rs/config.yaml

Minimal working configuration:

output_dir: ~/talk-rs-output

providers:
  mistral:
    api_key: YOUR_MISTRAL_API_KEY

Required fields

| Field | Description |
| --- | --- |
| output_dir | Absolute path to a writable directory for recordings |
| providers.mistral.api_key | Mistral API key (if using Mistral) |
| providers.openai.api_key | OpenAI API key (if using OpenAI) |

Optional fields

| Field | Default | Description |
| --- | --- | --- |
| providers.mistral.url | https://api.mistral.ai | Mistral API base URL |
| providers.mistral.model | voxtral-mini-2507 | Mistral transcription model |
| providers.mistral.context_bias | none | Comma-separated words for accuracy |
| providers.openai.url | https://api.openai.com | OpenAI API base URL |
| providers.openai.model | whisper-1 | OpenAI batch model |
| providers.openai.realtime_model | gpt-4o-mini-transcribe | OpenAI realtime model |
| transcription.default_provider | mistral | Default provider when unspecified |
| indicators.boop_interval_ms | 5000 | Periodic boop interval in ms (0 disables boops; also --no-boop) |
| indicators.visual_overlay | true | Show X11 overlay badge |
| indicators.viz | none | In-badge visualizer: waterfall, amplitude, or spectrum (also --viz; env TALK_RS_INDICATORS_VIZ) |
| indicators.mono | false | Monochrome visualizer (also --mono) |
| paste.chunk_chars | 150 | Max chars per paste chunk (0 disables chunking; also --no-chunk-paste) |
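To see how the optional fields might sit alongside the required ones, the sketch below writes a fuller sample file to a scratch directory, so it never touches a real configuration. The YAML nesting is an assumption inferred from the dotted field names and the minimal example; the context_bias words are placeholders, and config.example.yaml remains the authoritative layout.

```shell
# Write a sample config to a scratch directory so no real file is overwritten.
sample_dir="$(mktemp -d)"
sample_config="${sample_dir}/config.yaml"

# Nesting below is assumed from the dotted field names in the tables.
cat > "$sample_config" <<'EOF'
output_dir: ~/talk-rs-output

providers:
  mistral:
    api_key: YOUR_MISTRAL_API_KEY
    model: voxtral-mini-2507
    context_bias: "PipeWire,Voxtral,talk-rs"   # placeholder vocabulary
  openai:
    api_key: YOUR_OPENAI_API_KEY
    model: whisper-1

transcription:
  default_provider: mistral

indicators:
  boop_interval_ms: 5000
  visual_overlay: true
  viz: waterfall
  mono: false

paste:
  chunk_chars: 150
EOF

echo "Sample written to ${sample_config}"
```

Review the generated file, then copy it to ~/.config/talk-rs/config.yaml if it suits your setup.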

Environment overrides

Every config value can be overridden via environment variables:

export TALK_RS_PROVIDERS_MISTRAL_API_KEY="sk-..."
export TALK_RS_PROVIDERS_OPENAI_API_KEY="sk-..."

See config.example.yaml for the full list.
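The variable names shown in this README (TALK_RS_PROVIDERS_MISTRAL_API_KEY, TALK_RS_INDICATORS_VIZ) suggest a simple rule: uppercase the dotted config key, replace dots with underscores, and prefix TALK_RS_. The helper below is a sketch of that inferred rule; config_key_to_env is a hypothetical name, and the rule itself is an assumption to verify against config.example.yaml.

```shell
# Convert a dotted config key into its presumed TALK_RS_* variable name.
# The mapping is inferred from the examples in this README.
config_key_to_env() {
  printf 'TALK_RS_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.' '_')"
}

config_key_to_env providers.mistral.api_key
config_key_to_env indicators.viz
```

For a one-off override, the result can be exported directly, for example export "$(config_key_to_env indicators.viz)=waterfall".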

Usage

Global options

| Flag | Effect |
| --- | --- |
| -v | Increase logging verbosity (-vv debug, -vvv trace) |

Dictate (main workflow)

Record, transcribe, and paste into the focused application:

talk-rs dictate

Toggle mode (ideal for keyboard shortcuts):

talk-rs dictate --toggle

First call starts a background daemon that records. Second call stops recording, transcribes, and pastes the result.

Options:

| Flag | Effect |
| --- | --- |
| --toggle | Daemon toggle mode |
| --provider | Choose mistral or openai |
| --model | Override model for this invocation |
| --diarize | Enable speaker diarization (batch mode only) |
| --realtime | Stream audio via WebSocket (incremental text) |
| --pick | Show multi-candidate picker (GTK window) |
| --retry-last | Re-transcribe the last cached recording |
| --replace-last-paste | Delete previous paste before inserting new text |
| --save <PATH> | Save audio recording to a file |
| --output-yaml <FILE> | Write transcription metadata YAML |
| --input-audio-file <FILE> | Feed a pre-recorded WAV instead of live mic |
| --monitor | Mix system audio (monitor) with mic input |
| --no-sounds | Disable audio indicators |
| --no-boop | Disable periodic boop sounds (keep start/stop) |
| --no-chunk-paste | Paste all text in one shot (disable chunking) |
| --no-overlay | Disable visual overlay |
| --no-auto-pause | Disable auto-pause during silence (forward all audio) |
| --viz <MODE> | In-badge visualizer: waterfall, amplitude, or spectrum |
| --mono | Monochrome visualizer (theme-aware) |
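The flags above can be combined. As a rough illustration, the sketch below keeps a few combinations under named profiles and prints the resulting command line instead of running it. The profile names are hypothetical, and whether every combination is accepted together is an assumption that has not been checked against the tool.

```shell
# Print (without running) a talk-rs argument list for a named profile.
# Profile names are hypothetical; the flags come from the options table.
dictate_args() {
  case "$1" in
    quiet)  echo "dictate --toggle --no-sounds --no-overlay" ;;
    stream) echo "dictate --toggle --realtime --viz spectrum --mono" ;;
    redo)   echo "dictate --retry-last --provider openai" ;;
    *)      echo "dictate --toggle" ;;
  esac
}

# Show what the "stream" profile expands to.
echo "talk-rs $(dictate_args stream)"
```

To actually run a profile, leave the substitution unquoted so the flags split into separate arguments: talk-rs $(dictate_args quiet).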

Record

Capture audio to an OGG/Opus file:

talk-rs record                       # auto-named memo-YYYY-MM-DD-HH-MM-SS.ogg
talk-rs record meeting-notes.ogg     # custom filename

Options:

| Flag | Effect |
| --- | --- |
| --monitor | Mix system audio (monitor) with microphone input |
| --ui | Open GTK4 recordings browser (play, delete, open folder) |

Transcribe

Transcribe an existing audio file:

talk-rs transcribe recording.ogg                # print to stdout
talk-rs transcribe recording.ogg output.txt     # write to file
talk-rs transcribe recording.ogg --provider openai

Options:

| Flag | Effect |
| --- | --- |
| --provider | Choose mistral or openai |
| --model | Override model for this invocation |
| --diarize | Enable speaker diarization (tag by speaker) |
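Because transcribe takes an input file and an optional output file, it lends itself to batch scripting. The sketch below transcribes every .ogg file in a directory and skips files that already have a matching .txt. transcribe_dir and the TRANSCRIBE_BIN variable are hypothetical names introduced for this example; only the talk-rs transcribe INPUT OUTPUT form comes from the usage shown above.

```shell
# Transcribe every .ogg file in a directory, writing NAME.txt next to NAME.ogg.
# TRANSCRIBE_BIN is a hypothetical override used only by this sketch.
transcribe_dir() {
  dir="${1:-.}"
  bin="${TRANSCRIBE_BIN:-talk-rs}"
  for audio in "$dir"/*.ogg; do
    # The glob stays literal when nothing matches; skip that case.
    if [ ! -e "$audio" ]; then
      continue
    fi
    text="${audio%.ogg}.txt"
    # Do not redo work that already has a transcript.
    if [ -e "$text" ]; then
      continue
    fi
    "$bin" transcribe "$audio" "$text" || echo "failed: $audio" >&2
  done
}
```

Call it as transcribe_dir ~/talk-rs-output to process the recordings directory from the minimal configuration.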

GNOME keyboard shortcut

Bind talk-rs dictate --toggle to a key (e.g. Super+/):

# gsettings schema and dconf path that hold GNOME custom shortcuts
BASE="org.gnome.settings-daemon.plugins.media-keys"
BPATH="/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"

# Find the first unused customN slot
CURRENT=$(gsettings get "$BASE" custom-keybindings)
N=0
while echo "$CURRENT" | grep -q "custom${N}/"; do
  N=$((N + 1))
done
SLOT="${BPATH}/custom${N}/"
SCHEMA="${BASE}.custom-keybinding:${SLOT}"

# Register the slot: start a new list if it is empty, otherwise append to it
if [ "$CURRENT" = "@as []" ]; then
  gsettings set "$BASE" custom-keybindings "['${SLOT}']"
else
  gsettings set "$BASE" custom-keybindings "$(echo "$CURRENT" | sed "s|]$|, '${SLOT}']|")"
fi

# Fill in the shortcut's name, command, and key binding
gsettings set "$SCHEMA" name    'talk-rs dictate'
gsettings set "$SCHEMA" command 'talk-rs dictate --toggle --viz waterfall'
gsettings set "$SCHEMA" binding '<Super>slash'

First press starts recording, second press stops, transcribes, and pastes into the focused application.

To change the key, replace <Super>slash with the desired binding (e.g. <Super>semicolon, <Super>d). Add --realtime to use streaming transcription instead of batch mode.

Development

cargo fmt                     # format
cargo clippy --all-targets    # lint
cargo test                    # test
cargo build                   # build

License

MIT – see LICENSE.
