- Menubar dictation app on macOS with start/stop hotkey control.
- ChatGPT login flow with browser OAuth as default and device-code fallback (if needed).
- Real-time pass-1 transcription from mic with committed/draft streaming assembly.
- Pass-2 finalize pass using `gpt-4o-transcribe` for better punctuation and stability.
- Optional Pass-3 rewrite for cleaner English output with numeric/proper noun protection.
- Auto-paste into the app that was frontmost when recording began.
- Configurable behavior and models via `config.toml`.
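The committed/draft streaming assembly mentioned in the feature list can be pictured with a small sketch. This is not Voxit's actual implementation: `TranscriptAssembler` and its methods are hypothetical names, and the sketch assumes that each new partial result replaces the draft while a finished segment is appended to the committed text.

```rust
/// Hypothetical assembler for streaming transcription text (illustrative only).
/// Committed segments are stable; the draft is the still-changing tail.
#[derive(Debug, Default)]
pub struct TranscriptAssembler {
    committed: Vec<String>,
    draft: String,
}

impl TranscriptAssembler {
    /// Replace the current draft with the newest partial hypothesis.
    pub fn update_draft(&mut self, text: &str) {
        self.draft = text.trim().to_string();
    }

    /// Promote a finished segment to committed text and clear the draft.
    pub fn commit(&mut self, text: &str) {
        let segment = text.trim();
        if !segment.is_empty() {
            self.committed.push(segment.to_string());
        }
        self.draft.clear();
    }

    /// Text for the live preview: committed segments followed by the draft.
    pub fn preview(&self) -> String {
        let mut parts: Vec<&str> = self.committed.iter().map(String::as_str).collect();
        if !self.draft.is_empty() {
            parts.push(self.draft.as_str());
        }
        parts.join(" ")
    }
}
```

Keeping the draft separate from committed text means a revised partial result never rewrites text the user has already seen settle.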
For the normative product contract, constraints, and gaps, see System Spec v1.
V1 target is macOS-first and aligned to the English-only voice input design.
- Status: ✅ Core MVP loop is implemented (record → stream preview → finalize → optional rewrite → paste).
- Scope: ✅ Native macOS mic capture + OpenAI model pipeline only.
- Limitation: ✅ Linux/Windows build is intentionally disabled.
- Limitation: ⚠️ Known gaps vs full spec are documented in System Spec v1 (hotkey configurability, tray menu behavior, CPAL fallback robustness, and rollout cleanup items).
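As a rough illustration of the MVP loop listed above (record → stream preview → finalize → optional rewrite → paste), the stage order can be modeled as a small state machine. The `Stage` enum and both functions are hypothetical names for this sketch. It also treats the stages as strictly sequential, which is a simplification, since the preview streams while recording is still in progress.

```rust
/// Hypothetical pipeline stages, named after the loop described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Record,
    StreamPreview,
    Finalize,
    Rewrite,
    Paste,
    Done,
}

/// Return the stage that follows `current`.
/// The rewrite stage is skipped when `rewrite_enabled` is false.
pub fn next_stage(current: Stage, rewrite_enabled: bool) -> Stage {
    match current {
        Stage::Record => Stage::StreamPreview,
        Stage::StreamPreview => Stage::Finalize,
        Stage::Finalize if rewrite_enabled => Stage::Rewrite,
        Stage::Finalize => Stage::Paste,
        Stage::Rewrite => Stage::Paste,
        Stage::Paste | Stage::Done => Stage::Done,
    }
}

/// Collect the full stage order from recording through paste.
pub fn run_order(rewrite_enabled: bool) -> Vec<Stage> {
    let mut stages = vec![Stage::Record];
    let mut current = Stage::Record;
    while current != Stage::Paste {
        current = next_stage(current, rewrite_enabled);
        stages.push(current);
    }
    stages
}
```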
# Clone the repository.
git clone https://github.com/hack-ink/voxit
cd voxit
# To install Rust on macOS and Unix, run the following command.
#
# To install Rust on Windows, download and run the installer from `https://rustup.rs`.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain stable
# Install the necessary dependencies. (Unix only)
# Using Ubuntu as an example, this really depends on your distribution.
sudo apt-get update
sudo apt-get install <DEPENDENCIES>
# Build the voxit package, and the binary will be available at `target/release/voxit`.
cargo build --release -p voxit
# If you are a macOS user and want to have a `Voxit.app`, run the following command.
# Install `cargo-bundle` to pack the binary into an app.
cargo install cargo-bundle
# Pack the app; it will be available at `target/release/bundle/osx/Voxit.app`.
cargo bundle --release -p voxit

- macOS
  - Download the latest pre-built binary from GitHub Releases.
- Windows / Linux
  - Not included in V1 release target (macOS-only).
Voxit stores settings in:
$HOME/Library/Application Support/voxit/config.toml
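For tooling that needs to locate this file, the path can be built from the home directory. A minimal sketch using only the standard library follows; `config_path` and `default_config_path` are hypothetical helper names, not functions from the Voxit codebase.

```rust
use std::path::{Path, PathBuf};

/// Build the config path documented above from a given home directory.
/// (Hypothetical helper; Voxit's own path resolution may differ.)
pub fn config_path(home: &Path) -> PathBuf {
    home.join("Library")
        .join("Application Support")
        .join("voxit")
        .join("config.toml")
}

/// Resolve the path from the `HOME` environment variable, if it is set.
pub fn default_config_path() -> Option<PathBuf> {
    std::env::var_os("HOME").map(|home| config_path(Path::new(&home)))
}
```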
Currently supported keys:
[ui]
start_hidden = true
panel_width_px = 420
panel_height_px = 260
[hotkey]
chord = "ctrl+shift+space"
mode = "toggle" # toggle | hold
[audio]
backend = "voice_processing" # voice_processing | cpal
input_sample_rate_hz = 48000
input_device_name = ""
input_device_id = 0
realtime_target_rate_hz = 24000
[openai]
api_base_url = "https://api.openai.com/v1"
realtime_model = "gpt-4o-mini-transcribe"
finalize_model = "gpt-4o-transcribe"
rewrite_model = "gpt-5.2-mini"
language = "en"
[openai.realtime]
noise_reduction = "near_field" # near_field | far_field | off
[rewrite]
enabled = true
auto = true
guard_numbers = true
max_output_chars = 8000
style = "clean" # clean | formal | concise
[paste]
lock_frontmost_app = true
method = "clipboard_cmd_v"

First-run onboarding checklist:
- Sign in with ChatGPT.
- Microphone permission in System Settings → Privacy & Security → Microphone.
- Accessibility permission in Privacy & Security → Accessibility (for Cmd+V fallback).
- Input Monitoring permission in Privacy & Security → Input Monitoring (for global hotkey hooks).
- Voxit uses request buttons to guide you through the permission prompts in sequence (Microphone → Accessibility → Input Monitoring); grant each permission and re-check when prompted.
- Verify the paste flow after granting permissions, and restart the app if needed.
The app saves updates to the same config.toml path when settings are changed.
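Several keys in `config.toml` accept a fixed set of values, such as `hotkey.mode` (`toggle | hold`). The sketch below shows one way such a value could be validated and how the two modes might respond to key events. The types `HotkeyMode` and `KeyEvent` and the function `recording_after` are hypothetical; the real app may model this differently.

```rust
use std::str::FromStr;

/// Hypothetical typed form of the `hotkey.mode` config value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HotkeyMode {
    Toggle,
    Hold,
}

impl FromStr for HotkeyMode {
    type Err = String;

    /// Accept only the two documented values; reject anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "toggle" => Ok(Self::Toggle),
            "hold" => Ok(Self::Hold),
            other => Err(format!(
                "unknown hotkey.mode {other:?} (expected toggle | hold)"
            )),
        }
    }
}

/// Hypothetical key events delivered by a global hotkey hook.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyEvent {
    Pressed,
    Released,
}

/// Compute whether recording is active after `event`.
/// Toggle mode flips on press and ignores release;
/// hold mode records only while the key is down.
pub fn recording_after(mode: HotkeyMode, recording: bool, event: KeyEvent) -> bool {
    match (mode, event) {
        (HotkeyMode::Toggle, KeyEvent::Pressed) => !recording,
        (HotkeyMode::Toggle, KeyEvent::Released) => recording,
        (HotkeyMode::Hold, KeyEvent::Pressed) => true,
        (HotkeyMode::Hold, KeyEvent::Released) => false,
    }
}
```

Parsing into an enum at load time surfaces a typo in the config file as an explicit error instead of a silent fallback.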
- Start recording: press the configured hotkey (default `Ctrl+Shift+Space`) to toggle.
- While listening: panel shows live draft text and committed segments.
- Stop recording: press the hotkey again in toggle mode, or release it in hold mode.
- Finalize: Pass-2 runs automatically; rewrite runs by default unless disabled in settings.
- Microphone input selection is persisted in config as `audio.input_device_id` and `audio.input_device_name`.
- Refresh workflow: the picker list of input-capable devices is refreshed at startup and on demand via the Refresh microphones control.
- Runtime fallback: if a saved explicit device id is unavailable, Voxit falls back to the system default input device and continues recording.
- Paste behavior: by default, the rewritten text is pasted after finalize; the raw transcript can be pasted instead via the available controls.
- Output target: text is pasted into the app that was frontmost when dictation started.
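The runtime fallback for microphone selection described above can be sketched as a simple lookup. `InputDevice` and `resolve_input_device` are hypothetical names. The sketch also assumes the saved id is already available as an `Option<u32>`; how `input_device_id = 0` maps to "no explicit device" is not specified here.

```rust
/// Hypothetical description of an input-capable audio device.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct InputDevice {
    pub id: u32,
    pub name: String,
}

/// Pick the saved device if it is still present; otherwise fall back
/// to the system default so recording can continue.
pub fn resolve_input_device<'a>(
    devices: &'a [InputDevice],
    saved_id: Option<u32>,
    default_device: &'a InputDevice,
) -> &'a InputDevice {
    saved_id
        .and_then(|id| devices.iter().find(|d| d.id == id))
        .unwrap_or(default_device)
}
```

Returning the default rather than an error matches the documented behavior: an unplugged microphone does not block dictation.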
- Track versioned behavior changes in GitHub Releases.
- `eframe`/`egui` panel + menubar entrypoint.
- Dedicated auth/session/config/rewrite/paste pipeline and typed application state.
- macOS frontmost-app capture + clipboard/command-paste integration.
If you find this project helpful and would like to support its development, you can buy me a coffee!
Your support is greatly appreciated and motivates me to keep improving this project.
- Fiat
- Crypto
  - Bitcoin
    - bc1pedlrf67ss52md29qqkzr2avma6ghyrt4jx9ecp9457qsl75x247sqcp43c
  - Ethereum
    - 0x3e25247CfF03F99a7D83b28F207112234feE73a6
  - Polkadot
    - 156HGo9setPcU2qhFMVWLkcmtCEGySLwNqa3DaEiYSWtte4Y
Thank you for your support!
We would like to extend our heartfelt gratitude to the following projects and contributors:
- The Rust community for their continuous support and development of the Rust ecosystem.
- Not yet populated.
Licensed under GPL-3.0.