Native macOS subtitle generation tool - powered by Qwen3-ASR
Last updated: 2026-05-04
Automatically generate subtitles from video/audio files, fully local. Uses Qwen3-ASR (Chinese recognition quality superior to Whisper), targeting Chinese YouTubers, podcast creators, and meeting note-takers.
- Zero Python dependencies - All native Swift implementation
- Apple Silicon first - MLX inference, Metal GPU acceleration
- Singleton model - Global ASR model loaded only once
- Streaming UI - Recognition results pushed line by line
- Subtitle burning - Hard-code subtitles to video (supports H.264/H.265/AV1/VP9/ProRes/VideoToolbox)
- Multi-format export - Supports SRT, VTT, TXT, ASS, SSA, SBV, CSV, LRC, TTML export
- Internationalization - Full Chinese and English UI support
| Component | Technology | Version |
|---|---|---|
| Language | Swift | 6.0+ |
| UI Framework | SwiftUI + AppKit | macOS 15.0+ |
| Concurrency Model | Swift Structured Concurrency | async/await + Actor |
| Package Manager | Swift Package Manager | - |
| Module | Description |
|---|---|
| Qwen3ASR | ASR inference (Qwen3-ASR-0.6B / 1.7B) |
| SpeechVAD | Voice activity detection (Silero VAD) |
| SpeechUI | SwiftUI audio components |
| AudioCommon | Shared audio protocols and utilities |
| FFmpeg | Subtitle burning (bundled binary) |
Uses system frameworks, no third-party dependencies:
AVFoundation- Audio extraction, format conversion, video playbackAVAudioConverter- Resampling (any sample rate → 16kHz mono)
SubScript/
├── project.yml # XcodeGen configuration
├── Package.swift # SPM configuration
├── Makefile # Build script
│
├── Sources/SubScript/
│ ├── SubScriptApp.swift # @main entry point
│ ├── ContentView.swift # Page routing
│ │
│ ├── Models/
│ │ ├── SubtitleSegment.swift # Subtitle data model
│ │ └── TranscriptionJob.swift # Task status model
│ │
│ ├── Pipeline/ # Core business logic
│ │ ├── ModelManager.swift # Singleton model manager
│ │ ├── AudioExtractor.swift # Audio extraction actor
│ │ ├── TranscriptionPipeline.swift # Transcription pipeline
│ │ └── SentenceSplitter.swift # Sentence splitting + timestamp alignment
│ │
│ ├── State/
│ │ └── AppState.swift # @Observable global state
│ │
│ ├── Services/
│ │ ├── ExportService.swift # SRT/VTT/TXT/ASS export
│ │ └── BurnService.swift # Subtitle burning
│ │
│ ├── Store/
│ │ └── BurnConfig.swift # Burn config model
│ │
│ └── Views/
│ ├── HomeView.swift # Home: drag & drop area
│ ├── ProcessingView.swift # Progress page
│ ├── ResultsView.swift # Results + subtitle editor + burn button
│ ├── ExportView.swift # Export panel
│ ├── BurnOptionsView.swift # Burn options
│ └── BurnProgressView.swift # Burn progress
│
├── Sources/SubScript/Resources/
│ └── Binaries/
│ └── ffmpeg # Bundled ffmpeg (with libass)
│
└── progress.md # Development progress tracking
- macOS 15.0+ (Sonoma)
- Xcode 16+ (with Metal Toolchain)
- Apple Silicon (M1/M2/M3/M4)
make bundleThis will automatically:
- Install Metal Toolchain (~688MB)
- Resolve SPM dependencies
- Compile MLX Metal shader library
- Compile SubScript
# Open release version
open .build/release/SubScript.appProject managed with XcodeGen, root directory has project.yml.
- After adding any
.swiftfile, must runxcodegen generateto regenerate project - Must not manually edit any files in
.xcodeproj - New SPM dependencies should be added in
project.ymlunderpackagesanddependencies - Directory structure changes should be synced to
project.ymlsourcesconfig
- Launch App → Open HomeView (drag & drop area)
- Drag in video → MP4/MOV/MP3/M4A/WAV/AAC
- Wait for processing → ProcessingView shows progress
- View results → ResultsView shows video + subtitles
- Burn subtitles (optional) → Click burn button, hard-code subtitles to video
- Export → ExportView select format
| Input | Supported Operations |
|---|---|
| MP4 | Recognition, burning, export |
| MOV | Recognition, burning, export |
| M4V | Recognition, burning, export |
| MP3 | Recognition, export |
| M4A | Recognition, export |
| WAV | Recognition, export |
| AAC | Recognition, export |
Note: Burning only works for video files (MP4/MOV/M4V), audio files do not support burning.
- SRT - Standard subtitles
- VTT - Web subtitles
- TXT - Plain text
- ASS/SSA - Advanced styling
- SBV - SubViewer format
- CSV - Spreadsheet format
- LRC - Lyrics format
- TTML - Timed Text format
| Option | Description |
|---|---|
| Encoder | H.264, H.265/HEVC, AV1, VP9, ProRes, VideoToolbox |
| Quality | High quality (CRF 18), Balanced (CRF 23), Small file (CRF 28), Custom |
| Font size | 12-72pt |
| Subtitle color | Optional (ColorPicker) |
| Stroke color | Optional (ColorPicker) |
| Stroke width | 0-5px |
| Shadow | Enable/Disable |
| Position | Bottom/Top/Center |
| Output format | MP4/MOV/MKV/WebM |
- Slow first run - Requires model download (~1-2GB), cached in
~/Library/Caches/qwen3-speech/ - Assets.xcassets - Needs migration from original location to SPM standard location
| Phase | Status |
|---|---|
| Phase 1: Core pipeline verification | ✅ Complete |
| Phase 2: Basic UI | ✅ Complete |
| Phase 3: Editor improvements | ✅ Complete |
| Phase 3.1: Editor Bugfix | ✅ Complete |
| Phase 4: Pro features | ✅ Complete (subtitle burning) |
See progress.md for details.
This app is built on the following open source projects, following their respective licenses:
| Project | Copyright | License |
|---|---|---|
| Qwen3-ASR | © Alibaba Cloud | Apache License 2.0 |
| speech-swift | © soniqo | MIT License |
| Apple MLX | © Apple Inc. | MIT License |
| Silero VAD | © Silero Team | MIT License |
| FFmpeg | © FFmpeg Team | LGPL/GPL |
Apache License 2.0 - See LICENSE file for details