March 1, 2026

Vox

The problem

Free ASR models like Whisper achieve high accuracy, but raw transcriptions capture every filler word and speech artifact. The gap between "accurate transcription" and "good dictation" is where product value lives. I wanted to understand that gap by building a free version myself and noting what was missing.

What it does

A native macOS menu bar app: Option+Space to record, WhisperKit (Whisper running on the Apple Neural Engine) for local inference, regex-based filler word removal, and clipboard-based text insertion into any app. It runs on Apple Silicon with no cloud service, no subscription, and no data leaving your machine.
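The post doesn't show the actual filler-removal rules, so here is a minimal Swift sketch of the regex approach under stated assumptions: the filler list and the `removeFillers(_:fillers:)` helper are hypothetical, not taken from Vox. Word boundaries (`\b`) keep words like "umbrella" intact, an optional trailing comma is removed along with the filler, and leftover whitespace is collapsed.

```swift
import Foundation

/// Hypothetical helper: strips standalone filler words (case-insensitive),
/// plus an optional trailing comma, then collapses leftover whitespace.
/// The default filler list is an assumption for illustration only.
func removeFillers(_ text: String,
                   fillers: [String] = ["um", "uh", "erm", "hmm"]) -> String {
    // Build \b(?:um|uh|erm|hmm)\b,? with each filler safely escaped.
    let alternation = fillers
        .map { NSRegularExpression.escapedPattern(for: $0) }
        .joined(separator: "|")
    let pattern = "\\b(?:\(alternation))\\b,?"
    guard let regex = try? NSRegularExpression(pattern: pattern,
                                               options: [.caseInsensitive]) else {
        return text
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    let stripped = regex.stringByReplacingMatches(in: text, options: [],
                                                  range: fullRange, withTemplate: "")
    // Removing words leaves double spaces behind; squeeze them and trim the ends.
    let collapsed = stripped.replacingOccurrences(of: "\\s+", with: " ",
                                                  options: .regularExpression)
    return collapsed.trimmingCharacters(in: .whitespaces)
}
```

For example, `removeFillers("Um, so I uh think we should, erm, ship it")` returns `"so I think we should, ship it"`. A pure pattern match like this cannot tell a filler "like" from a meaningful one, which is one concrete place where the gap between accurate transcription and good dictation shows up.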

Includes an interactive visual explainer of the full ASR pipeline: audio capture through spectrogram inference to text insertion, in five animated stages.
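To give a sense of what the spectrogram stage hands to the model, the arithmetic below uses the reference Whisper front-end parameters: 16 kHz mono audio, a 25 ms analysis window (400 samples), a 10 ms hop (160 samples), 80 log-Mel bins (128 for large-v3), and a fixed 30-second input window. Whether WhisperKit uses exactly these values is an assumption here, and `WhisperFrontEnd` is a hypothetical name used only for illustration.

```swift
/// Hypothetical helper: shapes of Whisper's log-Mel input, assuming the
/// reference front-end parameters (not taken from Vox or WhisperKit source).
struct WhisperFrontEnd {
    let sampleRate = 16_000     // Hz, mono
    let windowSamples = 400     // 25 ms analysis window
    let hopSamples = 160        // 10 ms between successive frames
    let chunkSeconds = 30       // every input is padded or trimmed to 30 s
    let melBins = 80            // 128 for large-v3

    /// Samples in one 30 s chunk.
    var chunkSamples: Int { sampleRate * chunkSeconds }

    /// Spectrogram frames per chunk: one frame per hop.
    var framesPerChunk: Int { chunkSamples / hopSamples }

    /// Window and hop expressed in milliseconds.
    var windowMilliseconds: Int { windowSamples * 1000 / sampleRate }
    var hopMilliseconds: Int { hopSamples * 1000 / sampleRate }

    /// Zero samples appended to pad a shorter clip up to the 30 s window.
    func paddingSamples(forClipSamples n: Int) -> Int { max(0, chunkSamples - n) }
}

let fe = WhisperFrontEnd()
print("input: \(fe.melBins) mel bins x \(fe.framesPerChunk) frames")
```

Under these assumptions, even a five-second dictation becomes an 80 x 3000 spectrogram, with 400,000 of its 480,000 samples being padding.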