Writing
Articles
Thinking out loud about product management, AI agents, and building with LLMs.
Does the inference platform matter?
I deployed Whisper Large v3 on four inference platforms and got up to 67 percentage points of WER divergence on the same audio file. Same model, same input, different output.
Evaluating speech-to-text models for Indian banking
How do you evaluate a model that has no system prompt? I tested three ASR providers on code-mixed banking conversations and found that my measurement was more broken than the models.
What I learned building a speech-to-text app from scratch
Why do dictated words just 'appear' and why pay for Wispr Flow when open-source models exist? I built a local STT app to find out.
Revisiting the questions AI asked me: An ode to the AskUserQuestion tool
The QnA with Claude are the best part of my AI sessions. So I built a tool to resurface them.
Keeping context fresh for PM worklfows
How I leverage Claude Code with Claude in Chrome to keep PM context fresh and automated across recurring data workflows
Agent Teams for Product Managers
Can AI agents that argue with each other help a PM stress-test a product hypothesis? I tested Anthropic's Agent Teams feature to find out