Latest Writing
Articles
Does the inference platform matter?
I deployed Whisper Large v3 on four inference platforms and got up to 67 percentage points of WER divergence on the same audio file. Same model, same input, different output.
Evaluating speech-to-text models for Indian banking
How do you evaluate a model that has no system prompt? I tested three ASR providers on code-mixed banking conversations and found that my measurement was more broken than the models.
What I learned building a speech-to-text app from scratch
Why do dictated words just 'appear' and why pay for Wispr Flow when open-source models exist? I built a local STT app to find out.
Revisiting the questions AI asked me: An ode to the AskUserQuestion tool
The Q&A exchanges with Claude are the best part of my AI sessions, so I built a tool to resurface them.
Keeping context fresh for PM workflows
How I leverage Claude Code with Claude in Chrome to keep PM context fresh and automated across recurring data workflows.
Agent Teams for Product Managers
Can AI agents that argue with each other help a PM stress-test a product hypothesis? I tested Anthropic's Agent Teams feature to find out.
Side Projects
Projects
ASR Evaluation Exploration
An evaluation framework for speech-to-text models and inference platforms, tested on code-mixed Indian banking audio across 7 providers and 4 deployment platforms.
Vox
A native macOS speech-to-text menu bar app, built to understand what makes great dictation software great.
Claude QA Viewer
A zero-dependency tool that extracts AskUserQuestion interactions from Claude Code sessions and generates an interactive HTML visualization.
Support signal
A Python tool that automates Zendesk ticket analysis using LLMs, turning weeks of manual triage into a 2-hour automated run.
Featured Work
Case Studies
Diagnostics - Helping customers ship with confidence
Building a self-service troubleshooting tool that reduced L1 support tickets by 35% and serves 700+ customers.
35%
Ticket reduction
Prototype to Production: Evals for AI reliability
From prompt to rule: building a 4-dimension LLM-as-judge framework that improved accuracy from 45% to 85%.
85%
Accuracy