The Meeting Recording Problem

You have a 90-minute meeting recording. You need to know what was said. Listening to the whole thing is not a real option.

The most important things said in meetings are never written down.

Not the agenda items. Not the action points. Not the things that end up in the follow-up email. The important things — the off-hand remark that reframes a problem, the half-formed idea that becomes the strategy, the commitment that someone made in passing and will later deny.

Those live in the air. And then they’re gone.

Unless someone recorded it.

The modern workplace generates an extraordinary amount of spoken content. Meetings, interviews, lectures, calls, voice memos, podcasts, presentations. Hours of audio every week. And almost none of it becomes searchable, quotable, referenceable text.

The bottleneck has always been transcription. Manual transcription is slow — a one-hour recording takes three to four hours to transcribe by hand. Professional transcription services are expensive — $1 to $3 per minute, so a one-hour meeting costs $60 to $180. Automated transcription has been available for years, but until recently the accuracy was poor enough that you’d spend as long correcting errors as you would have spent transcribing manually.
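The arithmetic is worth making concrete. A quick comparison using the figures above (the rates quoted in this piece, not current market prices):

```python
# Rough cost/time comparison for transcribing a recording,
# using the illustrative rates quoted above.

def manual_hours(audio_minutes: float, ratio: float = 3.5) -> float:
    """Manual transcription time, assuming 3-4x real time (midpoint 3.5x)."""
    return audio_minutes * ratio / 60

def service_cost(audio_minutes: float, rate_per_minute: float) -> float:
    """Cost of a professional transcription service at a per-minute rate."""
    return audio_minutes * rate_per_minute

minutes = 60  # a one-hour meeting
print(f"Manual effort: ~{manual_hours(minutes):.1f} hours")
print(f"Service cost:  ${service_cost(minutes, 1.0):.0f} to ${service_cost(minutes, 3.0):.0f}")
```

Scale that to a team that produces several hours of recordings a week and the numbers stop being an annoyance and start being a budget line.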

Modern speech-to-text models — particularly those based on transformer architectures — have changed this. Accuracy rates above 95% are now common for clear English audio. The technology recognises speaker patterns, handles punctuation, and adapts to accents far better than the generation of tools that preceded it.
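"Accuracy above 95%" is usually stated in terms of word error rate (WER): the word-level edit distance between the model's output and a reference transcript, divided by the reference length, so 95% accuracy corresponds roughly to a WER below 5%. A minimal implementation of the metric itself (standard Levenshtein dynamic programming, not tied to any particular model):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

wer = word_error_rate("the budget was approved in march",
                      "the budget was improved in march")
print(f"WER: {wer:.1%}")  # one substitution over six words
```

One caveat the headline number hides: WER counts every word equally, so a model can score 95% while still mangling exactly the names and figures you care about.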

But here’s the part that most people don’t consider: where does the audio go?

If you use a cloud transcription service, your recording is uploaded to a server. Your meeting — with its confidential discussions, its personnel issues, its strategic plans, its client names — is processed on someone else’s infrastructure. For a routine team standup, that’s probably fine. For a board meeting, a legal consultation, or a therapy session, it’s worth thinking about.

On-device transcription means the audio never leaves your machine. The speech recognition model runs locally, in your browser or on your desktop. The recording stays yours. The transcript stays yours. Nobody else processes it, stores it, or trains their models on it.
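In practice, "the audio never leaves your machine" looks like this: the file is read into local memory, split into fixed-length windows, and each window is fed to a locally running model. A sketch of that pipeline shape, where `transcribe_chunk` is a stand-in for whatever local model you use, not a real library call:

```python
# Sketch of an on-device transcription loop. The samples stay in local
# memory; nothing is uploaded. `transcribe_chunk` is a placeholder for
# a locally running speech-to-text model.

def chunk_samples(samples: list[float], sample_rate: int, window_s: float = 30.0):
    """Split raw audio into fixed-length windows (30 s is a common
    input size for transformer speech models)."""
    step = int(sample_rate * window_s)
    for start in range(0, len(samples), step):
        yield samples[start:start + step]

def transcribe_chunk(chunk: list[float]) -> str:
    """Placeholder: in a real setup, run a local model on this chunk."""
    return f"[{len(chunk)} samples transcribed locally]"

def transcribe(samples: list[float], sample_rate: int) -> str:
    # Every chunk is processed on this machine; the pieces are joined.
    return " ".join(transcribe_chunk(c)
                    for c in chunk_samples(samples, sample_rate))

# A 90-minute recording at 16 kHz splits into 180 windows of 30 seconds.
```

The design point is the boundary, not the loop: because the model runs where the file already lives, there is simply no upload step to audit, encrypt, or worry about.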

That’s not paranoia. That’s just how file processing should work.