Speech recognition is hard. Oddly, “good enough” is usually not good enough: even at 90%+ accuracy, a transcript can be painful to read. Small details, such as where a comma falls, can change the meaning entirely.
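To make the point about accuracy concrete, here is a minimal sketch of word error rate (WER), a standard way of scoring transcripts: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. It assumes the common convention of lowercasing and stripping punctuation before scoring. The function names `normalize` and `word_error_rate` are illustrative, not from any particular library. Under that convention, one wrong word in ten scores 90% accuracy, and a misplaced comma costs nothing at all, even when it changes the meaning.

```python
import re


def normalize(text: str) -> list:
    """Lowercase and drop punctuation (except apostrophes), then split into words.

    This normalization is an assumed scoring convention, not a fixed standard.
    """
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost is 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog today"
    hyp = "the quick brown fox jumps over the lazy log today"
    print(word_error_rate(ref, hyp))  # one substitution in ten words
    print(word_error_rate("Let's eat grandma", "Let's eat, grandma"))  # comma ignored
```

Because the comma is stripped before comparison, the second pair scores a perfect 0.0 even though the two sentences mean very different things, which is exactly why a high accuracy figure can still hide an awkward read.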

Worse, when these transcripts are used as the basis for further analysis, such as personality or emotion detection, their errors compound.

There are many things that can result in poor transcription.

Audio quality is the biggest factor. Noise, echo, delay, compression, and similar problems can all degrade performance.

Training can only go so far when audio quality is low; however, many techniques can be used to improve that quality. Training speech recognition on poor audio samples and putting feedback loops in place for continuous improvement are what can turn initially poor speech recognition into something amazing in a short period of time.
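One common way to get “poor audio samples” for training is to degrade clean recordings on purpose. The sketch below is an illustration under stated assumptions, not a description of any specific product's pipeline: it mixes white Gaussian noise into a signal at a chosen signal-to-noise ratio (SNR) and adds a single delayed echo. The helpers `add_noise_at_snr`, `add_echo`, and `snr_db` are hypothetical names, and audio is assumed to be a plain list of float samples. Real systems would more likely use recorded background noise and measured room impulse responses.

```python
import math
import random
from typing import List, Optional


def _power(samples: List[float]) -> float:
    """Mean power (average squared amplitude) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(clean: List[float], snr_db: float, seed: Optional[int] = None) -> List[float]:
    """Mix white Gaussian noise into `clean` so the result has the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10)
    target_noise_power = _power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / _power(noise))
    return [x + scale * n for x, n in zip(clean, noise)]


def add_echo(signal: List[float], delay_samples: int, decay: float) -> List[float]:
    """Add one echo: out[n] = signal[n] + decay * signal[n - delay_samples]."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    out = list(signal)
    for n in range(delay_samples, len(signal)):
        out[n] += decay * signal[n - delay_samples]
    return out


def snr_db(clean: List[float], noisy: List[float]) -> float:
    """Measure the SNR of `noisy` relative to `clean`, in dB."""
    noise = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(_power(clean) / _power(noise))


if __name__ == "__main__":
    sample_rate = 16_000  # assumed 16 kHz audio
    # One second of a 440 Hz tone stands in for a clean speech recording.
    clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
    noisy = add_noise_at_snr(clean, snr_db=10.0, seed=0)
    echoed = add_echo(noisy, delay_samples=sample_rate // 10, decay=0.4)  # 100 ms echo
    print(f"measured SNR after noise: {snr_db(clean, noisy):.2f} dB")
```

Scaling the noise by its measured power, rather than assuming unit variance, means the resulting SNR matches the target exactly for each generated clip, which makes it easy to sweep a training set across a range of difficulty levels.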
