Speech Recognition Reaches Human Parity


To paraphrase Kevin Kelly (again), predicting the future mostly has to do with understanding the present. Back in May at Speechtek, I talked about how speech recognition was going to reach human performance levels within the next year. Apparently, that’s the case today. Two days ago, Microsoft announced its latest results with Cortana: a word error rate of 5.9%. This matches human transcription error rates.

This is significant, not because it puts manual transcription out of business (it won’t, not yet), but because it is another feat of machines encroaching on human cognitive abilities. Vision APIs can now scan photos for Coca-Cola logos or cat faces much faster than we can.

For human proofreading to no longer make sense, the error rate might need to get to 1%. The same announcement noted that last month’s rate was 6.3%. Since each further gain in ASR error rate gets exponentially harder *and* the technology also improves exponentially, we might be able to assume the two cancel out into a linear decline at the same rate: 0.4 percentage points per month. Maybe it’ll be a year before we get to 1%? Maybe two years? Let’s be conservative and say five years. By then, today’s level of performance will have spread to other languages, as will improvements in far-field modeling of voice. At that point we’re looking at the end of stenography.
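For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch of that linear extrapolation. It assumes the single month-over-month drop (6.3% to 5.9%) simply repeats, which is a big assumption; the function name `months_to_target` is made up for illustration.

```python
def months_to_target(current: float, previous: float, target: float) -> float:
    """Months until `target` is reached, assuming last month's drop repeats linearly."""
    monthly_drop = previous - current  # percentage points gained in one month
    if monthly_drop <= 0:
        raise ValueError("error rate is not declining, cannot extrapolate")
    # Remaining gap divided by the assumed constant monthly improvement.
    return max(0.0, (current - target) / monthly_drop)


# Figures from the post: 6.3% last month, 5.9% now, 1% as the target.
months = months_to_target(current=5.9, previous=6.3, target=1.0)
print(f"Roughly {months:.0f} months to reach 1%")
```

With these inputs the naive answer comes out at a little over 12 months, which is where the "maybe a year" guess comes from. Two data points make a very thin trend line, hence the padding to five years.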

Independent daily thoughts on all things future, voice technologies and AI. More at http://linkedin.com/in/grebler
