
Soon enough (within two years), the English-language accuracy of handheld and far-field devices will reach parity with the best human transcription. We’ll soon expect speech to text to perform perfectly and natural language understanding to catch our innuendos. So the question is: how will we adapt to this new perfection?

We seem to adapt easily to the limits of technology. We slow down to make sure the Echo catches us, or check that it shows we’ve triggered it before we start speaking. We might also pronounce certain words differently, based on experience of which pronunciations get recognized more accurately.

However, what if we came to expect perfection? How would it change the speed at which we issue commands, how we phrase them, and whether we wait for confirmation?

In all likelihood, it would make voice interaction feel less intrusive, especially when speaking in a group. It would also reduce the cognitive load of deciding whether to make a request by voice at all. The result: more whispering, more voice requests, and faster results.

Independent daily thoughts on all things future, voice technologies and AI. More at
