Fickleness of Voice Interaction


Behind the scenes of every voice interaction are dozens, if not hundreds, of processes working to keep the exchange smooth. Probably more than any other medium, voice sits on the cutting edge of technology; perhaps only augmented reality comes close.

For an Echo experience to be great, the device has to trigger, pick up the voice, acknowledge that the command was received, and respond with the correct answer, all within a second or so. This requires many distributed processes, a constant tunnel to a server, and connections to dozens of APIs in one shot.

That level of performance is hard to sustain because if any one of those processes goes wrong, the whole interaction can stop working or become spotty. Worse, there are thousands of interaction scenarios, so 99% of interactions can work while the remaining one is problematic or even catastrophic.
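To put rough numbers on that fragility, here is a minimal sketch. It assumes the processes run as a serial chain and fail independently of one another, which real systems only approximate. The 99.9% per-step figure and the function name `end_to_end_success` are illustrative assumptions, not measurements from any actual device.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Chance that every step in a serial chain succeeds.

    Assumes each step fails independently (a simplification).
    """
    return per_step_success ** steps


# Hypothetical figures: each step works 99.9% of the time.
for steps in (10, 50, 100):
    rate = end_to_end_success(0.999, steps)
    print(f"{steps} steps at 99.9% each: {rate:.1%} of interactions succeed")
```

Under these assumed numbers, a chain of a hundred steps fails roughly one interaction in ten, even though every individual step looks nearly perfect on its own.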

This is the challenge hardware companies face when adding voice: how to make it reliable. Fortunately, as more work on local implementations of voice services is completed, bugs and issues are pushed further toward the extreme use cases.

Independent daily thoughts on all things future, voice technologies and AI. More at
