Handheld Devices and Ubiquitous Voice Interaction

Voice interaction on Android and Apple devices has become far more reliable, and user adoption has increased many times over in the past few years. Even so, there are still barriers that will prevent iPhones and Android phones from extending the reach of ubiquitous voice services and ambient voice interaction.

1. Problems with connectivity

Right now, when the Internet is off, so is voice recognition, which leaves little chance of doing any voice control through the device. The problem isn't limited to a complete outage; it also shows up whenever latency is higher than usual. Some type of offline backup would spare users the sting of pressing the microphone button and being met with an error.
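To make the idea of an offline backup concrete, here is a minimal sketch of how it could be wired up. It is not how any particular phone or assistant does it. The function `recognize_with_fallback`, the `cloud` and `offline` callables, and the 1.5 second latency budget are all hypothetical names and values chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def recognize_with_fallback(audio, cloud, offline, timeout_s=1.5):
    """Try the cloud recognizer first; use the on-device one if the
    network call fails or takes longer than the latency budget.

    `cloud` and `offline` are hypothetical callables that take audio
    bytes and return text. Returns (text, source).
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(cloud, audio)
    try:
        # Wait for the cloud result only as long as the budget allows.
        return future.result(timeout=timeout_s), "cloud"
    except (FutureTimeout, OSError):
        # Timed out, or a network-style error (ConnectionError is an OSError).
        return offline(audio), "offline"
    finally:
        # Do not block on the slow request; let it finish in the background.
        pool.shutdown(wait=False)
```

With something like this in place, a slow or missing connection would degrade to a smaller on-device vocabulary instead of an error. The trade-off is that the offline recognizer would probably only cover a limited set of commands.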

2. Unreliable handsfree trigger

At least on Android, the OK Google trigger needs to be retrained after every OS update, and on my Samsung device it will sometimes switch between multiple Google accounts, so I need to reset it manually. Also, the handsfree trigger won’t work in screen-off mode unless the device is connected to power. Boo. This adds too many conditions for the user to remember in order to use voice interaction reliably.

3. Low availability of applications

Android, while improving, is by no means a high-availability OS. Having multiple apps running in the background affects how quickly the voice trigger, or even manually selected voice interaction, starts functioning. This can cause the voice apps to fail.

4. Poor end-of-speech detection

I’ve seen a few implementations that will fail, especially when set up in a car, as a result of poor end-of-speech detection. They’ll just keep recording until some preset cutoff (or run indefinitely). This becomes even more problematic when an external Bluetooth microphone is used.
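For illustration, here is a rough sketch of the simplest kind of end-of-speech detection: stop once the audio has been quiet for a while after speech was heard, and keep a hard cutoff as a safety net. This is an assumption about how such a detector might look, not the code of any implementation mentioned above. The function names, the fixed energy threshold, and the frame counts are all hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(frames, threshold=500.0, trailing_silence=25,
                       max_frames=500):
    """Return (frame_index, reason) for where recording should stop.

    `frames` is a list of frames, each a list of integer samples.
    All parameter values are illustrative assumptions.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if i >= max_frames:
            # Safety net: never record past the preset cutoff.
            return i, "cutoff"
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Count quiet frames only after the user has started talking.
            silent_run += 1
            if silent_run >= trailing_silence:
                return i, "silence"
    return len(frames), "stream_ended"
```

A fixed threshold like this is one plausible reason such detectors struggle in a car: if road noise sits above the threshold, the silent run never builds up and only the cutoff ends the recording. A more robust version would likely estimate the noise floor rather than hard-code it.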

5. Variability of interaction

When some voice apps work one way and others work another, it can become very confusing to the end user. They have to remember how the app they’re requesting is supposed to function. If there are too many different rote ways of interacting, users will just stick with what they’re more certain of and not engage in voice interaction.

The result of all of these challenges is that handsfree voice interaction on a phone might work 5 out of 10 times. In order to be fully adopted, it needs to work 9 out of 10 times or more. Maybe the next generation of phones (or one 2–3 generations from now) will fully hash out the interaction issues.

