There are at least five areas where a voice interaction can fail badly enough that people will avoid the product for a long time before giving it another chance. Since Amazon launched trigger partners for Alexa Voice Service, and since a number of far-field technologies are either available now or coming soon, many hardware companies are looking to add voice. To avoid frustrating their users, they should investigate the following areas before launching their products:
- Bad trigger. Nothing is as annoying as a device not waking up when you’re calling it. A device that false triggers is almost as annoying. Triggers need to follow the Goldilocks principle and strike a balance between too sensitive (frequent false wakes) and not sensitive enough (missed wake words).
- Bad end-of-speech detection. After the trigger, if end-of-speech detection never catches the end of a phrase, the device keeps listening and becomes unresponsive, which is very annoying. There are a few algorithms for end-of-speech detection, but poor noise rejection will quickly undermine any of them, because background noise can be mistaken for continued speech.
- Bad ASR results. These are often caused by reverberation or other noise, and misrecognized words can lead to the wrong intent or request.
- NLU errors. Even if everything else goes right, the interaction is very frustrating when the system cannot correctly understand the user’s intent.
- Latency. If the response time is over one second (and no other psychological tricks are used to make the user more patient), it can lead to a lot of frustration.
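To make the end-of-speech item more concrete, here is a minimal sketch of an energy-based endpointer in Python. It assumes per-frame energies in dB have already been computed, and that a noise-floor estimate from the audio heard while waiting for the trigger is available. The function name, parameters, and default values are hypothetical illustrations, not tuned figures or any vendor’s API. The noise floor creeps upward slowly so a steadily louder room is not mistaken for endless speech, and a hard timeout guarantees the device stops listening even when noise rejection fails.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EndpointResult:
    end_frame: Optional[int]  # index of the frame where listening stops
    reason: str               # "silence", "timeout", or "no_end"


def detect_end_of_speech(
    energies_db: Sequence[float],
    initial_noise_floor_db: float,
    frame_ms: int = 10,
    margin_db: float = 10.0,
    hangover_ms: int = 500,
    max_utterance_ms: int = 8000,
    floor_rise_db_per_frame: float = 0.02,
) -> EndpointResult:
    """Hypothetical energy-based endpointer (illustrative defaults, not tuned).

    A frame counts as speech when its energy exceeds the running noise-floor
    estimate by margin_db. End of speech is declared after hangover_ms of
    consecutive non-speech frames that follow at least one speech frame.
    max_utterance_ms is a hard timeout so the device never listens forever.
    """
    hangover_frames = max(1, hangover_ms // frame_ms)
    max_frames = max(1, max_utterance_ms // frame_ms)
    noise_floor = initial_noise_floor_db  # assumed: measured before the trigger
    heard_speech = False
    silence_run = 0

    for i, energy in enumerate(energies_db):
        # Minimum tracking: follow quieter frames immediately, but creep upward
        # slowly so a louder room raises the floor without absorbing speech.
        if energy < noise_floor:
            noise_floor = energy
        else:
            noise_floor += floor_rise_db_per_frame

        if energy > noise_floor + margin_db:
            heard_speech = True
            silence_run = 0
        elif heard_speech:
            silence_run += 1
            if silence_run >= hangover_frames:
                return EndpointResult(i, "silence")

        # Safety net: stop listening even if noise keeps the detector "open".
        if i + 1 >= max_frames:
            return EndpointResult(i, "timeout")

    return EndpointResult(None, "no_end")


# 300 ms of room noise, 1 s of speech, then 600 ms of room noise (10 ms frames).
result = detect_end_of_speech(
    [40.0] * 30 + [70.0] * 100 + [40.0] * 60, initial_noise_floor_db=40.0
)
print(result)  # EndpointResult(end_frame=179, reason='silence')
```

In this example the sketch stops listening 500 ms after the last speech frame. If the room noise instead jumped well above the margin after the user finished talking, the slow floor rise would take several seconds to catch up, which is exactly the kind of noise-rejection failure described above; the timeout is what keeps the device from hanging in that case. Real products often replace the simple energy test with a trained voice activity detector, but hangover and timeout logic like this remains a useful safety net.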
All of these areas need to be investigated before implementing voice on hardware, whether the voice service is Alexa Voice Service or another one.