I talk about Alexa, Google, and Ubi a lot. Literally, in that I’m on the phone a lot talking about these devices. In fact, part of my desk is a voice device museum. The result is that I get a lot of false triggers, especially during phone calls.
Occasionally, the devices will overlap in their responses and get into a loop. This is entertaining, and many people have captured it in YouTube videos.
The issue with these devices is that they react to a spoken word as their trigger but do not assess any other criteria. Two additional criteria that can be used to avoid this type of situation are user presence and liveness.
With user presence, the device also needs to detect motion, light, or an object in proximity before it reacts. Presence can also be inferred from a cellphone appearing on the same network as the device, or from a companion app reporting a geofence.
Liveness is the other criterion. A liveness check can determine that 1) it’s a human speaking (not a text-to-speech engine) and 2) the human is present and not a recording.
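To make the idea concrete, here is a minimal sketch in Python of how the wake word, user presence, and liveness could be combined into a single trigger decision. It is only an illustration under my own assumptions: the names `TriggerContext`, `user_present`, `speech_is_live`, and `should_respond` are hypothetical, and the boolean fields stand in for whatever sensors or classifiers a real device would use. None of this reflects how Alexa, Google, or Ubi actually work.

```python
from dataclasses import dataclass


@dataclass
class TriggerContext:
    """Signals available when a wake word is heard (all fields are hypothetical)."""
    wake_word_heard: bool
    motion_detected: bool = False   # e.g. a motion, light, or proximity sensor
    phone_on_network: bool = False  # a known cellphone seen on the same network
    inside_geofence: bool = False   # reported by a companion app
    is_human_voice: bool = False    # speech is not from a text-to-speech engine
    is_live_speech: bool = False    # speech is not a recording being played back


def user_present(ctx: TriggerContext) -> bool:
    """Treat any single presence signal as evidence that a user is nearby."""
    return ctx.motion_detected or ctx.phone_on_network or ctx.inside_geofence


def speech_is_live(ctx: TriggerContext) -> bool:
    """Liveness requires both checks: a human voice and not a recording."""
    return ctx.is_human_voice and ctx.is_live_speech


def should_respond(ctx: TriggerContext) -> bool:
    """React only when the wake word, presence, and liveness all agree."""
    return ctx.wake_word_heard and user_present(ctx) and speech_is_live(ctx)


# Another device's synthetic voice says the wake word while someone is in the room.
device_loop = TriggerContext(wake_word_heard=True, motion_detected=True)
print(should_respond(device_loop))  # False: the liveness checks did not pass

# A person in the room says the wake word.
real_request = TriggerContext(
    wake_word_heard=True,
    motion_detected=True,
    is_human_voice=True,
    is_live_speech=True,
)
print(should_respond(real_request))  # True
```

Requiring every layer to pass is the strictest choice. A real device might instead weigh the signals, since a missed presence reading would otherwise block a legitimate request.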
What we’ll likely see developed over the next year are more layers of complexity in determining when to trigger. What that could mean is that there is a short window to produce videos like the ones above.