There is a lot of demand to enable local triggers in hardware products. Sensory, Kitt.AI, Malaspina, and Rubidium are examples of trigger software providers, and their products also go by the names local ASR, phrase spotting, or wake up word. Interesting things can happen when you run these tools in parallel as well as in serial.
A quick review… all of these tools share two issues: false acceptance (the device triggers when nobody spoke the phrase) and false rejection (the device ignores the phrase when it was spoken). Both are undesirable and ruin the fun of voice interaction.
Let’s start with serial… here, you can have a primary wake up word followed by several commands. In this instance, the probability of a false accept or reject on the primary wake word is the same as for a single wake word used on its own. However, because you know that the user is going to speak one of the secondary commands right after the primary wake word, you can either lower the acceptance threshold for those commands or use a trigger word that hasn’t been as robustly tuned. The primary wake word acts as a gatekeeper. You can also limit the time that a user has to speak one of the secondary commands.
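The gatekeeper idea can be sketched as a small state machine. This is only an illustration, not any vendor's API: the class name, the phrase labels, the confidence scores between 0 and 1, and the threshold and window values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SerialTrigger:
    """Hypothetical serial trigger: a strict wake word gates relaxed commands."""
    wake_threshold: float = 0.8     # assumed strict threshold for the primary wake word
    command_threshold: float = 0.5  # assumed relaxed threshold for secondary commands
    window_s: float = 5.0           # assumed time the user has to speak a command
    _armed_at: Optional[float] = field(default=None, init=False)

    def on_detection(self, phrase: str, score: float, now: float) -> Optional[str]:
        # Close the command window if the user waited too long.
        if self._armed_at is not None and now - self._armed_at > self.window_s:
            self._armed_at = None

        if self._armed_at is None:
            # Not armed: only the primary wake word, at the strict threshold, counts.
            if phrase == "wake" and score >= self.wake_threshold:
                self._armed_at = now
                return "armed"
            return None

        # Armed: accept a secondary command at the relaxed threshold, then disarm.
        if phrase != "wake" and score >= self.command_threshold:
            self._armed_at = None
            return phrase
        return None
```

A command spoken before the wake word is ignored no matter how confident the engine is, while the same command with a mediocre score is accepted inside the window.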
Running triggers in parallel presents the option of having different onramps for different services. For example, you can use “Alexa” as a trigger word for accessing Alexa Voice Service and then have “Hello Device” or some other trigger to access a different voice service (maybe a custom one).
When running in parallel, the false accept rates likely have a summative quality, meaning that the more parallel triggers you run, the more likely you are to falsely trigger the device. The false reject rate is likely to remain the same, given that the user only speaks one phrase at a time.
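To put rough numbers on the summative effect, here is a minimal sketch. It assumes each trigger false-accepts independently of the others, which real engines listening to the same audio may not do, and the function name and rates are made up for illustration.

```python
import math


def combined_false_accept(rates):
    """Probability that at least one of several triggers false-accepts,
    assuming (for illustration) that the triggers fail independently."""
    return 1.0 - math.prod(1.0 - p for p in rates)


# Two triggers, each with a hypothetical 1% false accept probability
# over some fixed listening period.
two_triggers = combined_false_accept([0.01, 0.01])
```

Under that assumption, small rates roughly add up: two triggers at 1% each give a combined rate of 1.99%, just under the simple sum of 2%.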
With parallel, there is also the idea of running two or more different trigger engines, which could be required to agree with each other before allowing the trigger to be accepted. The potential issue here is that the two engines would need to share the same resource and that could lead to problems (e.g. running both Kitt.AI and Sensory at the same time and then having a state machine run between them).
The other parallel approach with a voting system is hybrid triggers. Here, one trigger runs on a DSP or other low power chip and another runs on the application processor. Both would need to agree to accept the trigger.
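A voting check between two engines might look like the sketch below. The function name, the timestamp inputs, and the half-second skew allowance are assumptions for the example; a real system would get its detection events from whatever the DSP and application processor engines actually report.

```python
def engines_agree(t_low_power, t_app, max_skew_s=0.5):
    """Accept only if both engines fired within max_skew_s seconds of each other.

    Each argument is a detection timestamp in seconds, or None if that
    engine did not fire. All names and values here are hypothetical.
    """
    if t_low_power is None or t_app is None:
        return False
    return abs(t_low_power - t_app) <= max_skew_s
```

Requiring agreement generally trades fewer false accepts for more false rejects, since either engine missing the phrase blocks the trigger.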
As more devices become voice enabled, the need to implement different voice triggering scenarios increases. With security a major concern, even more options involving biometrics or authentication will become necessary.