Adding audio acknowledgements to a voice interaction is a delicate dance, especially when the acknowledgement confirms the trigger word. One issue that we have run into in the past, and still have to keep an eye out for, is that the trigger acknowledgement may be captured by the same microphone that is supposed to pick up the user’s speech.
The biggest problem this can cause is breaking endpoint detection. Suppose endpoint detection works by first measuring silence or the ambient volume, then listening for a speech signal, and then stopping when the volume returns to the ambient level. When a loud noise is played at the beginning of this period, the ambient volume may be measured at the wrong level, which causes an early endpoint detection.
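To make the failure mode concrete, here is a minimal sketch of a volume-based endpoint detector. It is a simplified model, not the detector we actually use: the function name `detect_endpoint`, the three-frame calibration window, the `margin` multiplier, and the volume numbers are all assumptions chosen for illustration.

```python
def detect_endpoint(levels, calibration_frames=3, margin=2.0):
    """Return the index of the frame where the endpoint fires, or None.

    levels: per-frame volume values (hypothetical units, e.g. RMS).
    The first `calibration_frames` frames are averaged to estimate ambient volume.
    Speech starts at the first later frame louder than ambient * margin, and the
    endpoint is the first frame after that which drops back to the threshold.
    """
    if len(levels) <= calibration_frames:
        return None
    ambient = sum(levels[:calibration_frames]) / calibration_frames
    threshold = ambient * margin
    in_speech = False
    for i in range(calibration_frames, len(levels)):
        if not in_speech:
            if levels[i] > threshold:
                in_speech = True  # speech onset
        elif levels[i] <= threshold:
            return i  # volume is back at the "ambient" level
    return None


# Same utterance in both cases; only the first frame differs.
clean = [0.01, 0.01, 0.01, 0.01, 0.5, 0.3, 0.45, 0.4, 0.01, 0.01]
with_beep = [0.6, 0.01, 0.01, 0.01, 0.5, 0.3, 0.45, 0.4, 0.01, 0.01]

print(detect_endpoint(clean))      # 8: endpoint after the utterance ends
print(detect_endpoint(with_beep))  # 5: endpoint in the middle of the utterance
```

In the second case the beep lands inside the calibration window and inflates the ambient estimate, so a normal dip in the speaker’s volume is mistaken for a return to silence.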
The symptom would be hearing the endpoint detection acknowledgement right after the trigger acknowledgement, like two back-to-back beeps.
Another issue that can come up is degraded DSP performance, especially with a host-based DSP. The beep might cause the DSP algorithm to dampen noise to the point where speech can no longer be detected.
One way to resolve this is to put a watchdog in place that monitors the processes and ensures that the microphone does not start recording until after the trigger acknowledgement has stopped.
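One way such a gate could look is sketched below, under some assumptions. `play_ack` and `start_recording` are hypothetical callbacks standing in for the real audio player and recorder, and `play_ack` is assumed to block until the beep has finished. Starting the recording anyway after a timeout is also just one possible policy for the watchdog.

```python
import threading
import time


def run_interaction(play_ack, start_recording, timeout=1.0):
    """Play the trigger acknowledgement, then start recording once it is done.

    Returns the ordered list of events so the sequencing can be inspected.
    """
    ack_done = threading.Event()
    events = []

    def player():
        play_ack()  # assumed to block until playback has finished
        events.append("ack_done")
        ack_done.set()

    thread = threading.Thread(target=player)
    thread.start()

    # Watchdog: wait for playback to finish, but never block forever.
    if not ack_done.wait(timeout):
        events.append("ack_timeout")  # playback stalled; note it and move on

    start_recording()
    events.append("recording_started")
    thread.join()
    return events


# Stand-ins for real audio I/O: a 50 ms "beep" and a no-op recorder.
events = run_interaction(lambda: time.sleep(0.05), lambda: None)
print(events)  # ['ack_done', 'recording_started']
```

The point of the event list is that the ordering can be checked in a test, which is how a regression in the playback and recording sequence would be caught before it shows up as a double beep.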