There seem to be two sets of expectations when it comes to acoustic echo cancellation (AEC; in short, the ability to talk over a device while it is playing audio and barge in): the designer’s and the users’.
For AEC, it’s a strange reversal: it seems it’s the designer who has higher expectations for performance than the user.
Why is this?
When we design for AEC, we expect the echo of the reference signal (the audio the device itself is playing) to be removed completely from the mic channel, as though it didn’t exist. In practice this almost never happens, as reverb and other signals inevitably make their way into the mic channel.
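To make the point about reverb concrete, here is a minimal sketch of an echo canceller built on a normalized LMS (NLMS) adaptive filter. NLMS is a common textbook choice, used here only for illustration and not necessarily what any particular product ships. The room response in `echo_path`, the filter length, and the step size are all made-up values. The filter models only the first 8 samples of the echo path, so the late reflection at 20 samples stays in the residual no matter how long the filter adapts.

```python
import random

def simulate_mic(ref, echo_path):
    """Mic signal containing only echo: the reference convolved with the room response."""
    return [
        sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(ref))
    ]

def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo from the mic; return the residual."""
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # latest reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * s for w, s in zip(weights, history))
        e = d - echo_estimate                      # what the trigger detector hears
        norm = sum(s * s for s in history) + eps   # normalize by reference energy
        weights = [w + mu * e * s / norm for w, s in zip(weights, history)]
        residual.append(e)
    return residual

def power(samples):
    return sum(s * s for s in samples) / len(samples)

rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(4000)]

# Hypothetical room: a short direct path plus one late reflection at 20 samples,
# which lies beyond the 8 taps the filter can model.
echo_path = [0.6, 0.3, 0.1] + [0.0] * 17 + [0.05]

mic = simulate_mic(ref, echo_path)
residual = nlms_cancel(mic, ref)

print("echo power:    ", power(mic[-1000:]))
print("residual power:", power(residual[-1000:]))
```

The residual ends up far below the raw echo but does not reach zero, which is the gap between what the designer hopes for and what the mic channel actually delivers.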
Users, on the other hand, intuitively know to raise their voice when the volume coming from a device is louder, and they don’t easily form the expectation that the device will hear them when music is blaring.
For AEC, one way to mitigate these issues is to increase the sensitivity of the trigger word detector and then duck the audio as soon as a trigger is detected. Trigger sensitivity can also be set dynamically based on the false acceptance rate, or the device can ask the user for confirmation. It’s also possible that, above a certain volume, the device warns that it needs to be manually triggered to be interrupted.
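The mitigations above can be sketched as a small decision policy. Everything here is an assumption made for illustration: the `BargeInPolicy` name, the 0-to-1 detector score and volume scales, and every threshold value. The idea is that louder playback lowers the score needed to trigger, borderline scores prompt a confirmation, very loud playback falls back to manual triggering, and the baseline threshold drifts with the observed false acceptance rate.

```python
from dataclasses import dataclass

@dataclass
class BargeInPolicy:
    # All values are hypothetical and would need tuning on real data.
    base_threshold: float = 0.60      # detector score needed when playback is silent
    min_threshold: float = 0.35       # most sensitive setting allowed
    confirm_margin: float = 0.10      # scores this close to the threshold ask for confirmation
    manual_only_volume: float = 0.90  # above this volume, require a manual trigger
    duck_gain: float = 0.2            # playback gain applied while ducked

    def threshold(self, volume: float) -> float:
        """Lower the required score (raise sensitivity) as playback gets louder."""
        v = min(max(volume, 0.0), 1.0)
        return self.base_threshold - (self.base_threshold - self.min_threshold) * v

    def decide(self, score: float, volume: float) -> str:
        """Return 'manual_only', 'duck', 'confirm', or 'ignore'."""
        if volume >= self.manual_only_volume:
            return "manual_only"      # warn the user that voice barge-in is off
        t = self.threshold(volume)
        if score >= t:
            return "duck"             # trigger detected: duck audio right away
        if score >= t - self.confirm_margin:
            return "confirm"          # borderline: ask the user to confirm
        return "ignore"

    def playback_gain(self, decision: str) -> float:
        return self.duck_gain if decision == "duck" else 1.0

    def adapt(self, observed_far: float, target_far: float, step: float = 0.02) -> None:
        """Nudge the baseline threshold toward the target false acceptance rate."""
        if observed_far > target_far:
            self.base_threshold = min(self.base_threshold + step, 0.95)
        else:
            self.base_threshold = max(self.base_threshold - step, self.min_threshold)

policy = BargeInPolicy()
for score, volume in [(0.50, 0.5), (0.55, 0.0), (0.20, 0.0), (0.99, 0.95)]:
    decision = policy.decide(score, volume)
    print(score, volume, decision, policy.playback_gain(decision))
```

Keeping the policy separate from the detector means the thresholds can be tuned against logged false accepts without retraining anything.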