We’re used to using “Alexa” as the wake word for the Echo and other Alexa devices. But would it be possible to say “Alexa” at the end of the sentence instead, as in “Play music, Alexa”?

The short answer is yes. Amazon already requires devices to cache half a second of audio from before the wake word is heard. Once the wake word is detected, this cached audio, along with the recording of the wake word itself, is sent to Alexa Voice Service for cloud verification. If it’s determined that the wake word wasn’t spoken, Alexa will shut down the stream. This caching means that the gap between “Alexa” being spoken and the rest of the command can be very small.

If we increase the size of the cache, we can potentially capture the whole command first and let the user say the wake word afterwards. Once the wake word is detected, we dump the entire cache to the cloud and await the response. The user experience might be a little worse, though, for a couple of reasons:

  • The command length might be limited by the maximum recording length
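The caching idea above can be sketched as a fixed-size rolling buffer. This is only a minimal illustration, not how an Echo actually implements it: the `AudioCache` class, the `capture_command` helper, the 20 ms frame size, and the `is_wake_word` callback are all hypothetical names and values chosen for the example.

```python
from collections import deque

FRAME_MS = 20  # assumed frame duration, not taken from any Alexa spec


class AudioCache:
    """Rolling cache that keeps only the most recent audio frames."""

    def __init__(self, max_ms: int):
        # deque(maxlen=...) silently drops the oldest frame when full
        self._frames = deque(maxlen=max_ms // FRAME_MS)

    def push(self, frame: bytes) -> None:
        self._frames.append(frame)

    def dump(self) -> bytes:
        """Return everything cached so far and empty the cache."""
        data = b"".join(self._frames)
        self._frames.clear()
        return data


def capture_command(frames, is_wake_word, cache_ms=5000):
    """Cache audio until the wake word is heard, then hand over the cache.

    `is_wake_word` stands in for a real wake word detector (hypothetical).
    Returns None if the wake word never occurs.
    """
    cache = AudioCache(cache_ms)
    for frame in frames:
        cache.push(frame)
        if is_wake_word(frame):
            return cache.dump()  # command + wake word, oldest audio lost
    return None
```

Because the buffer has a fixed length, anything spoken earlier than `cache_ms` before the wake word has already been overwritten, which is exactly the command length limit in the bullet above.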

However, the biggest issue for users may be the delay. In normal usage, the audio is already being streamed while the user is still speaking. In the wake-word-last scenario, nothing is sent until after the wake word: “What’s the time, Alexa.” The wake word needs to be detected, then the audio cache is dumped, and only then does the user start waiting for the response. Tick tock, tick tock… My guess is that this delay would be perceptible to the user.
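To get a rough feel for the extra wait, here is a back-of-the-envelope estimate of how long it takes just to upload the cached audio. It assumes uncompressed 16 kHz, 16-bit mono audio and a constant uplink speed, and it ignores wake word detection time, network latency, and cloud processing. The `upload_delay_s` function and the numbers plugged into it are illustrative assumptions, not measurements.

```python
def upload_delay_s(cached_audio_s: float,
                   uplink_bits_per_s: float,
                   sample_rate_hz: int = 16_000,  # assumed capture rate
                   bytes_per_sample: int = 2      # assumed 16-bit mono PCM
                   ) -> float:
    """Seconds needed to push the cached audio over the uplink."""
    audio_bits = cached_audio_s * sample_rate_hz * bytes_per_sample * 8
    return audio_bits / uplink_bits_per_s


# A 3-second command over a hypothetical 1 Mbit/s uplink
delay = upload_delay_s(3.0, 1_000_000)
print(f"{delay:.2f} s of extra upload time")
```

Under these assumptions a 3-second command adds roughly three quarters of a second before the cloud can even start working on the request. A faster uplink or compressed audio would shrink that, but with normal streaming this upload overlaps with the user speaking and costs nothing extra.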

