Behind every voice interaction are dozens, if not hundreds, of processes working to keep it smooth. Probably more than any other medium, voice-interactive technology sits at the cutting edge; perhaps only augmented reality comes close.
For an Echo experience to be great, the device has to trigger, pick up the voice, acknowledge that the command was received, and respond with the correct answer, all within a second or so. This requires many distributed processes, a persistent connection to a server, and calls to dozens of APIs in a single pass.
That performance is hard to match because if any one of those processes goes wrong, the whole interaction can stop working or become spotty. Worse, there are thousands of possible interaction scenarios, so 99% of interactions can work while a single one turns out to be problematic or even catastrophic.
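To see why a chain of processes is so fragile, here is a rough back-of-the-envelope sketch. It assumes every stage is independent and that all of them must succeed for the interaction to work, which is a simplification rather than a description of how any real voice service is built. The function name `end_to_end_success` and the 99.9% per-stage figure are made up for illustration.

```python
def end_to_end_success(stage_reliability: float, stages: int) -> float:
    """Probability that every stage in a chain succeeds.

    Assumes independent stages that must all succeed (an illustrative
    simplification, not a model of any specific voice service).
    """
    if not 0.0 <= stage_reliability <= 1.0:
        raise ValueError("stage_reliability must be between 0 and 1")
    if stages < 0:
        raise ValueError("stages must be non-negative")
    # Independent events: multiply the per-stage success probabilities.
    return stage_reliability ** stages


if __name__ == "__main__":
    # Hypothetical numbers: each stage succeeds 99.9% of the time.
    for stages in (10, 50, 100):
        p = end_to_end_success(0.999, stages)
        print(f"{stages} stages at 99.9% each: {p:.1%} of interactions succeed")
```

Under these assumptions, the success rate of the whole interaction is always lower than that of any single stage, and it keeps dropping as more stages are chained together.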
This is the challenge hardware companies face when adding voice: how to make it reliable. Fortunately, as more work on local implementations of voice services is completed, bugs and issues are pushed further toward the extreme use cases.