Looking ahead over the next 5–10 years, I can imagine a new type of DSP for far-field interaction.
It will reside locally on the device, running on non-dedicated hardware with very little processing power.
It will be built using deep neural networks that are taught to hear, but only when they recognize a particular person’s voice.
It won’t even hear or pay attention to any other voice and will be completely deaf to background noise.
It will even be able to hear you whisper in a loud room.
The result will be a completely secure voice-interactive device or service that can detect spoofing by TTS engines or by recordings. At that point, even a single mic will be usable for far-field interaction.
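
To make the idea of "hearing only one person" a little more concrete, here is a minimal sketch of speaker-gated listening. It is only an illustration under stated assumptions, not a description of any existing product: it assumes some neural network (not shown, and purely hypothetical here) has already turned each audio frame and the enrolled user's voice into fixed-length embedding vectors, and it simply silences every frame whose embedding does not match the enrolled voice. The names `gate_frames`, `enrolled_embedding`, and the `0.7` threshold are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gate_frames(frames, frame_embeddings, enrolled_embedding, threshold=0.7):
    """Pass through frames that match the enrolled speaker; zero out the rest.

    frames            -- list of audio frames (each a list of samples)
    frame_embeddings  -- one embedding per frame, assumed to come from a
                         speaker-embedding network that is not shown here
    enrolled_embedding -- embedding of the one voice the device listens to
    threshold         -- arbitrary similarity cutoff chosen for illustration
    """
    gated = []
    for frame, embedding in zip(frames, frame_embeddings):
        if cosine_similarity(embedding, enrolled_embedding) >= threshold:
            gated.append(list(frame))          # enrolled voice: keep the audio
        else:
            gated.append([0.0] * len(frame))   # anyone or anything else: silence
    return gated


# Toy data: the first frame resembles the enrolled voice, the second does not.
enrolled = [1.0, 0.0]
frames = [[0.5, -0.5], [0.2, 0.1]]
embeddings = [[0.9, 0.1], [0.0, 1.0]]
result = gate_frames(frames, embeddings, enrolled)
```

In this toy run the first frame is kept and the second is replaced with zeros. A real system would need far more than a similarity threshold (for example, to reject recordings or synthetic speech), so this only shows the gating step.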