Soon enough (within two years), the English-language accuracy of handheld and far-field devices will reach parity with the best human transcription. We’ll soon expect speech-to-text to perform perfectly and natural language understanding to catch our innuendos. So the question is: how will we adapt to this new perfection?
We seem to adapt easily to the limits of technology. We slow down to make sure the Echo catches us, or check that it shows we’ve triggered it before speaking. We might also pronounce certain words differently because experience has taught us they are recognized more accurately that way.
However, what if we were to expect perfection? How would it change how quickly we issue commands, how we phrase them, and whether we wait for confirmation?
In all likelihood, it would make voice interaction feel less intrusive, especially when speaking in a group. It would also reduce the cognitive load of deciding whether to make a request by voice at all. The result: more whispering, more voice requests, and faster results.