By 2020, major automakers will have quite a few autonomous cars on the road. Their share will quickly reach double-digit percentages by 2025, and by 2030 it will likely seem almost irresponsible to get behind the wheel of a car and drive manually.

Perhaps by that time, as many others have speculated, ride sharing will also be so much more economical than car ownership that images like the one above, of us bumbling through a parking lot, will seem vintage.

What will happen to voice in the car?

Amazon, Apple, and Google are all vying for integration with the major manufacturers. Companies like Logitech have also offered solutions, such as ZeroTouch, to upgrade older cars with voice. However, it seems like current voice interaction in the car is still too slow for effective and safe use.

Take the example of driving in a packed car and deciding that you want to get a coffee. On Siri, this search returns what looks like a little red pin on a map, which is not useful at all while driving. Google takes 3–4 seconds on speech-to-text (STT) and then comes up with an incorrect result. When it does finally work, it points to a place in the opposite direction from where you are heading.

Speech recognition in cars, especially when using the phone’s mics rather than a newer car’s beamforming arrays, can be bad, particularly at endpoint detection (deciding when the speaker has finished talking). The result is that the application needs to be really simple for voice to be usable.

ZeroTouch does a good job of limiting inputs to Yes, No, Cancel, etc. and prompting for them. The only issue is that endpoint detection, because of background noise, adds 3–5 seconds of latency to each turn. Take “Would you like me to read text message from John”: deciding whether to hear the SMS in TTS, or having to repeat the command, might be just as disruptive as showing the text in a large font on the screen.
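To see why background noise stretches each turn, here is a minimal sketch of a simple energy-threshold endpointer. It is not how ZeroTouch or any particular recognizer works; the function names, the 100 ms frame length, the threshold, and the energy values are all assumptions made up for illustration. The endpointer declares the turn over once it sees a run of quiet frames after speech, and falls back to a timeout if that run never arrives.

```python
from typing import Sequence

FRAME_MS = 100  # assumed frame length, for illustration only


def find_endpoint(energies: Sequence[float], threshold: float,
                  hangover_frames: int, max_frames: int) -> int:
    """Return the index of the frame where the turn is declared over.

    The turn ends after `hangover_frames` consecutive frames below
    `threshold` following speech, or at the `max_frames` timeout.
    """
    limit = min(len(energies), max_frames)
    heard_speech = False
    quiet_run = 0
    for i in range(limit):
        if energies[i] >= threshold:
            heard_speech = True
            quiet_run = 0  # any loud frame restarts the quiet count
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= hangover_frames:
                return i
    return limit - 1  # never saw enough quiet: fall back to the timeout


def turn_latency_ms(energies: Sequence[float], speech_frames: int,
                    **settings) -> int:
    """Milliseconds between the last speech frame and the declared endpoint."""
    endpoint = find_endpoint(energies, **settings)
    return (endpoint - (speech_frames - 1)) * FRAME_MS


# Made-up energies: a five-frame "yes" followed by 35 frames of background.
quiet_room = [0.9] * 5 + [0.05] * 35
noisy_car = [0.9] * 5 + [0.35] * 35  # road noise sits above the threshold
settings = dict(threshold=0.3, hangover_frames=5, max_frames=40)

quiet_latency = turn_latency_ms(quiet_room, 5, **settings)  # 500
noisy_latency = turn_latency_ms(noisy_car, 5, **settings)   # 3500
```

With these invented numbers, the quiet room ends the turn half a second after the word, while the noisy car waits for the full timeout because the road noise never drops below the threshold. Real endpointers are more sophisticated, but the failure mode is the same kind of thing: noise that looks like speech keeps the microphone open.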

As cloud-based ASR takes car noise into account and response latency decreases, voice interaction in the car will be as compelling as it is outside of it. Likewise, as phones become part of the autonomous driving system (for inputs), voice interaction and the applications will likely get much better. For example, you won’t need to judge whether a place en route has parking; this will be part of the search criteria.
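As a rough sketch of what "part of the search criteria" could mean, the snippet below filters coffee results to places that lie ahead of the car and have parking. Everything here is hypothetical: the `Place` type, the `search_ahead` function, the flat x/y coordinates in kilometers, and the sample places are invented for illustration and do not reflect any real maps or assistant API.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence, Tuple


@dataclass
class Place:
    name: str
    x_km: float  # east of a local origin (hypothetical flat coordinates)
    y_km: float  # north of a local origin
    has_parking: bool


def search_ahead(places: Sequence[Place], car_xy: Tuple[float, float],
                 heading_xy: Tuple[float, float],
                 require_parking: bool = True) -> List[str]:
    """Return names of places ahead of the car, nearest first."""
    cx, cy = car_xy
    hx, hy = heading_xy
    hits = []
    for place in places:
        dx, dy = place.x_km - cx, place.y_km - cy
        # A positive dot product means the place is in front of the car.
        ahead = dx * hx + dy * hy > 0
        if ahead and (place.has_parking or not require_parking):
            hits.append((hypot(dx, dy), place.name))
    hits.sort()
    return [name for _, name in hits]


# Invented sample data: the car is at the origin, heading north.
places = [
    Place("Behind Cafe", 0.0, -0.5, True),
    Place("No Lot Espresso", 0.1, 1.0, False),
    Place("Drive-In Coffee", 0.2, 2.0, True),
]

with_parking = search_ahead(places, (0.0, 0.0), (0.0, 1.0))
any_ahead = search_ahead(places, (0.0, 0.0), (0.0, 1.0), require_parking=False)
```

Here `with_parking` contains only "Drive-In Coffee", so the closest cafe (behind the car) and the one without a lot never reach the driver. A production system would use the actual route rather than a straight heading, which is exactly the kind of input the car itself could supply.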

