There is low-hanging fruit for voice in retail, and that’s drive-thru food ordering. Drive-thru ordering represents billions of dollars of commerce, and reducing turnaround time and increasing accuracy could yield millions in either savings or additional order capture.
How could speech recognition work?
Initially, it would look just like it does today, with recordings captured and then indexed together with the actual orders. This paired data would be used to train a custom ASR (automatic speech recognition) model for food ordering, using a toolkit like Kaldi.
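As a rough sketch of what "indexed together with the actual orders" could mean in practice, the snippet below turns paired recordings and orders into the three plain-text index files a Kaldi data directory expects (`wav.scp`, `text`, `utt2spk`). The `OrderRecording` fields, the ID scheme, and the idea of using the entered order as a rough transcript are all assumptions for illustration; a real pipeline would likely need verbatim transcripts as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderRecording:
    """One drive-thru exchange. All field names are hypothetical."""
    utt_id: str      # unique ID, prefixed with the lane ID so sorting stays consistent
    wav_path: str    # where the captured audio lives
    lane_id: str     # stand-in for Kaldi's "speaker" grouping
    order_text: str  # the order as entered, used here as a rough transcript


def kaldi_data_files(recordings):
    """Build the contents of wav.scp, text, and utt2spk as strings.

    Kaldi expects these files to be sorted by utterance ID.
    """
    rows = sorted(recordings, key=lambda r: r.utt_id)
    return {
        "wav.scp": "".join(f"{r.utt_id} {r.wav_path}\n" for r in rows),
        "text": "".join(f"{r.utt_id} {r.order_text.upper()}\n" for r in rows),
        "utt2spk": "".join(f"{r.utt_id} {r.lane_id}\n" for r in rows),
    }


if __name__ == "__main__":
    sample = [
        OrderRecording("lane1-0002", "/audio/b.wav", "lane1", "one large coke"),
        OrderRecording("lane1-0001", "/audio/a.wav", "lane1", "two cheeseburgers"),
    ]
    for name, content in kaldi_data_files(sample).items():
        print(f"--- {name} ---")
        print(content, end="")
```

Keeping the index as plain text makes it easy to rebuild as more recordings come in, which is the point of capturing them from day one.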
The next step would be to test, compare, and improve the ASR.
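One common way to test and compare ASR output is word error rate (WER): the word-level edit distance between what was actually said and what the ASR heard, divided by the number of words actually said. Here is a minimal sketch, assuming transcripts are plain strings and that lowercasing and whitespace splitting are enough normalization (real menus with item numbers and brand names would need more care).

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # ref word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    said = "two cheeseburgers and a large coke"
    heard = "two cheeseburgers and large coke"
    print(f"WER: {word_error_rate(said, heard):.3f}")
```

Tracking this number across model versions is what turns "test, compare, and improve" into a loop you can actually run.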
Eventually, the operator would guide the customer through the order while the ASR captured it in real time, and the operator would verify the result.
Once the ASR’s error rate gets within 1–2% of an operator’s, the system could switch entirely to automated ordering, with the operator correcting the order when the customer notices an error.
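The switch-over rule could be checked with something as simple as the sketch below. It assumes "error rate" means the fraction of orders that needed correction, and it reads "within 1–2%" as percentage points (a default margin of 0.02). Both readings are assumptions, and the function names are made up for illustration.

```python
def order_error_rate(captured_orders, actual_orders):
    """Fraction of orders where the captured order differs from the actual one."""
    if not actual_orders or len(captured_orders) != len(actual_orders):
        raise ValueError("need two non-empty lists of equal length")
    wrong = sum(1 for c, a in zip(captured_orders, actual_orders) if c != a)
    return wrong / len(actual_orders)


def ready_for_full_automation(asr_rate, operator_rate, margin=0.02):
    """True once the ASR is no more than `margin` worse than the operator."""
    return asr_rate - operator_rate <= margin


if __name__ == "__main__":
    actual = ["combo 1", "combo 2", "large fries", "coke"]
    asr = ["combo 1", "combo 2", "large fries", "diet coke"]
    operator = ["combo 1", "combo 2", "large fries", "coke"]

    asr_rate = order_error_rate(asr, actual)
    operator_rate = order_error_rate(operator, actual)
    print(asr_rate, operator_rate, ready_for_full_automation(asr_rate, operator_rate))
```

With only a handful of orders the comparison is meaningless, so in practice this would run over a large held-out set of recordings.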
The real issue with the drive-thru will be providing enough time and a limited enough selection that customers aren’t overwhelmed and led to order regret. That’s a whole different area of psychology (rushed food ordering regret is illustrated well here … earmuffs required for the children).
The takeaway is that the first step in implementing voice in retail applications is to capture recordings and set up testing.