Where Voice Interfaces Meet Resistance

Leor Grebler
Jan 16, 2017

When I worked at IBM selling WebSphere, one of the products had the ability to take green-screen interfaces and turn them into web forms. Ten years ago, this was a big step forward for a company updating its systems. With the new interface, staff would log in and walk through the steps in a form. The backend stayed the same; only the interface changed.

It would seem like a no-brainer, except that the new interface often slowed down people who were used to navigating the menus. The latency on the green screens was measured in milliseconds, and many rules prevented erroneous or null entries.

With these green-screen fixes, the problem wasn't the interface itself. It was that the operators were the ones running different routines to perform functions.

Today was a beautiful example of how this is still the case. This morning in Toronto, less than 15 minutes passed from the time I arrived at the airport by car to the time I was through customs, through security, and on the plane. Three of those minutes were spent being transferred to an earlier flight.

As I finished going through US Customs, I heard this: “Final boarding call for Air Canada 704 for LaGuardia… Gate F64.” Wonderful: I didn’t even have to look at a screen to find the gate for the next flight.

I asked the gate agent if I could transfer. Space wasn’t an issue. However, she had to go through a number of steps:

  • Look up my PNR (Passenger Name Record)
  • Offload me from the flight I was on
  • Cancel my early standby on a different flight
  • Add me to the standby list
  • Move me from the standby list to the flight list
  • Board me onto the flight

Each of these steps involved looking me up or tabbing through windows. The gate agent was proficient with the menus and navigated them very quickly, perhaps 20 to 30 seconds per operation.

It would have been great for her to simply say to the computer, “Board this passenger,” and have the rest of the work done. In this instance, the easier parts are far-field speech capture, speech-to-text, and even NLU. The harder part is creating the API that can take the request and convert it into action. Harder still is defining and testing all of the logic that allows that API to handle many different tasks.
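To make that gap concrete, here is a minimal sketch of what a single “Board this passenger” call might have to do behind the scenes. It is purely hypothetical: `ReservationSystem`, `Flight`, `board_passenger`, the PNR “ABC123”, and the flights “XX100” and “XX200” are all made up, and in-memory Python dicts stand in for a real airline reservation backend. It simply chains the six steps the gate agent performed by hand, plus a couple of example rules.

```python
from dataclasses import dataclass, field


@dataclass
class Flight:
    number: str
    capacity: int
    passengers: set = field(default_factory=set)  # PNRs on the flight list
    standby: list = field(default_factory=list)   # PNRs on the standby list
    boarded: set = field(default_factory=set)     # PNRs already boarded


class ReservationSystem:
    """Hypothetical in-memory stand-in for an airline reservation backend."""

    def __init__(self, flights):
        self.flights = {f.number: f for f in flights}
        self.bookings = {}       # PNR -> flight number currently booked
        self.early_standby = {}  # PNR -> set of flight numbers with early standby

    def board_passenger(self, pnr, target):
        """Chain the six manual steps behind one 'Board this passenger' intent.

        Returns a log of the steps performed; raises ValueError if a rule fails.
        """
        log = []
        # 1. Look up the PNR (and validate everything before changing any state)
        if pnr not in self.bookings:
            raise ValueError(f"unknown PNR {pnr}")
        flight = self.flights.get(target)
        if flight is None:
            raise ValueError(f"unknown flight {target}")
        if len(flight.passengers) >= flight.capacity:
            raise ValueError(f"{target} is full")
        current = self.flights[self.bookings[pnr]]
        log.append(f"found {pnr} on {current.number}")
        # 2. Offload the passenger from the current flight
        current.passengers.discard(pnr)
        log.append(f"offloaded from {current.number}")
        # 3. Cancel any early standby held on other flights
        for other in sorted(self.early_standby.pop(pnr, set())):
            if pnr in self.flights[other].standby:
                self.flights[other].standby.remove(pnr)
            log.append(f"cancelled standby on {other}")
        # 4. Add the passenger to the target flight's standby list
        flight.standby.append(pnr)
        log.append(f"added to {target} standby")
        # 5. Move the passenger from the standby list to the flight list
        flight.standby.remove(pnr)
        flight.passengers.add(pnr)
        self.bookings[pnr] = target
        log.append(f"moved to {target} flight list")
        # 6. Board the passenger
        flight.boarded.add(pnr)
        log.append(f"boarded {target}")
        return log


# Made-up data: ABC123 is booked on XX200 and holds an early standby on XX100.
system = ReservationSystem([Flight("XX100", 100), Flight("XX200", 100), Flight("AC704", 100)])
system.bookings["ABC123"] = "XX200"
system.flights["XX200"].passengers.add("ABC123")
system.early_standby["ABC123"] = {"XX100"}
system.flights["XX100"].standby.append("ABC123")

steps = system.board_passenger("ABC123", "AC704")
for step in steps:
    print(step)
```

Turning the spoken request into `board_passenger("ABC123", "AC704")` is the comparatively easy part. Even this toy version needs rules about unknown records, full flights, and leftover standby entries, and a real system would have far more of them, each of which would need to be defined and tested.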

This is where the interface challenge hits the wall. There are countless examples like this one, where it’s easier to leave the system in place than to contend with the endless possibilities and rules that automating it might create. The next leap in AI will come from looking over the shoulders of workers and understanding how the logic works, so that the manual process of creating APIs isn’t needed.




Independent daily thoughts on all things future, voice technologies and AI. More at http://linkedin.com/in/grebler