LipNet and Data Fusion Opportunity

The infamous HAL lipreading scene from 2001: A Space Odyssey

Lip reading is not easy, even for humans. Our success rate might be 52%, and that’s only if we’re experienced and good at it. If you’ve ever checked out the Bad Lip Reading YouTube channel, you can see what happens when things go wrong…

However, researchers at the University of Oxford announced this month that their lipreading model reached 93% accuracy on video of speakers, using just the silent video feed. You can check out the paper here.

Here’s the video:

One of the big opportunities ahead is to couple this technology with automatic speech recognition (ASR). Together, they might push error rates (where the current records just beat human error rates) much lower, to the point where the device can understand us better than we can understand each other.
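To make the fusion idea a bit more concrete, here is a minimal sketch of one simple way two recognizers could be combined: a weighted log-linear "late fusion" of the word probabilities each model produces. This is not how LipNet or any particular ASR system works; the function name `fuse_posteriors`, the `audio_weight` parameter, and all of the numbers are made up for illustration.

```python
import math


def fuse_posteriors(audio_probs, visual_probs, audio_weight=0.7):
    """Combine two word-probability dicts with weighted log-linear fusion.

    Hypothetical helper for illustration only; the inputs are assumed to be
    per-word probabilities from an audio model and a lipreading model.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    visual_weight = 1.0 - audio_weight
    eps = 1e-12  # floor so a word missing from one model does not give log(0)

    words = set(audio_probs) | set(visual_probs)
    scores = {
        w: audio_weight * math.log(max(audio_probs.get(w, 0.0), eps))
        + visual_weight * math.log(max(visual_probs.get(w, 0.0), eps))
        for w in words
    }

    # Renormalise the combined log scores so they sum to 1 again.
    top = max(scores.values())
    exp_scores = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exp_scores.values())
    return {w: v / total for w, v in exp_scores.items()}


# Made-up example: in a noisy room the audio model can barely separate
# "might" from "night", but the lips close for "m" and not for "n".
audio = {"might": 0.48, "night": 0.52}
visual = {"might": 0.90, "night": 0.10}

fused = fuse_posteriors(audio, visual)
best = max(fused, key=fused.get)
print(best)  # might
```

The weight is the interesting knob: in a quiet room you would probably lean on the audio, and in a loud one you would shift toward the video. A real system would more likely learn that trade-off, or fuse the two streams inside a single model, rather than use a fixed number.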

Imagine if the Google Home product were coupled with a Nest Dropcam: it would then have all the inputs to do this.
