Lip reading is not easy, even for humans. Our rate of success might be 52%, and that’s only if we’re experienced and good at it. If you’ve ever checked out the Bad Lip Reading YouTube channel, you can see what can happen when things go wrong…
However, researchers at the University of Oxford announced this month that their lipreading models reached 93% accuracy on video of speakers, using just the silent video feed. You can check out the paper here.
Here’s the video:
One of the big opportunities ahead is to couple this technology with automatic speech recognition (ASR). Together, they might push error rates well below the current records (which just surpassed human error rates), to the point where the device can understand us better than we can understand each other.
Imagine if the Google Home product were coupled with a Nest Dropcam: it would then have all the inputs to do this.