One of the coolest movies I watched as a kid was Sneakers. It's about a ragtag bunch of hackers trying to keep a code-breaking device out of the wrong hands. That required sneaking into a super-secure facility protected by surveillance cameras, motion sensors, and a voice-authentication mantrap.
In one scene, Robert Redford's character gets past the voice check by playing a recording of Stephen Tobolowsky's character into the microphone. Getting that recording meant setting the target up on a date and getting him to say the words of the passphrase.
Today, tools like Lyrebird could be used to synthesize the target's speech. But even without synthesis, voice on its own is a weak authenticator, because a recording can simply be played back into the microphone. The best way to counter this is to add liveness testing:
- Getting the person to repeat a random line
- Checking that the voice sample has no artifacts of a recording (for example, by capturing at a higher sample rate than most recording devices use)
- Checking that the prosody matches the phrase, to catch words that have been stitched together
Even so, it's better to treat voice as one factor in multi-factor authentication, and the lighter-weight one at that. Or combine it with facial recognition, facial-movement recognition, or some other hard-to-replicate combination of factors.
"My voice is my passport. Verify me." is not enough.