TechCrunch reported a few days ago that Amazon will be rolling out new versions of its Alexa voice for reading news or Wikipedia articles. The voice does seem to be an improvement over Alexa’s more cheerful voice and is likely more tolerable when listening to longer passages of speech.
It’s nice to see progress towards the eventual form of text-to-speech (TTS) that will be indistinguishable from the human voice. We’re due for some major breakthroughs in TTS, especially after WaveNet and Tacotron 2 (it’s been over a year since the last major announcement on this front). Google Duplex seemed to use some version of TTS, but the hums and haws could have been pre-recorded audio.
We’re probably going to see two interim steps in the move towards undetectable TTS. The first will be “announcer-like” voices, as opposed to today’s Alexa / Google Assistant responses. Someone narrating a commercial does not sound quite real, and that’s the experience people are going to expect from their AI assistants. The next will be much more human-like voices meant not to sound like assistants. These will sound more like the voices used in the Google Duplex demo.
There will be debate on the morality of human-sounding AI, but we’ll accept it if it is able to dramatically improve our lives. It likely will.