Amazon today announced a new newscaster-style voice for Amazon Polly. While Google has been working on great, lifelike voice synthesis with its WaveNet and Tacotron 2 projects, Amazon’s announcement makes me think that Google was barking up the wrong tree.
The real challenge with speech synthesis is that it’s hard to tolerate a flat, emotionless voice for anything longer than a quick response. Having information read out by these voices is painful. Being lifelike doesn’t necessarily make the experience better. It just means listening to what sounds like a real person speaking flatly and emotionlessly back to you.
The newscaster voice mimics NPR-style news reading. It adds a cadence and prosody that make it tolerable to listen to longer text.
What’s weird is that announcer voices sound more fake than everyday speech, yet in this case that style makes the synthesized voice sound more real.