When Amazon bought Ivona many years ago, the biggest speculation was that it would use the company’s technology to help narrate Kindle books. Today, that technology powers Amazon Polly and very likely was behind the voice of Alexa.
Long-form narration via TTS is a bad idea, at least with today’s TTS APIs. You can hear this a lot in mass-produced YouTube videos: someone quickly adds narration with a TTS engine because they don’t like their own accent or are trepidatious about being the star of their own video.
However, TTS today is still in the uncanny valley, and listening to any of these voices for more than a sentence or two is aggravating. Even very good TTS engines like WaveNet have a problem: the expression is off. The fix is for the author to go through the text and add SSML (Speech Synthesis Markup Language) tags by hand, but this is tedious.
So what could the solution be? A technology that reads the text for tone and then automatically generates SSML for the TTS engine.
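To make the idea a little more concrete, here is a minimal Python sketch of what such a pipeline might look like. Everything in it is hypothetical: `guess_tone` is a crude punctuation-and-keyword heuristic standing in for a real tone or sentiment model, and the `<prosody>` rate and pitch values are illustrative guesses, not tuned settings. It only emits standard SSML elements (`<speak>`, `<s>`, `<prosody>`), though individual engines and voices differ in which prosody attributes they actually honor.

```python
import re
from xml.sax.saxutils import escape


def guess_tone(sentence: str) -> str:
    """Hypothetical, deliberately crude tone guesser based on surface cues.

    A real system would replace this with a trained sentiment/emotion model.
    """
    s = sentence.strip()
    if s.endswith("!"):
        return "excited"
    if s.endswith("?"):
        return "questioning"
    if re.search(r"\b(unfortunately|sadly|sorry)\b", s, re.IGNORECASE):
        return "somber"
    return "neutral"


# Illustrative mapping from tone label to SSML <prosody> attributes.
# None means "leave the sentence at the voice's default delivery".
PROSODY = {
    "excited": {"rate": "110%", "pitch": "+10%"},
    "questioning": {"pitch": "+5%"},
    "somber": {"rate": "90%", "pitch": "-10%"},
    "neutral": None,
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def to_ssml(text: str) -> str:
    """Wrap each sentence in <s>, adding <prosody> based on its guessed tone."""
    parts = []
    for sentence in split_sentences(text):
        body = escape(sentence)  # escape &, <, > so the output stays valid XML
        attrs = PROSODY[guess_tone(sentence)]
        if attrs:
            attr_str = " ".join(f'{k}="{v}"' for k, v in attrs.items())
            body = f"<prosody {attr_str}>{body}</prosody>"
        parts.append(f"<s>{body}</s>")
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(to_ssml("We won! Did you hear? Sadly, the game is over."))
```

Run on the sample text, this wraps the first sentence in a faster, higher-pitched `<prosody>` tag, gives the question a slightly raised pitch, and slows and lowers the last sentence. The hard part of the idea lives entirely in `guess_tone`; the SSML generation around it is mechanical.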