Continuing the SSML theme, prompted by the new feature we launched for our Alexa Skill earlier this week, here are a few reasons why SSML matters when building voice interactions.
SSML and other speech synthesis markup tools are how text-to-speech (TTS) goes from robotic to natural. They are also how these services will get closer to passing Turing tests and create better interactions.
While technologies like WaveNet will make TTS services (at least from Google) sound much more realistic, SSML lets developers add the inflection and correct pronunciations that these interactions need to be convincing.
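To make that concrete, here is a minimal sketch of building an SSML response with emphasis, a custom pronunciation, and a pause. The helper names (`emphasize`, `pronounce`, `pause`, `speak`) are our own illustrations, not part of any SDK; the tags themselves (`<emphasis>`, `<phoneme>`, `<break>`) come from the SSML standard, though each service supports a slightly different subset, so check your provider's documentation before relying on any one of them.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def emphasize(text, level="strong"):
    """Wrap text in an <emphasis> tag to add inflection (hypothetical helper)."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pronounce(text, ipa):
    """Give the exact IPA pronunciation for a word (hypothetical helper)."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def pause(ms):
    """Insert a pause of the given length in milliseconds (hypothetical helper)."""
    return f'<break time="{int(ms)}ms"/>'


def speak(*parts):
    """Join already-built SSML fragments inside the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("I "),
    emphasize("really"),
    escape(" like "),
    pronounce("pecan", "pɪˈkɑːn"),
    pause(300),
    escape(" pie."),
)

# Parsing raises an error if the markup is not well-formed XML.
ET.fromstring(ssml)
print(ssml)
```

Plain text is passed through `escape` so that characters like `&` or `<` in the spoken text cannot break the markup, which is an easy mistake to make when assembling SSML by hand.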
While Alexa only recently added SSML support, it’s by no means exclusive to Alexa. Other providers (Google, Microsoft, Nuance, IBM) offer their own implementations. Future SSML standards may need to add emotion tags as TTS providers become better able to convey emotion in their voices.