
It’s worth reminding ourselves that voice conveys a great deal of information that current interactions don’t yet take full advantage of. Right now we mostly extract the text and, from there, the sentiment, intent, and entities. But there’s much more:

  • Rate of speech
  • Volume
  • Word emphasis
  • Emotion
  • Voice sentiment
  • Biometrics
  • Health information

Many of these fall under the category of prosody. Since prosody is already controllable in text-to-speech through SSML tagging, it might be possible to reverse the process: map incoming speech to a text response annotated with a prosody map, much like SSML tags.
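To make the idea concrete, here is a minimal sketch of what that reversed mapping could look like. The function `prosody_to_ssml` is hypothetical, not part of any speech platform's API; it assumes some upstream analysis has already measured rate, volume, and emphasized words, and simply re-expresses those features as standard SSML `<prosody>` and `<emphasis>` markup around the transcript.

```python
def prosody_to_ssml(text, rate=None, volume=None, emphasized=()):
    """Annotate a recognized utterance with SSML-style prosody tags.

    rate/volume: SSML <prosody> attribute values, e.g. "fast", "+6dB".
    emphasized: collection of words detected as stressed by the speaker.
    (Hypothetical helper for illustration; feature extraction is assumed
    to happen elsewhere.)
    """
    words = []
    for word in text.split():
        if word in emphasized:
            # Mark stressed words with an <emphasis> tag.
            words.append(f'<emphasis level="strong">{word}</emphasis>')
        else:
            words.append(word)
    body = " ".join(words)

    # Wrap the whole utterance in a <prosody> tag carrying the
    # measured rate and volume, if any were supplied.
    attrs = []
    if rate:
        attrs.append(f'rate="{rate}"')
    if volume:
        attrs.append(f'volume="{volume}"')
    if attrs:
        body = f'<prosody {" ".join(attrs)}>{body}</prosody>'
    return f"<speak>{body}</speak>"


print(prosody_to_ssml("I never said that", rate="fast", emphasized={"never"}))
# → <speak><prosody rate="fast">I <emphasis level="strong">never</emphasis> said that</prosody></speak>
```

A downstream voice application could then read this enriched transcript instead of plain text, so that "I *never* said that" and "I never said *that*" produce different responses.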

The bigger question is what we, as voice designers, will do with this information. Will we be able to adapt our content to these additional inputs?
