Reading Between the Lines

We can get a hint of what people want by observing how they interact with a system. With voice interaction, there are a few clues to how well users navigate their requests. Some metrics we can use to infer the quality of an interaction are:

  • Triggers that are not followed by requests
  • Repeated requests
  • Rephrased requests (same intent, different words)
  • Time between trigger and request
  • Time between requests
  • Length of request
  • Phrase density (word count divided by recording time or audio file size)

Some of these metrics are available to Skills / Actions creators, and some to those who implement the Alexa Voice Service or the embedded Google Assistant SDK. With a proprietary voice interface, this data can be captured in full.
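As a rough sketch, the simpler of these metrics could be derived from a session log. Everything here is an assumption for illustration — the `Utterance` fields, the similarity threshold, and the log shape are not any platform's actual API:

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical log record; the field names are assumptions, not a real API.
@dataclass
class Utterance:
    trigger_time: float   # seconds, when the wake word was detected
    request_time: float   # seconds, when the request speech started (0.0 if none followed)
    text: str             # STT transcript ("" if the trigger was not followed by a request)
    duration: float       # recording length in seconds

def interaction_metrics(log):
    """Derive the simple interaction signals listed above from a session log."""
    spoken = [u for u in log if u.text]
    metrics = {
        "unanswered_triggers": sum(1 for u in log if not u.text),
        "repeated_requests": 0,
        "rephrased_requests": 0,
        "trigger_to_request": [u.request_time - u.trigger_time for u in spoken],
        "phrase_density": [len(u.text.split()) / u.duration
                           for u in spoken if u.duration > 0],
    }
    for prev, cur in zip(spoken, spoken[1:]):
        if cur.text == prev.text:
            metrics["repeated_requests"] += 1
        # Crude proxy for "same intent, different words": high but not exact
        # textual similarity. A real system would compare NLU intents instead.
        elif SequenceMatcher(None, prev.text, cur.text).ratio() > 0.6:
            metrics["rephrased_requests"] += 1
    return metrics
```

The similarity ratio is only a stand-in: rephrasing is really a semantic judgment, so in production the comparison would happen on resolved intents rather than raw transcripts.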

More data on the interaction can be derived if we go more granular:

  • Loudness of the request (SNR, or simply the microphone level after a successful speech-to-text result)
  • Background noise during the request
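The loudness signal above could be estimated from raw samples, assuming we can separate a speech segment from a background-only segment (for example, the pause between trigger and request). A minimal sketch, not production DSP:

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech, noise):
    """Rough signal-to-noise ratio in decibels.

    Assumes `speech` and `noise` are non-silent sample segments already
    separated upstream; a zero-level noise segment would divide by zero.
    """
    return 20 * math.log10(rms(speech) / rms(noise))
```

The noise segment's RMS on its own also serves as the "background noise during the request" signal from the list above.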

If we layer other services on top, we can gain even more information:

  • Emotion detection
  • Age and gender of speaker
  • Music detection
  • Speaker recognition

If we capture these signals, we can start to draw correlations between what users want and what they actually get from the systems we build, and then use that data to inform the systems' responses.
