Coming back from the thrill of conversations at SpeechTEK, I’m more curious about the possible applications that will come from combining light (vision) with sound (voice).
A quick review… with voice, we have speech-to-text, natural language understanding, emotion detection, and biometrics (identity).
With vision, we have identity, age, emotion, gender, ethnicity, gaze detection, lip reading, and gesture, and even pulse, blood pressure, respiration rate, and potentially other health information.
Combining the two could result in:
- Much better speech recognition
- Better identification of entities in natural language interaction (e.g. “Go over there” combined with gaze and finger tracking)
- Better context of requests
- Matching of prosody and emotion to that of the user
- Alerting emergency services of a potential health issue
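
To make the “Go over there” item a little more concrete, here is a minimal sketch of how a gaze or pointing direction might resolve “there” to an object in the room. Everything in it is an assumption for illustration: the function name `resolve_deictic`, the scene layout, and the idea that a vision system has already supplied 3D positions and a direction vector. It simply picks the candidate with the smallest angle to the gaze ray, which is one simple heuristic and not how any particular product works.

```python
import math

def resolve_deictic(origin, direction, candidates):
    """Return the name of the candidate closest in angle to the gaze/pointing ray.

    origin, direction: 3D tuples (hypothetical output of a vision system).
    candidates: dict mapping object name -> 3D position.
    """
    def angle_to(target):
        to_target = [t - o for t, o in zip(target, origin)]
        dot = sum(a * b for a, b in zip(direction, to_target))
        norm = math.hypot(*direction) * math.hypot(*to_target)
        # Clamp to guard against floating-point drift outside [-1, 1].
        return math.acos(max(-1.0, min(1.0, dot / norm)))

    return min(candidates, key=lambda name: angle_to(candidates[name]))

# Hypothetical scene: positions in metres relative to the camera.
scene = {
    "sofa": (2.0, 0.0, 3.0),
    "door": (-1.5, 0.0, 4.0),
    "lamp": (0.5, 1.0, 1.0),
}

# The speech side heard "Go over there"; the vision side supplies the direction.
target = resolve_deictic(origin=(0.0, 0.0, 0.0),
                         direction=(0.6, 0.0, 1.0),
                         candidates=scene)
print(target)
```

In a real system the direction would be noisy, so it would likely make sense to combine gaze and finger tracking and to ask the user to confirm when two candidates are close in angle.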