One of the interesting lessons from working with the Ubi and the Portal to create custom commands was that shorter phrases actually led to more errors.
For example, the phrase “turn on” was more likely to produce an error than “please turn on the lights”. We saw this over and over with different short phrases. A few factors seem to be behind it:
- Clipping of the audio. In our example, if speech capture started too late, the engine would hear “urn on”.
- Harder to distinguish from noise. A short phrase carries less signal (speech) than a longer one, so a noise cancellation algorithm is more likely to filter it incorrectly.
- Fewer words for predictive text. For the phrase “the end is…”, the most common word to follow is “near”, then “nigh”, then a long tail of other words, but there is a statistical likelihood to work with. The more words, the better the prediction.
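
To make the predictive-text point concrete, here is a minimal sketch of a word-count language model. It is only an illustration: the corpus is invented, the function names `build_model` and `next_word_probs` are hypothetical, and real speech-to-text engines use far more sophisticated models than this.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only (not data from the Ubi).
CORPUS = [
    "the end is near",
    "the end is near",
    "the end is nigh",
    "the end is here",
    "this is fine",
    "it is what it is",
    "what is on",
]


def build_model(sentences, context_size):
    """Count which word follows each run of `context_size` words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            model[context][words[i]] += 1
    return model


def next_word_probs(model, context):
    """Return {word: probability} for the words seen after `context`."""
    counts = model.get(tuple(context), Counter())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


# One word of context: many different words can follow "is".
short = next_word_probs(build_model(CORPUS, 1), ["is"])

# Three words of context: the candidates narrow sharply.
long = next_word_probs(build_model(CORPUS, 3), ["the", "end", "is"])

print(sorted(short.items(), key=lambda kv: -kv[1]))
print(sorted(long.items(), key=lambda kv: -kv[1]))
```

In this toy corpus, the single word “is” is followed by six different words and “near” accounts for only 2 of 7 occurrences. With the full context “the end is”, there are three candidates and “near” accounts for 2 of 4. A two-word command gives the recognizer very little of this kind of context to lean on.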
The conclusion we reached: shorter commands are better served by trigger word engines than by speech-to-text APIs.