Speech Recognition in Other Languages

I focus a lot on English speech recognition. After all, Alexa and Google Home were marketed in the US first, and the US and Canada tend to have a very “US is the center of the universe” viewpoint.

Things get interesting when you move voice interaction to other languages. Chinese has more words than English and more dialects, which means it likely requires a larger number of speech samples to achieve the same error rate. Let’s say over 300,000 words for Chinese vs. ~180,000 words for English. The same applies to predictive typing and swipe typing.

One of the things that struck me recently was the high accuracy of Google speech recognition… in Hebrew. I have a non-Israeli accent and make a lot of speaking errors, but the voice transcription required fewer corrections than in English. Predictive typing also seemed much more likely to suggest the next word I wanted.

Hebrew has a total word count of ~45,000 words, and those can only be spoken in so many combinations. Conjugation can also make the next word more predictable.

What this suggests is that the work to relaunch voice-first products in new regions may be a function of the total vocabulary size of the target language.
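
As a back-of-the-envelope sketch of that idea, the snippet below compares the rough vocabulary sizes quoted in this post against English. The linear relationship between vocabulary size and effort is purely an assumption for illustration, and the names `VOCAB_SIZE` and `relative_effort` are hypothetical. Real effort would also depend on dialects, accents, and available training data.

```python
# Approximate vocabulary sizes as quoted in this post (rough figures, not authoritative).
VOCAB_SIZE = {
    "Hebrew": 45_000,
    "English": 180_000,
    "Chinese": 300_000,
}


def relative_effort(language: str, baseline: str = "English") -> float:
    """Return the language's vocabulary size as a multiple of the baseline's.

    Assumption: effort scales linearly with vocabulary size. This is a
    simplification for illustration, not a measured relationship.
    """
    return VOCAB_SIZE[language] / VOCAB_SIZE[baseline]


if __name__ == "__main__":
    for name in VOCAB_SIZE:
        print(f"{name}: {relative_effort(name):.2f}x the English vocabulary")
```

Under that assumption, Hebrew comes out at about a quarter of the English vocabulary and Chinese at roughly one and two-thirds of it.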

