Speech recognition is tough. Oddly, "good enough" is usually not good enough. Even at better than 90% accuracy, a speech transcript can be painful to read, and small details, such as where a comma is placed, can change the meaning entirely.
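Accuracy figures like these are often reported as 1 minus the word error rate (WER): the fewest word substitutions, deletions, and insertions needed to turn a transcript into a reference, divided by the number of reference words. The Python sketch below is a minimal illustration, assuming plain whitespace tokenization; the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match if sub == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


reference = "please call me back after the meeting ends at noon"
hypothesis = "please call me back after the meeting ends at new"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # prints: WER: 10%, accuracy: 90%
```

One wrong word out of ten yields 90% accuracy, yet "noon" becoming "new" loses the one detail that mattered. Because this sketch compares tokens as-is, "eat," and "eat" count as different words; scoring setups often strip punctuation first, in which case comma errors never show up in the headline number at all.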
Worse, when these results are used as the basis for further analysis, such as personality or emotion detection, the errors compound.
Many factors can lead to poor transcription.
Audio quality is the biggest factor: noise, echo, delay, and compression can all degrade performance.
Training can only go so far when the audio quality is low; however, many technologies can be employed to improve that quality. Training speech recognition on poor audio samples and putting feedback loops in place for continuous improvement are the factors that can turn initially poor speech recognition into something amazing in a short period of time.