I installed Pocketsphinx 0.7 on a VM running Debian Squeeze. The installation worked fine, and I can recognize speech from files. On top of this I've built some Python scripts that recognize a set of files I have and then estimate the word error rate. They use GStreamer as described in this tutorial.
So far I am using the original hmm that came in the pocketsphinx tarball, a dictionary that contains only the words from my test data, and an optimized language model I got from my professor. This should work, as it is also running in a production system. My problem is that the recognition performance is still horrible: I have a word error rate (WER) of about 85%.
What I want to know is how I can improve the WER. What kind of steps can I take?
Another thing that happens, and probably impacts performance, is that pocketsphinx tells me it has no permission to access the hmm, although I made the hmm readable, writable and executable for everyone.
Does anyone have an idea where this may come from? I'd appreciate any kind of help. If you need more information, please let me know.
EDIT:
I created a small test set and ran pocketsphinx. This is where you can find the files and the results. I was allowed to give you some examples from the original test set. You can find them here.
These are the worst examples. Short utterances of 1-2 words work well.
Sorry I couldn't create a big test set so far; my time is very limited.
What I want to know is how I can improve the WER. What kind of steps can I take?
This issue is described in the Pocketsphinx FAQ:
http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
The first step is to collect a database of test samples.
If you need help improving the accuracy, you need to share that database, the results you are looking for, and the actual results. You can share it here or on the SourceForge forum. Pack all the files into an archive, upload it somewhere, and post a link here.
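When comparing the expected results with the actual ones, it helps if everyone computes WER the same way. Below is a minimal sketch in plain Python, assuming each test sample has a reference transcript and a recognizer hypothesis as strings. The function names `edit_distance`, `word_error_rate` and `corpus_wer` are illustrative and not part of pocketsphinx. It counts substitutions, deletions and insertions with a word-level Levenshtein distance and divides by the number of reference words.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER for one utterance; case is ignored (an assumption of this sketch)."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return float(edit_distance(ref_words, hyp_words)) / len(ref_words)


def corpus_wer(pairs):
    """WER over many (reference, hypothesis) pairs, weighted by reference length."""
    errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("no reference words in test set")
    return float(errors) / total_words
```

For example, `word_error_rate("turn the light on", "turn light off")` gives 0.5: one deletion and one substitution against four reference words. Reporting the corpus-level figure alongside the per-utterance ones makes it easier to see whether a few long utterances dominate the error.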
For more information see