Can CMU Sphinx be set up to recognize ~200 words?

lots_of_questions · Jan 31, 2012

I have a client who needs an Android app that can recognize spoken commands. From what I understand, the built-in voice-to-text functionality sends the audio to Google's servers, which then send back the recognized text. This is a major problem, because the voice data is extremely sensitive (unless the data is encrypted on its way to and from Google, but I doubt it is).

There are two options I can think of. The first is to convert speech to text on the Android device itself, though this seems like it would be an extremely expensive operation. The second is to have a local server do the conversion for me (I could encrypt both the voice data and the resulting text while they are in transit). Is this something CMU Sphinx could pull off? It may also be worth noting that I will have access to an Asterisk server, which could possibly help with this (I don't know).
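For the second option, the round trip (send audio to a local server, get text back) can be sketched as follows. This is a minimal illustration under stated assumptions, not a CMUSphinx API: the 4-byte length-prefixed wire format and the helper names `send_frame`, `recv_frame`, `serve_one`, and `transcribe_remote` are hypothetical, and the recognizer passed to `serve_one` is a stand-in for whatever engine the server actually runs. Encryption is left out; one option would be to wrap both sockets with Python's `ssl.SSLContext.wrap_socket` so audio and transcripts travel over TLS.

```python
import socket
import struct
import threading
from typing import Callable

# Hypothetical wire format (not part of CMUSphinx): a 4-byte big-endian
# length prefix followed by exactly that many payload bytes.
HEADER = struct.Struct(">I")


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer closes the connection early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def serve_one(sock: socket.socket, recognize: Callable[[bytes], str]) -> None:
    """Server side: read one audio message, reply with the recognizer's text."""
    audio = recv_frame(sock)
    send_frame(sock, recognize(audio).encode("utf-8"))


def transcribe_remote(sock: socket.socket, audio: bytes) -> str:
    """Client side: send audio bytes and return the transcript as a string."""
    send_frame(sock, audio)
    return recv_frame(sock).decode("utf-8")


if __name__ == "__main__":
    client, server = socket.socketpair()
    # Stand-in recognizer; a real server would call Pocketsphinx or Sphinx4 here.
    fake_recognizer = lambda audio: f"received {len(audio)} bytes"
    worker = threading.Thread(target=serve_one, args=(server, fake_recognizer))
    worker.start()
    print(transcribe_remote(client, b"\x00" * 3200))  # prints "received 3200 bytes"
    worker.join()
    client.close()
    server.close()
```

The length prefix is there because a TCP connection is a byte stream with no message boundaries, so the receiver needs to know where one audio clip or transcript ends.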

In reality, only ~200 words will need to be recognized. I would prefer an open-source/free software solution, but I am also open to a commercial one (perhaps FlexT9). Ideally, I could send the audio stream somewhere, get back a String containing the text, and then parse that String and act on it.
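Restricting the decoder to a fixed list of phrases is one common way to handle a small vocabulary like this: Pocketsphinx can load a JSGF grammar (through its `-jsgf` option) instead of a large statistical language model. The sketch below assumes the phrases are known in advance; the phrase list, the grammar name `commands`, and the `COMMANDS` action mapping are hypothetical, and every word must also appear in the pronunciation dictionary the decoder uses. It also shows the "parse the String" step as a simple lookup.

```python
from __future__ import annotations

import re

# Hypothetical command set: spoken phrase -> application action.
COMMANDS = {
    "open door": "OPEN_DOOR",
    "close door": "CLOSE_DOOR",
    "lights on": "LIGHTS_ON",
}


def build_jsgf(name: str, phrases: list[str]) -> str:
    """Return a JSGF grammar whose public rule matches exactly the given phrases."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError("grammar name must be a simple identifier")
    if not phrases:
        raise ValueError("at least one phrase is required")
    # Lowercase to match dictionaries that list words in lowercase (an assumption).
    alternatives = "\n    | ".join(p.lower() for p in phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <command> = {alternatives} ;\n"
    )


def parse_command(hypothesis: str, commands: dict[str, str]) -> str | None:
    """Map the recognizer's text output to an action, or None if unrecognized."""
    normalized = " ".join(hypothesis.lower().split())
    return commands.get(normalized)


if __name__ == "__main__":
    print(build_jsgf("commands", list(COMMANDS)))
    print(parse_command("  Open   DOOR ", COMMANDS))  # prints "OPEN_DOOR"
```

A grammar only lets the decoder choose among the listed sentences, which tends to help accuracy for command-style input, but it also means out-of-grammar speech gets matched to whichever listed phrase is closest.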

I haven't done much Android development or any speech recognition development before, so I'm hoping someone can at least point me in the right direction. Thanks!

Answer

Nikolay Shmyrev · Jan 31, 2012

CMUSphinx is an open-source speech recognition toolkit you can use to build your application. It contains the tools, libraries, and data you need to build a speech application. You can learn more about it on the CMUSphinx website.

On Android you have several options to use CMUSphinx:

  1. Recognize audio on the device. For that, you can compile the Pocketsphinx engine for Android. For details, see this blog post.

  2. Recognize audio on a server. As the server-side engine you can use either Pocketsphinx or Sphinx4. You can send the audio compressed in FLAC format, or extract speech recognition features on the device and send the feature stream to the server.
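To get a feel for why sending features instead of audio can save bandwidth, here is a back-of-the-envelope calculation. The parameters are assumptions chosen to match common Pocketsphinx front-end settings for 16 kHz models (16-bit mono audio, 100 frames per second, 13 cepstral coefficients per frame stored as 32-bit floats); check them against your own configuration. FLAC is not included because its lossless compression ratio depends on the recording.

```python
# Assumed front-end parameters; adjust to match your actual configuration.
SAMPLE_RATE_HZ = 16_000   # audio samples per second
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
FRAMES_PER_SEC = 100      # one feature vector every 10 ms
CEPSTRA_PER_FRAME = 13    # MFCC coefficients per frame
BYTES_PER_COEFF = 4       # 32-bit float per coefficient


def pcm_bytes_per_sec() -> int:
    """Bandwidth of uncompressed audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE


def feature_bytes_per_sec() -> int:
    """Bandwidth of a raw MFCC feature stream."""
    return FRAMES_PER_SEC * CEPSTRA_PER_FRAME * BYTES_PER_COEFF


if __name__ == "__main__":
    pcm = pcm_bytes_per_sec()
    feat = feature_bytes_per_sec()
    # prints "32000 B/s raw PCM vs 5200 B/s features (about 6.2x smaller)"
    print(f"{pcm} B/s raw PCM vs {feat} B/s features (about {pcm / feat:.1f}x smaller)")
```

Under these assumptions the feature stream is roughly a sixth the size of the raw audio. The trade-off is that the server must be configured to accept features computed with exactly the same front-end settings as the device.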

CMUSphinx provides several acoustic models that let you recognize speech in a number of languages, including English, French, Mandarin, German, Dutch, and Russian.

You can also improve recognition accuracy by adapting the acoustic model with the adaptation tools.

If you have any questions about CMUSphinx, you are welcome to ask in our community forums.