Voice recognition on android with recorded sound clip?

CodeFusionMobile picture CodeFusionMobile · Feb 23, 2010 · Viewed 31.9k times · Source

I've used the voice recognition feature on Android and I love it. It's one of my customers' most praised features. However, the format is somewhat restrictive. You have to call the recognizer intent, have it send the recording for transcription to google, and wait for the text back.

Some of my ideas would require recording the audio within my app and then sending the clip to google for transcription.

Is there any way I can send an audio clip to be processed with speech to text?

Answer

lsantsan picture lsantsan · Apr 17, 2014

I got a solution that is working well to have speech recognizing and audio recording. Here is the link to a simple Android project I created to show the solution's working. Also, I put some print screens inside the project to illustrate the app.

I'm gonna try to explain briefly the approach I used. I combined two features in that project: Google Speech API and Flac recording.

Google Speech API is called through HTTP connections. Mike Pultz gives more details about the API:

"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."

However, this API needs to receive a FLAC sound file to work properly. That makes us to go to the second part: Flac recording

I implemented Flac recording in that project through extracting and adapting some pieces of code and libraries from an open source app called AudioBoo. AudioBoo uses native code to record and play flac format.

Thus, it's possible to record a flac sound, send it to Google Speech API, get the text, and play the sound that was just recorded.

The project I created has the basic principles to make it work and can be improved for specific situations. In order to make it work in a different scenario, it's necessary to get a Google Speech API key, which is obtained by being part of Google Chromium-dev group. I left one key in that project just to show it's working, but I'll remove it eventually. If someone needs more information about it, let me know cause I'm not able to put more than 2 links in this post.