Android: Voice recognition

Ramesh Sangili · Dec 25, 2012

[Possible duplicate] However, I didn't find answers to the questions below.

I've been researching voice recognition for the past two days and haven't found answers to these questions:

  1. Is it possible to run voice recognition as a service? I would like to implement something like this: I want to call a number by voice even while my phone is in sleep mode.
  2. Does voice recognition reliably detect words when I am on a train, bus, etc.?
  3. Is there a sensor that can detect voice, apart from voice recognition itself?
  4. For voice recognition to work properly, does the user need to speak close to the phone?

Answer

MP23 · Dec 25, 2012

1) Putting voice recognition into a service is a proper approach; this is how the Google API does it, using callback methods to deliver results. To keep it running continuously, the service must hold a wake lock so that the device does not fall into sleep mode. More information is provided here: Wake locks android service recurring. This has one big disadvantage: high battery usage, caused by the CPU working continuously and constantly processing incoming sound data. (This can be reduced with filters, thresholds, etc.)
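To illustrate the threshold idea from the last sentence, here is a minimal Java sketch, assuming the service reads 16-bit PCM frames (for example from Android's `AudioRecord`) and only hands a frame to the recognizer when its level is above a fixed threshold. The class `EnergyGate` and the -40 dBFS value used below are hypothetical choices, not part of any Android or Google API; the wake lock and service plumbing are not shown.

```java
/**
 * Hypothetical helper: decides whether a frame of 16-bit PCM audio is loud
 * enough to be worth passing to the (expensive) recognizer.
 */
final class EnergyGate {
    private final double thresholdDbfs;

    EnergyGate(double thresholdDbfs) {
        this.thresholdDbfs = thresholdDbfs;
    }

    /** RMS level of the frame in dB relative to full scale (0 dBFS = maximum amplitude). */
    static double rmsDbfs(short[] frame) {
        if (frame.length == 0) return Double.NEGATIVE_INFINITY;
        double sumSquares = 0.0;
        for (short s : frame) {
            double x = s / 32768.0; // normalise the sample to [-1, 1)
            sumSquares += x * x;
        }
        double rms = Math.sqrt(sumSquares / frame.length);
        // Digital silence has no finite dB level
        return rms == 0.0 ? Double.NEGATIVE_INFINITY : 20.0 * Math.log10(rms);
    }

    /** True if the frame is loud enough to forward to the recognizer. */
    boolean shouldProcess(short[] frame) {
        return rmsDbfs(frame) >= thresholdDbfs;
    }
}
```

In the service loop one would create something like `new EnergyGate(-40.0)`, call `shouldProcess(frame)` on each buffer and skip the recognition step when it returns false. The threshold would need tuning per device and environment.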

2) Voice recognition is not a simple task. It requires a large amount of computation and reference data. If the input audio is not clear (noise, many human voices, etc.), it is harder to get correct output. To improve accuracy, filter the input audio: noise suppression, low-pass filtering, etc. You cannot expect 100% accuracy, but 80-95% can be achieved.
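The low-pass filtering mentioned above can be sketched as a single-pole IIR filter, `y[n] = y[n-1] + α·(x[n] - y[n-1])` with `α = Δt / (RC + Δt)` and `RC = 1 / (2π·f_c)`. This is an illustration under stated assumptions (the class name `LowPassFilter` is hypothetical), not real noise suppression: its roll-off is only about 6 dB per octave, and the cutoff should stay above the main speech band (a few kHz) or intelligibility will suffer.

```java
/** Hypothetical single-pole (RC-style) low-pass filter for a mono audio stream. */
final class LowPassFilter {
    private final double alpha; // smoothing factor in (0, 1)
    private double state;       // previous output y[n-1]; kept across calls for streaming

    LowPassFilter(double cutoffHz, double sampleRateHz) {
        double rc = 1.0 / (2.0 * Math.PI * cutoffHz);
        double dt = 1.0 / sampleRateHz;
        this.alpha = dt / (rc + dt);
    }

    /** Filters one buffer; successive calls continue from the previous buffer's state. */
    double[] process(double[] input) {
        double[] out = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            state += alpha * (input[i] - state); // move output a fraction alpha toward input
            out[i] = state;
        }
        return out;
    }
}
```

Because `state` is kept between calls, the same instance can be fed consecutive frames from a recording loop without producing a discontinuity at each frame boundary.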

Filtering out multiple human voices is harder. However, simple amplitude (audio signal level) algorithms with an adaptive threshold can be used to decide when a word begins and ends. The idea is that the intended voice is the loudest one, i.e. the one nearest to the phone/device. So, regarding 4): accuracy is better when the user speaks close to the microphone, because theirs is then the loudest voice.
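A minimal sketch of such an amplitude-based begin/end detector, assuming per-frame energy values (e.g. the RMS of 20 ms frames) have already been computed. The noise floor is tracked with an exponential moving average while no word is active; a word starts when a frame exceeds the floor by `startRatio` and ends after `hangoverFrames` consecutive quiet frames. `WordSegmenter` and all three parameters are hypothetical names whose values would need tuning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical adaptive-threshold word segmenter working on per-frame energies. */
final class WordSegmenter {
    /** Returns [start, end) frame index pairs for each detected word. */
    static List<int[]> segment(double[] energy, double startRatio, int hangoverFrames, double adaptRate) {
        List<int[]> segments = new ArrayList<>();
        if (energy.length == 0) return segments;
        double noiseFloor = Math.max(energy[0], 1e-9); // initial guess from the first frame
        boolean inWord = false;
        int start = 0;
        int quietRun = 0; // consecutive quiet frames while inside a word
        for (int i = 0; i < energy.length; i++) {
            boolean loud = energy[i] > noiseFloor * startRatio;
            if (!inWord) {
                if (loud) {
                    inWord = true;
                    start = i;
                    quietRun = 0;
                } else {
                    // Adapt the noise floor only while no word is active
                    noiseFloor = Math.max(noiseFloor + adaptRate * (energy[i] - noiseFloor), 1e-9);
                }
            } else if (loud) {
                quietRun = 0; // short pause ended, still the same word
            } else if (++quietRun >= hangoverFrames) {
                // The word ended at the last loud frame before the quiet run
                segments.add(new int[] {start, i - quietRun + 1});
                inWord = false;
                quietRun = 0;
            }
        }
        if (inWord) segments.add(new int[] {start, energy.length - quietRun});
        return segments;
    }
}
```

Because the start condition is relative to the noise floor, a distant speaker whose level stays close to the background does not trigger a word, which matches the loudest-is-nearest reasoning above. Pauses shorter than `hangoverFrames` frames do not split a word.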

3) I don't know what you mean by sensor, but there are algorithms that simply detect human voice rather than decode words. These are called Voice Activity Detection (VAD). Some code can be found in the Speex project documentation: http://www.speex.org/
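As a rough illustration of what a VAD computes, here is a classic energy plus zero-crossing-rate heuristic in Java. It is not the algorithm used by Speex, and `SimpleVad` and its thresholds are hypothetical. The assumption is that voiced speech has relatively high energy and a fairly low zero-crossing rate, while hiss-like noise crosses zero much more often.

```java
/** Hypothetical frame-level voice activity check on 16-bit PCM samples. */
final class SimpleVad {
    /** Fraction of adjacent sample pairs whose sign differs (0.0 to 1.0). */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) return 0.0;
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Root-mean-square amplitude of the frame in raw sample units. */
    static double rms(short[] frame) {
        if (frame.length == 0) return 0.0;
        double sum = 0.0;
        for (short s : frame) sum += (double) s * s;
        return Math.sqrt(sum / frame.length);
    }

    /** Treats a frame as voiced speech if it is loud enough and not too "noisy". */
    static boolean isVoiced(short[] frame, double minRms, double maxZcr) {
        return rms(frame) >= minRms && zeroCrossingRate(frame) <= maxZcr;
    }
}
```

This simple rule misses unvoiced sounds such as "s" or "f", which have a high zero-crossing rate; practical VADs combine more features and adapt their thresholds to the background.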

The simplest way to handle voice recognition is to use the Google Speech API, which is pretty good and recognizes many languages, but it needs an Internet connection and takes a while to return a result.
CMU Sphinx is faster, but it has few language models and needs more RAM and processor time, since all decoding is done on the device. In my opinion it works very well when the dictionary (the set of recognized words) is small, such as commands (left, right, backward, stop, start, etc.).
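With a small command dictionary, even a crude post-processing step can make results more robust. The sketch below maps whatever text a recognizer returns (Sphinx or any other engine) onto the nearest entry of a fixed command list by Levenshtein edit distance and rejects anything too far away. `CommandMatcher` and `maxDistance` are hypothetical names; this is not part of the CMU Sphinx API, which can itself restrict the vocabulary at decoding time through its dictionary and grammar.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical matcher from recognized text to a small, fixed set of commands. */
final class CommandMatcher {
    private final List<String> commands;
    private final int maxDistance;

    CommandMatcher(List<String> commands, int maxDistance) {
        this.commands = commands;
        this.maxDistance = maxDistance;
    }

    /** Returns the closest command, or null if none is within maxDistance edits. */
    String match(String heard) {
        String word = heard.trim().toLowerCase(Locale.ROOT);
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String c : commands) {
            int d = levenshtein(word, c);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return bestDist <= maxDistance ? best : null;
    }

    /** Classic two-row dynamic-programming edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse arrays: current row becomes previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `new CommandMatcher(List.of("left", "right", "backward", "stop", "start"), 1)` would accept a slightly misrecognized "stat" as "start" but reject unrelated words.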