Saving audio input of Android Stock speech recognition engine

mmmx · Dec 21, 2011 · Viewed 7.8k times

I am trying to save to a file the audio data captured by Android's speech recognition service.

Currently I implement RecognitionListener as explained here: Speech to Text on Android

I save the data into a buffer as illustrated here: Capturing audio sent to Google's speech recognition server

and write the buffer to a WAV file, as described here: Android Record raw bytes into WAVE file for Http Streaming

My problem is how to get the appropriate audio settings to write into the WAV file's header. When I play the WAV file I hear only strange noise with these parameters:

    short nChannels = 2;  // audio channels
    int sRate = 44100;    // sample rate in Hz
    short bSamples = 16;  // bits per sample

or nothing at all with these:

    short nChannels = 1;  // audio channels
    int sRate = 8000;     // sample rate in Hz
    short bSamples = 16;  // bits per sample

What is confusing is that, looking at the parameters of the speech recognition task in logcat, I first find Set PLAYBACK sample rate to 44100 HZ:

    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK PCM format to S16_LE (Signed 16 bit Little Endian)
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Using 2 channels for PLAYBACK.
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK sample rate to 44100 HZ
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Buffer size: 2048
    12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Latency: 46439

and then aInfo.SampleRate = 8000 when it plays the file to send to Google's server:

    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::InitWavParser
    12-20 14:41:36.152: DEBUG/(2364): File open Succes
    12-20 14:41:36.152: DEBUG/(2364): File SEEK End Succes
    ...
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): Data Read = RIFF?
    12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData
    12-20 14:41:36.152: DEBUG/(2364): Data Read buff = fmt 
    ...
    12-20 14:41:36.152: DEBUG/(2364): PVWAVPARSER_OK
    12-20 14:41:36.156: DEBUG/(2364): aInfo.AudioFormat = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumChannels = 1
    12-20 14:41:36.156: DEBUG/(2364): aInfo.SampleRate = 8000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.ByteRate = 16000
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BlockAlign = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BitsPerSample = 16
    12-20 14:41:36.156: DEBUG/(2364): aInfo.BytesPerSample = 2
    12-20 14:41:36.156: DEBUG/(2364): aInfo.NumSamples = 2258

So, how can I find out the right parameters to save the audio buffer as a valid WAV file?

Answer

Malcolm Smith · May 29, 2012

You haven't included your code for actually writing out the PCM data, so it's hard to diagnose, but if you are hearing strange noises then you most likely have the wrong endianness when writing the data, or the wrong number of channels. Getting the sample rate wrong will only make the audio sound slower or faster; if it sounds completely garbled, the mistake is probably in the number of channels or the endianness of your byte stream.
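To see why a byte-order mistake produces noise rather than merely slow or fast audio, here is a minimal sketch that decodes the same 16-bit PCM bytes both ways. The class and method names are hypothetical and only serve the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of any Android API.
final class PcmByteOrder {
    /** Decodes raw bytes as 16-bit PCM samples using the given byte order. */
    static short[] decode(byte[] pcm, ByteOrder order) {
        short[] samples = new short[pcm.length / 2];
        // View the byte array as shorts in the requested order and copy them out.
        ByteBuffer.wrap(pcm).order(order).asShortBuffer().get(samples);
        return samples;
    }
}
```

For example, the two bytes 0x01 0x00 decode to the sample value 1 as little-endian but to 256 as big-endian, so every sample in a mis-ordered stream takes on an unrelated amplitude.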

To know for sure, just stream your bytes directly to a file without any header (raw PCM data). This way you can rule out any errors in writing your file header. Then use Audacity to import the raw data via File->Import->Raw Data..., experimenting with the different options (bit depth, endianness, channels) until you get an audio file that sounds correct (only one combination will be right).
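Dumping the raw bytes could look roughly like the sketch below. It assumes the audio arrives as `byte[]` chunks (for instance from `RecognitionListener.onBufferReceived`); the class name and the file path are hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: writes captured audio exactly as received, with no header.
final class RawPcmDump {
    /** Appends one captured buffer to the raw file at the given path. */
    static void append(String path, byte[] buffer) throws IOException {
        // The second constructor argument (true) opens the file in append mode.
        try (OutputStream out = new FileOutputStream(path, true)) {
            out.write(buffer);
        }
    }
}
```

Calling `RawPcmDump.append(path, buffer)` for each chunk yields a headerless file that Audacity can import as raw data.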

Once you have identified your byte format this way, you only have to worry about whether you are setting the headers correctly. You might want to refer to this reference for the file format: http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html. Or see the following links to existing Java solutions for writing audio files: Java - reading, manipulating and writing WAV files, or FMJ, although I guess these might not be usable on Android.
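Two header fields are derived from the others, which gives a quick consistency check. The sketch below uses hypothetical names and applies the standard PCM relationships from the WAVE format.

```java
// Hypothetical helper for the derived fields of a PCM "fmt " chunk.
final class WavFields {
    /** Bytes per sample frame, i.e. one sample for every channel. */
    static int blockAlign(int channels, int bitsPerSample) {
        return channels * bitsPerSample / 8;
    }

    /** Bytes of audio data per second. */
    static int byteRate(int sampleRate, int channels, int bitsPerSample) {
        return sampleRate * blockAlign(channels, bitsPerSample);
    }
}
```

With the values from the question's log (8000 Hz, 1 channel, 16 bits per sample) this gives a block align of 2 and a byte rate of 16000, matching the aInfo.BlockAlign and aInfo.ByteRate lines.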

If you have to roll your own WAV/RIFF writer, remember that Java's standard writers (DataOutputStream, and ByteBuffer by default) emit multi-byte values in big-endian order, so any multi-byte primitive you write to your file must be written in reverse byte order to match RIFF's little-endianness.
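One way to avoid reversing bytes by hand is to build the header in a `ByteBuffer` switched to little-endian order. This is a minimal sketch of the canonical 44-byte PCM header, not a complete writer; the class name and parameters are hypothetical, and `dataLength` is assumed to be the number of PCM bytes that follow the header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a canonical 44-byte PCM WAV header.
final class WavHeader {
    static byte[] build(int sampleRate, int channels, int bitsPerSample, int dataLength) {
        int blockAlign = channels * bitsPerSample / 8;
        // RIFF stores numbers little-endian; ByteBuffer defaults to big-endian.
        ByteBuffer h = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        h.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        h.putInt(36 + dataLength);           // bytes remaining after this field
        h.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        h.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        h.putInt(16);                        // size of the PCM fmt chunk
        h.putShort((short) 1);               // audio format 1 = PCM
        h.putShort((short) channels);
        h.putInt(sampleRate);
        h.putInt(sampleRate * blockAlign);   // byte rate
        h.putShort((short) blockAlign);
        h.putShort((short) bitsPerSample);
        h.put("data".getBytes(StandardCharsets.US_ASCII));
        h.putInt(dataLength);                // number of PCM bytes that follow
        return h.array();
    }
}
```

Write the returned array to the file first, then the raw PCM bytes. The four chunk tags are written byte by byte, so they are unaffected by the buffer's byte order.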