Google Speech Recognition API Result is Empty

Bruno · Aug 11, 2016

I'm making an asynchronous request to the Google Cloud Speech API, and I don't know how to get the result of the operation:

Request POST: https://speech.googleapis.com/v1beta1/speech:asyncrecognize

Body:

{
    "config": {
        "languageCode": "pt-BR",
        "encoding": "LINEAR16",
        "sampleRate": 16000
    },
    "audio": {
        "uri": "gs://bucket/audio.flac"
    }
}

Which returns:

{ "name": "469432517" }

So, I do a POST: https://speech.googleapis.com/v1beta1/operations/469432517

Which returns:

{
    "name": "469432517",
    "metadata": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata",
        "progressPercent": 100,
        "startTime": "2016-08-11T21:18:29.985053Z",
        "lastUpdateTime": "2016-08-11T21:18:31.888412Z"
    },
    "done": true,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    }
}

I need to get the result of the operation: the transcribed text.

How can I do that?

Answer

Nikolay Shmyrev · Aug 12, 2016

You already have the result of the operation, and it is empty. The cause is a format mismatch: the request declares "LINEAR16" encoding (uncompressed PCM data, basically a WAV file), but the file you submitted is FLAC (a compressed format).
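One way to catch this kind of mismatch before sending the request is to look at the first bytes of the local file. The sketch below is only an illustration: the helper names `detect_container` and `encoding_matches` are hypothetical, and it assumes that a RIFF/WAVE file holds 16-bit PCM, which is common but not guaranteed. A FLAC file that starts with an ID3 tag would be reported as UNKNOWN.

```python
# Leading bytes that identify the two container formats discussed above.
FLAC_MAGIC = b"fLaC"
RIFF_MAGIC = b"RIFF"


def detect_container(path):
    """Return 'FLAC', 'WAV', or 'UNKNOWN' based on the file's first bytes."""
    with open(path, "rb") as f:
        header = f.read(12)
    if header[:4] == FLAC_MAGIC:
        return "FLAC"
    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if header[:4] == RIFF_MAGIC and header[8:12] == b"WAVE":
        return "WAV"
    return "UNKNOWN"


def encoding_matches(path, declared_encoding):
    """Compare the "encoding" value of the request config with the file.

    Assumption: a WAV container is treated as compatible with "LINEAR16".
    A WAV file can carry other codecs, which this sketch does not inspect.
    """
    expected = {"FLAC": "FLAC", "WAV": "LINEAR16"}
    return expected.get(detect_container(path)) == declared_encoding
```

For the request in the question, `encoding_matches("audio.flac", "LINEAR16")` would return False for a real FLAC file, which suggests either changing the declared encoding to "FLAC" or converting the audio to uncompressed 16-bit PCM first.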

Other possible causes of an empty result include an incorrect sample rate, an incorrect number of channels, and so on.
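If the audio is available locally as a WAV file, its header can be compared with the request config using Python's standard `wave` module. This is a rough sketch, not an official check: the function name `check_wav` is hypothetical, the default rate of 16000 is taken from the question's config, and the default of one channel is an assumption about what the API expects.

```python
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of mismatches between a WAV header and the request config."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    if rate != expected_rate:
        problems.append("sample rate is %d, config says %d" % (rate, expected_rate))
    if channels != expected_channels:
        problems.append("file has %d channel(s), expected %d" % (channels, expected_channels))
    # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
    if width != 2:
        problems.append("sample width is %d byte(s), LINEAR16 needs 2" % width)
    return problems
```

An empty list means the header agrees with the config. Any entries in the list point to a value that should be fixed, either in the file or in the "config" object of the request.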

Finally, a file containing pure silence will also produce an empty response.
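A quick way to rule out the silence case is to measure the peak amplitude of the audio locally. The following is a minimal sketch that assumes a 16-bit PCM WAV file. The names `peak_amplitude` and `is_silent` are hypothetical, and the threshold of 100 is an arbitrary illustrative value, not something defined by the API.

```python
import struct
import wave


def peak_amplitude(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    # WAV stores samples as little-endian signed 16-bit integers.
    samples = struct.unpack("<%dh" % count, frames[: count * 2])
    return max((abs(s) for s in samples), default=0)


def is_silent(path, threshold=100):
    """Treat the file as silent if no sample reaches the threshold (assumed value)."""
    return peak_amplitude(path) < threshold
```

If `is_silent` returns True for the uploaded audio, an empty response is expected regardless of how the request is configured.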