Stack Overflow might not be the best place to ask this question, but I need help. I have an mp3 file and I want to use Google's speech recognition to get the text out of that file. Any pointers to documentation or examples would be appreciated.
Take a look at the Google Cloud Speech API, which enables developers to convert audio to text [...] The API recognizes over 80 languages and variants [...] You can create a free account to get a limited number of API requests.
HOW TO:
First, install the gcloud and google-api-python-client Python modules with:
pip install --upgrade gcloud
pip install --upgrade google-api-python-client
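To confirm the install worked before going further, something like the following can help. It is only a sketch: `missing_modules` is a hypothetical helper name, and the list of module names is taken from the imports of the script later in this answer, assuming `httplib2` and `oauth2client` were pulled in as dependencies.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level modules imported by the script later in this answer.
    required = ["googleapiclient", "httplib2", "oauth2client"]
    print(missing_modules(required) or "all modules found")
```

If any name is reported as missing, install it with pip before continuing.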
Then, in the Cloud Platform Console, go to the Projects page and select an existing project or create a new one. Next, enable billing for your project, then enable the Cloud Speech API.
After enabling the Google Cloud Speech API, click the Go to Credentials button to set up your Cloud Speech API credentials.
See Set Up a Service Account for information on how to authorize to the Cloud Speech API service from your code.
You should end up with a service account key file (in JSON) and a GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to it, which together allow you to authenticate to the Speech API.
Once that is done, download the raw audio file from Google, along with speech-discovery_google_rest_v1.json.
Edit the downloaded JSON file to set your credentials key, then make sure the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the .json file with:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json
Also make sure the GCLOUD_PROJECT environment variable is set to the ID of your Google Cloud project with:
export GCLOUD_PROJECT=your-project-id
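A forgotten export is an easy mistake to make here, so a quick check of both variables can save some debugging. The sketch below is not part of Google's tooling; `check_speech_env` is a hypothetical helper that only looks at the two variables named above and at whether the key file exists.

```python
import os


def check_speech_env(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    problems = []

    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(key_path):
        # The variable must hold the full path to the JSON key file.
        problems.append("key file not found: " + key_path)

    if not env.get("GCLOUD_PROJECT"):
        problems.append("GCLOUD_PROJECT is not set")

    return problems


if __name__ == "__main__":
    for problem in check_speech_env():
        print(problem)
```

Run it in the same shell where you did the exports. It prints nothing when both variables are set and the key file is found.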
Assuming all of that is done, create a tutorial.py file containing:
import argparse
import base64
import json

from googleapiclient import discovery
import httplib2
from oauth2client.client import GoogleCredentials

DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?'
                 'version={apiVersion}')


def get_speech_service():
    # Build an authorized client from the application default credentials.
    credentials = GoogleCredentials.get_application_default().create_scoped(
        ['https://www.googleapis.com/auth/cloud-platform'])
    http = httplib2.Http()
    credentials.authorize(http)

    return discovery.build(
        'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL)


def main(speech_file):
    """Transcribe the given audio file.

    Args:
        speech_file: the name of the audio file.
    """
    with open(speech_file, 'rb') as speech:
        # The request body carries the audio bytes base64-encoded.
        speech_content = base64.b64encode(speech.read())

    service = get_speech_service()
    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 kHz
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
            }
        })
    response = service_request.execute()
    print(json.dumps(response))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'speech_file', help='Full path of audio file to be recognized')
    args = parser.parse_args()
    main(args.speech_file)
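Note that the request sets the encoding to LINEAR16 at 16000 Hz, so an mp3 cannot be sent as-is and has to be converted to raw PCM first. One way to do that, assuming ffmpeg is installed and on your PATH, is sketched below. The function names are made up for illustration, and mono output is my assumption rather than something stated above.

```python
import subprocess


def ffmpeg_to_raw_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that converts `src` to headerless 16-bit PCM."""
    return [
        "ffmpeg",
        "-i", src,                # input file, e.g. an mp3
        "-f", "s16le",            # raw output: signed 16-bit little-endian
        "-acodec", "pcm_s16le",   # matching PCM codec (LINEAR16)
        "-ac", "1",               # mono (assumption)
        "-ar", str(sample_rate),  # must match 'sampleRate' in the request
        dst,
    ]


def convert_to_raw(src, dst, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_raw_command(src, dst, sample_rate), check=True)
```

For example, `convert_to_raw("speech.mp3", "audio.raw")` would write a file that can be passed to the script. The file names here are placeholders.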
Then run:
python tutorial.py audio.raw
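The script prints the whole JSON response. If you only want the text, a small helper can pull it out. This sketch assumes the response has a top-level "results" list whose items each hold an "alternatives" list with a "transcript" field. Check that against the JSON you actually get back, since the layout is not shown above. `extract_transcripts` is a hypothetical name.

```python
import json


def extract_transcripts(response):
    """Return the first alternative's transcript for each result, in order."""
    transcripts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            # Assumption: the first alternative is the most likely one.
            transcripts.append(alternatives[0].get("transcript", ""))
    return transcripts


if __name__ == "__main__":
    # Made-up sample shaped like the assumed response layout.
    sample = json.loads(
        '{"results": [{"alternatives": '
        '[{"transcript": "hello world", "confidence": 0.9}]}]}'
    )
    print(" ".join(extract_transcripts(sample)))
```

In tutorial.py you could call `extract_transcripts(response)` in place of, or next to, the `json.dumps` print.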