Getting pronunciation of a word using Google Translate API

user39664 · Jun 7, 2014 · Viewed 13.8k times

I am trying to save the pronunciation of a French word into a .wav or .mp3 file.

Is there anything in the Google Translate API (which has a pronunciation feature) that would let me do this? Other libraries would work too.

Answer

Jesse Scherer · Mar 11, 2015

Since this question was asked, it has become much harder to "scrape" MP3s from Google Translate, but Google has (finally) set up a TTS API. Interestingly, it is billed by input characters, with the first 1 or 4 million input characters per month being free (depending on whether you use WaveNet or old-school voices).

Nowadays, to do this with gcloud on the command line (rather than building it into an app), you would do roughly the following (I'm paraphrasing the TTS quick start). You need base64, curl, gcloud, and jq for this walkthrough.

  1. Create a project on the GCP console, or run something like gcloud projects create example-throwaway-tts
  2. Enable billing for the project. Do this even if you don't intend to exceed the freebie quota.
  3. Use the GCP console to enable the TTS API for the project you just set up.
  4. Use the console again, this time to make a new service account.
    • Use any old name
    • Don't give it a role. You'll get a warning. This is okay.
    • Select key type JSON if it isn't already selected
    • Click Create
    • Hold onto the JSON file that your browser downloads
  5. Set an environment variable to point at that file, e.g. export GOOGLE_APPLICATION_CREDENTIALS="$HOME/Downloads/service-account-file.json" (use $HOME rather than ~, because the shell does not expand ~ inside double quotes)
  6. Get the appropriate access token:
    1. Tell gcloud to use that new project: gcloud config set project example-throwaway-tts
    2. Set a variable using command substitution: TTS_ACCESS_TOKEN=$(gcloud auth application-default print-access-token)
  7. Put together a JSON request. I'll give an example below. For this example we'll call it request.json
  8. Lastly, run the following:

    curl \
      -H "Authorization: Bearer $TTS_ACCESS_TOKEN" \
      -H "Content-Type: application/json; charset=utf-8" \
      --data @request.json \
      "https://texttospeech.googleapis.com/v1/text:synthesize" \
      | jq -r '.audioContent' \
      | base64 --decode > very_simple_example.mp3

What this does is:

  • authenticate using the access token you stored in TTS_ACCESS_TOKEN
  • declare the request body as JSON via the Content-Type header
  • send the contents of request.json as the request body using curl's --data flag with the @ prefix (--data-raw would send the literal string "@request.json" instead of the file)
  • extract the value of audioContent from the JSON response (jq -r prints it without the surrounding quotes)
  • base64-decode that content
  • save the result as an MP3
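
To try the extract-and-decode steps without calling the API, here is a minimal sketch that feeds a stand-in response through the same jq and base64 stages. The helper name extract_audio, the file name decoded.bin, and the sample payload are all made up for illustration; the payload is just the word "hello" in base64, not real audio.

```shell
# Hypothetical helper: read a JSON response on stdin, write decoded bytes to stdout.
extract_audio() {
  # -r makes jq print the raw string without JSON quotes,
  # so base64 only sees the encoded payload.
  jq -r '.audioContent' | base64 --decode
}

# Stand-in for an API response. "aGVsbG8=" is base64 for "hello".
printf '%s' '{"audioContent":"aGVsbG8="}' | extract_audio > decoded.bin
```

With a real response the same two stages apply unchanged; only the input and the output file name differ.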

Contents of request.json follow. You can see where to insert your desired text, adjust the voice or change output formats via audioConfig:

{
  "input":{
    "text":"very simple example"
  },
  "voice":{
    "languageCode":"en-gb",
    "name":"en-GB-Standard-A",
    "ssmlGender":"FEMALE"
  },
  "audioConfig":{
    "audioEncoding":"MP3"
  }
}
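
Since the question was about a French word, here is a rough sketch of generating request.json for one instead of writing it by hand. The function name make_request is made up, and the fr-FR language code with no explicit voice name is my assumption about what the service accepts (check the API's voice list for the names available to you). Letting jq build the JSON means accents and embedded quotes in the text get escaped properly.

```shell
# Hypothetical helper: print a TTS request body for the given text.
make_request() {
  # --arg passes the text in as a jq variable, so jq does the JSON escaping.
  # Assumption: languageCode "fr-FR" with no "name" lets the service pick a voice.
  jq -n --arg text "$1" '{
    input: { text: $text },
    voice: { languageCode: "fr-FR", ssmlGender: "FEMALE" },
    audioConfig: { audioEncoding: "MP3" }
  }'
}

make_request "Bonjour" > request.json
```

If you want a .wav rather than an .mp3, audioEncoding is the field to change; as far as I know LINEAR16 is the uncompressed option, but verify that against the AudioEncoding reference before relying on it.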

Original Answer

As Hugolpz alludes, if you know the word or phrase you want (via a previous Translate API call), you can get MP3s from a URL like http://translate.google.com/translate_tts?ie=UTF-8&q=Bonjour&tl=fr

Note that &tl=fr ensures that you get French instead of the default English.

You will need to rate-limit yourself, but if you're looking for a small number of words or phrases you should be fine.
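
For what it's worth, here is a rough sketch of the URL approach with percent-encoding and a pause between requests. The helper names tts_url and fetch_all and the 2-second delay are my own choices, not anything documented. This endpoint is unofficial, and as noted at the top it has become harder to scrape, so requests may simply be refused.

```shell
# Hypothetical helper: build the translate_tts URL for a text and a language code.
tts_url() {
  local q
  # jq's @uri filter percent-encodes spaces, accents and other reserved characters.
  q=$(jq -rn --arg q "$1" '$q | @uri')
  printf 'http://translate.google.com/translate_tts?ie=UTF-8&q=%s&tl=%s\n' "$q" "$2"
}

# Hypothetical helper: fetch one MP3 per word, pausing between requests.
# The 2-second pause is an arbitrary guess, not a documented limit.
fetch_all() {
  local lang="$1" word
  shift
  for word in "$@"; do
    curl -sS -o "$word.mp3" "$(tts_url "$word" "$lang")"
    sleep 2
  done
}
```

Something like fetch_all fr Bonjour Merci would then try to save Bonjour.mp3 and Merci.mp3 in the current directory. Nothing is downloaded until you call fetch_all yourself.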