Speech to Text (Voice Recognition) Directly from Audio / Transcription

user2330237 · May 25, 2014 · Viewed 19.2k times

I need to convert (transcribe) audio containing speech, e.g. from MP3 or other audio formats, into text transcripts using a highly accurate speech-to-text (voice recognition) algorithm. There are many increasingly accurate ways of doing this, but they are designed for speech spoken into the device microphone (e.g. Google Translate and its corresponding web API, or the Dragon app for iOS). I need a way to feed an audio file directly into the speech recognition engine or API. I don't want to play the audio through a speaker and capture it with a microphone: that takes considerable time for long audio files, and it degrades the audio quality and therefore the transcription quality. Does a web service, API, or code for this exist? Or is there some kind of wrapper around one of the existing services that assume the microphone will be the source?

Thanks

Answer

user2330237 · Feb 10, 2017

There is now a relatively new service that provides automatic speech-to-text transcription, along with a great web interface for editing the results by hand:

https://trint.com/

We've used it and have been pleased with the results. The transcription is certainly not perfect, but it's a great start, and the interface makes human editing easy.

There is also now a new API and service available from IBM Bluemix/Watson. You can try the free demo here:

https://speech-to-text-demo.mybluemix.net/

This service does a pretty decent job of converting audio (from the mic or from an audio file) into text. Currently, at least in the demo, it appears not to accept MP3, but it does accept WAV and other formats. The service has a full API and is primarily designed to be built into applications.
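To illustrate the "feed the file directly" part of the question, here is a minimal sketch of POSTing the raw bytes of a WAV file to a Watson-style `/v1/recognize` HTTP endpoint, with no microphone involved. Several things here are assumptions rather than confirmed details: `SERVICE_URL` is a hypothetical placeholder (the real URL comes from your own service credentials), the auth scheme assumes an API key sent as basic auth with the literal username `apikey` (older Bluemix credentials used a username/password pair instead), and the response is assumed to have the shape `results[].alternatives[].transcript`. Check the current API reference before relying on any of it.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: the real endpoint depends on your service
# instance and region; copy it from the service credentials.
SERVICE_URL = "https://example.invalid/speech-to-text/api"


def build_recognize_request(audio_bytes, api_key, service_url=SERVICE_URL,
                            content_type="audio/wav"):
    """Build (but do not send) a POST to the /v1/recognize endpoint.

    Assumption: API-key basic auth where the username is the literal
    string "apikey".
    """
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=service_url.rstrip("/") + "/v1/recognize",
        data=audio_bytes,  # the raw file bytes go straight into the body
        method="POST",
        headers={
            "Content-Type": content_type,
            "Authorization": f"Basic {token}",
        },
    )


def extract_transcript(response_json):
    """Join the top alternative of each final result into one transcript.

    Assumes the response shape results[].alternatives[].transcript.
    """
    pieces = []
    for result in response_json.get("results", []):
        alternatives = result.get("alternatives", [])
        if result.get("final", True) and alternatives:
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


def transcribe_file(path, api_key, service_url=SERVICE_URL):
    """Read an audio file from disk and send it directly to the service."""
    with open(path, "rb") as f:
        request = build_recognize_request(f.read(), api_key, service_url)
    with urllib.request.urlopen(request) as response:
        return extract_transcript(json.load(response))
```

Usage would be something like `transcribe_file("interview.wav", "YOUR_API_KEY", "<your service URL>")`. Because the file's bytes are uploaded as-is, a long recording takes only as long as the upload and server-side processing, not its playback time. For other formats you would change `content_type`, but whether a given format (e.g. MP3) is accepted depends on the service, so converting to WAV first may be necessary.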