Google speech API

Dheby Chan · Oct 4, 2012 · Viewed 54.8k times

I'm working on a project to build a Siri-like application for desktop computers. Is the Google Speech API reliable and accurate for speech recognition? Can you suggest which speech API is the most accurate, preferably a free one? Thank you.

Answer

Kevin Junghans · Oct 4, 2012

While the Google speech API is free, it is not an official public API. Some people have reverse engineered it, as discussed in this blog. If you are planning to access the API directly for a commercial product, I would not recommend it, because Google can drop or change it without warning, breaking your product. This recently happened to developers who used the Google Weather API. If, on the other hand, you are accessing it through the Chrome browser using x-webkit-speech, you are probably safe, since that is supported by Google.

Google's speech recognition is right up there with many of the more popular commercial solutions. They have a lot of experience with it from other projects such as Google Voice and the now-defunct Google 411, and they have some of the top speech scientists working for them.

The only other free alternative I can think of is Sphinx, an open source project out of Carnegie Mellon University. It has a steep learning curve, and if you want it set up as a service you will have to develop that yourself. Nuance is the other big player in the speech recognition market (I believe that is what Siri uses), and they do have solutions that offer speech recognition as a service, but they are pricey.

Update on Answer From Comments on Language Support

Windows Speech Recognition supports other languages, as do most speech recognition systems. The caveat is that you have to tell the system which language to use, and it has to support that language. Each vendor has a list of languages it supports, and they are specific to a region. For example, a vendor may support Mexican Spanish, American Spanish, and Spain Spanish, which are all slightly different dialects. But the speech recognition engine can only support one language/dialect at a time per user. A user cannot speak multiple languages to a speech recognition system without first requesting that it change to that language.
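To make the one-language-at-a-time rule concrete, here is a minimal sketch in TypeScript. It does not model any real vendor's API: the `RecognizerSession` class, its methods, and the list of language tags are all hypothetical, chosen only to illustrate that the active language must be on the vendor's supported list and must be switched explicitly.

```typescript
// Hypothetical model of a recognizer session (not a real vendor API):
// exactly one language/dialect is active at any time.
class RecognizerSession {
  private readonly supported: readonly string[];
  private active: string;

  constructor(supported: readonly string[], initial: string) {
    this.supported = supported;
    this.active = this.resolve(initial);
  }

  // Map a requested tag onto the vendor's list (case-insensitive),
  // or fail if the vendor does not support it.
  private resolve(requested: string): string {
    const wanted = requested.toLowerCase();
    const match = this.supported.find((tag) => tag.toLowerCase() === wanted);
    if (match === undefined) {
      throw new Error(`Unsupported language: ${requested}`);
    }
    return match;
  }

  // The user has to request a switch before speaking another language.
  setLanguage(requested: string): void {
    this.active = this.resolve(requested);
  }

  get language(): string {
    return this.active;
  }
}

// Assumed vendor list: Mexican, American, and Spain Spanish plus US English.
const vendorLanguages = ["en-US", "es-MX", "es-US", "es-ES"];
const session = new RecognizerSession(vendorLanguages, "en-US");
session.setLanguage("es-mx"); // explicit switch from English to Mexican Spanish
```

Requesting a language that is not on the list throws, which mirrors the caveat that the system has to support the language in question.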

Updated 3/17/2014

The x-webkit-speech input field is being deprecated due to lack of support in other browsers. It will be replaced by the Web Speech API, which is a JavaScript API. You can find an example of how to use it here.
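For orientation, a minimal sketch of using the Web Speech API from TypeScript might look like the following. It assumes a browser that exposes `SpeechRecognition` or the prefixed `webkitSpeechRecognition` constructor. The `*Like` interfaces and the helpers `findRecognitionCtor`, `finalTranscript`, and `listen` are names invented for this sketch, declared locally because not every TypeScript setup ships typings for this API. It is not the example linked above.

```typescript
// Minimal local shapes for the parts of the Web Speech API used below.
interface AlternativeLike { transcript: string; confidence: number; }
interface ResultLike { isFinal: boolean; length: number; 0: AlternativeLike; }
interface RecognitionEventLike { resultIndex: number; results: ArrayLike<ResultLike>; }
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEventLike) => void) | null;
  onerror: ((event: { error: string }) => void) | null;
  start(): void;
}
type RecognitionCtor = new () => RecognitionLike;

// Look up the constructor, falling back to the webkit-prefixed name.
// Returns undefined where the API is unavailable.
function findRecognitionCtor(scope: any = globalThis): RecognitionCtor | undefined {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition;
}

// Concatenate the transcripts of the final results carried by one event.
function finalTranscript(event: RecognitionEventLike): string {
  let text = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result && result.isFinal) {
      text += result[0].transcript;
    }
  }
  return text;
}

// Start a single-utterance recognition session in one language.
function listen(
  Ctor: RecognitionCtor,
  lang: string,
  onText: (text: string) => void,
): RecognitionLike {
  const recognition = new Ctor();
  recognition.lang = lang; // one language/dialect per session
  recognition.continuous = false; // stop after the first utterance
  recognition.interimResults = false; // report final results only
  recognition.onresult = (event) => {
    const text = finalTranscript(event);
    if (text) onText(text);
  };
  recognition.onerror = (event) => {
    console.error("Speech recognition error:", event.error);
  };
  recognition.start();
  return recognition;
}

// Only starts listening where the API actually exists.
const DetectedCtor = findRecognitionCtor();
if (DetectedCtor) {
  listen(DetectedCtor, "en-US", (text) => console.log("Heard:", text));
}
```

Passing the constructor into `listen` rather than reading it from the global scope is a design choice for this sketch: it keeps the feature detection in one place and lets the function be exercised with a stand-in class outside a browser.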