I have been using Google Speech Recognition for Python. Here is my code:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source)

print(r.recognize_google(audio))
Although the recognition is very accurate, it takes about 4-5 seconds before it spits out the recognized text. Since I am creating a voice assistant, I want to modify the above code to allow speech recognition to be much faster.
Is there any way to lower this to about 1-2 seconds? Ideally, I would like recognition to be as fast as services such as Siri and Ok Google.
I am very new to Python, so my apologies if there is a simple answer to my question.
You could use another speech recognition service. For example, you could set up an account with IBM to use their Watson Speech To Text. If possible, try to use their websocket interface, because it transcribes what you are saying while you are still speaking.
An example (not using websockets) would be:
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Adjusting for background noise. One second")
    r.adjust_for_ambient_noise(source)
    print("Say something!")
    audio = r.listen(source)

IBM_USERNAME = "INSERT IBM SPEECH TO TEXT USERNAME HERE"  # IBM Speech to Text usernames are strings of the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
IBM_PASSWORD = "INSERT IBM SPEECH TO TEXT PASSWORD HERE"  # IBM Speech to Text passwords are mixed-case alphanumeric strings

try:
    print("IBM Speech to Text thinks you said " + r.recognize_ibm(audio, username=IBM_USERNAME, password=IBM_PASSWORD))
except sr.UnknownValueError:
    print("IBM Speech to Text could not understand audio")
except sr.RequestError as e:
    print("Could not request results from IBM Speech to Text service; {0}".format(e))
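Part of the delay may come from the listening step rather than the service: by default the recognizer waits for a stretch of silence before it decides you have finished speaking. Here is a rough sketch of shortening that wait. `apply_fast_settings` and `demo` are names I made up for illustration, and the values (0.5 seconds, the 5 second limits) are guesses you would need to tune. `pause_threshold`, `non_speaking_duration`, and the `timeout` / `phrase_time_limit` arguments to `listen()` are settings from the speech_recognition library.

```python
def apply_fast_settings(recognizer, pause_threshold=0.5):
    """Shorten the silence the recognizer waits for before ending a phrase.

    Hypothetical helper: it only sets attributes on the object passed in.
    """
    recognizer.pause_threshold = pause_threshold
    # listen() requires pause_threshold >= non_speaking_duration,
    # so lower non_speaking_duration if it would otherwise be larger.
    current = getattr(recognizer, "non_speaking_duration", 0.5)
    recognizer.non_speaking_duration = min(current, pause_threshold)
    return recognizer


def demo():
    """Needs a microphone plus the SpeechRecognition and PyAudio packages."""
    import speech_recognition as sr

    r = apply_fast_settings(sr.Recognizer())
    with sr.Microphone() as source:
        # A shorter calibration than the default one second.
        r.adjust_for_ambient_noise(source, duration=0.5)
        print("Say something!")
        # timeout: give up if no speech starts within 5 seconds
        # (raises sr.WaitTimeoutError); phrase_time_limit: cut off
        # recording after 5 seconds of speech.
        audio = r.listen(source, timeout=5, phrase_time_limit=5)
    print(r.recognize_google(audio))
```

Call `demo()` to try it. A lower `pause_threshold` returns sooner but may cut you off if you pause mid-sentence, so treat it as a trade-off rather than a free speedup.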
You could also try pocketsphinx, but personally, I have not had particularly good experiences with it. It works offline (a plus) but, for me, wasn't particularly accurate. You could probably tweak some detection settings and cancel out some background noise. I believe there is also a training option to adapt it to your voice, but it doesn't look straightforward.
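If you want to see how the engines compare on your own machine, something like the following might help. It is only a sketch: `timed_recognition`, `compare_engines`, and `demo` are hypothetical helpers, not part of any library. `recognize_google` and `recognize_sphinx` are methods of speech_recognition's `Recognizer`, and the sphinx one assumes the pocketsphinx package is installed.

```python
import time


def timed_recognition(recognize, audio, **kwargs):
    """Call recognize(audio, **kwargs) and return (text, seconds_elapsed)."""
    start = time.perf_counter()
    text = recognize(audio, **kwargs)
    return text, time.perf_counter() - start


def compare_engines(engines, audio):
    """Run each engine on the same audio.

    engines maps a label to a callable that takes the audio.
    Returns {label: (text, seconds)}; text is None if the engine failed.
    """
    results = {}
    for label, recognize in engines.items():
        start = time.perf_counter()
        try:
            results[label] = timed_recognition(recognize, audio)
        except Exception:
            # e.g. sr.UnknownValueError, sr.RequestError, or a missing engine
            results[label] = (None, time.perf_counter() - start)
    return results


def demo():
    """Needs a microphone plus the SpeechRecognition and PyAudio packages."""
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)

    engines = {"google": r.recognize_google, "sphinx": r.recognize_sphinx}
    for label, (text, seconds) in compare_engines(engines, audio).items():
        print(f"{label}: {text!r} in {seconds:.2f} s")
```

Call `demo()` to run it. Because both engines receive the same recording, the printed times cover only the recognition step, not the time spent listening.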
Some useful links:
Microphone recognition example
Good luck. Once speech recognition works correctly, it is very useful and rewarding!