I'm currently working on a tool that reads out all my notifications by connecting to different APIs.
It's working great, but now I would like to add voice commands to trigger actions.
For example, when the software says "One mail from Bob", I would like to say "Read it" or "Archive it".
My software runs on a Node server. I don't have a browser implementation yet, but I could add one later.
What is the best way to do speech-to-text in Node.js?
I've seen a lot of threads about it, but they mostly rely on the browser, and I would like to avoid that at the beginning if possible. Is it possible?
Another issue is that some software requires a WAV file as input. I don't have any file; I just want my software to listen continuously and react when I say a command.
Do you have any information on how I could do that?
Cheers
To recognize a few commands without streaming audio to a server, you can use the node-pocketsphinx module, which is available on npm.
The code to recognize a few commands in a continuous stream should look like this:
var fs = require('fs');
var ps = require('pocketsphinx').ps;

// Directory holding the en-us acoustic model and dictionary.
var modeldir = "../../pocketsphinx/model/en-us/";

var config = new ps.Decoder.defaultConfig();
config.setString("-hmm", modeldir + "en-us");
config.setString("-dict", modeldir + "cmudict-en-us.dict");
// "-kws" takes the path to the keyword list file ("keyword.list" is a placeholder name).
config.setString("-kws", "keyword.list");
var decoder = new ps.Decoder(config);

// Decode a raw audio file in one pass and print the result.
fs.readFile("../../pocketsphinx/test/data/goforward.raw", function(err, data) {
    if (err) throw err;
    decoder.startUtt();
    decoder.processRaw(data, false, false);
    decoder.endUtt();
    console.log(decoder.hyp());
});
Instead of calling readFile, you read the audio data from the microphone and pass it to the recognizer. The list of keywords to detect should look like this:
read it /1e-20/
archive it /1e-20/
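If the commands also need to trigger actions, one option is to keep phrases, thresholds and handlers in a single table and generate the keyword list from it. This is only a sketch: the `commands` table, `toKeywordList` and `dispatch` are hypothetical names, not part of pocketsphinx, and the actions are stand-ins for real handlers.

```javascript
// Hypothetical command table: phrase -> detection threshold and action.
// The thresholds are the ones from the keyword list above; the actions
// are placeholders for real handlers (read the mail aloud, archive it, ...).
const commands = {
  'read it': { threshold: '1e-20', action: () => 'reading' },
  'archive it': { threshold: '1e-20', action: () => 'archiving' },
};

// Render the table in the keyword list format: "phrase /threshold/" per line.
function toKeywordList(table) {
  return Object.keys(table)
    .map((phrase) => `${phrase} /${table[phrase].threshold}/`)
    .join('\n') + '\n';
}

// Run the action for a recognized phrase; returns undefined for unknown phrases.
function dispatch(table, phrase) {
  const entry = table[phrase.trim().toLowerCase()];
  return entry ? entry.action() : undefined;
}
```

The generated text can be written to disk with `fs.writeFileSync` and its path passed to the `-kws` option, so the phrases the recognizer listens for and the ones the program handles cannot drift apart.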
For more details on keyword spotting with pocketsphinx, see "Keyword Spotting in Speech" and "Recognizing multiple keywords using PocketSphinx".
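The microphone step mentioned above could be sketched roughly as follows. Several things here are assumptions rather than part of the original answer: the audio is captured by spawning `arecord` (Linux/ALSA only; another recorder would be needed elsewhere), the model expects 16 kHz 16-bit mono raw PCM, and `decoder.hyp()` is assumed to return either nothing or an object with a `hypstr` field. `feedChunk` and `listen` are hypothetical helper names.

```javascript
const { spawn } = require('child_process');

// Feed one chunk of raw PCM audio to the decoder. When a keyword is
// spotted, report it and restart the utterance so that the same
// detection is not reported again on the next chunk.
// Assumption: hyp() returns null/undefined or an object with `hypstr`.
function feedChunk(decoder, chunk, onCommand) {
  decoder.processRaw(chunk, false, false);
  const hyp = decoder.hyp();
  if (hyp && hyp.hypstr) {
    onCommand(hyp.hypstr.trim());
    decoder.endUtt();
    decoder.startUtt();
  }
}

// Hypothetical helper: capture the microphone with arecord as
// 16 kHz, 16-bit, mono, headerless PCM and stream it to the decoder.
function listen(decoder, onCommand) {
  const mic = spawn('arecord', ['-q', '-f', 'S16_LE', '-r', '16000', '-c', '1', '-t', 'raw']);
  decoder.startUtt();
  mic.stdout.on('data', (chunk) => feedChunk(decoder, chunk, onCommand));
  return mic; // call mic.kill() to stop listening
}
```

With a decoder configured as in the answer, `listen(decoder, function (cmd) { console.log(cmd); })` would print each spotted phrase. No WAV file is involved at any point, since the recognizer only sees raw chunks as they arrive.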