Speech recognition in Node.js

Vico · Feb 26, 2016

I'm currently working on a tool that reads out all my notifications by connecting to different APIs.

It works well, but now I would like to add voice commands to trigger some actions.

For example, when the software says "One mail from Bob", I would like to say "Read it" or "Archive it".

My software runs on a Node server. I currently don't have any browser implementation, though that could be an option later.

What is the best way to do speech-to-text in Node.js?

I've seen a lot of threads on this, but most of them rely on the browser, and I would like to avoid that at first if possible. Is it possible?

Another issue is that some tools require a WAV file as input. I don't have any file; I just want my software to listen continuously and react whenever I say a command.

Do you have any information on how I could do that?

Cheers

Answer

Nikolay Shmyrev · Feb 26, 2016

To recognize a few commands without streaming audio to a server, you can use the node-pocketsphinx module, which is available on npm.

The code to recognize a few commands in a continuous stream should look like this:

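```javascript
var fs = require('fs');
var ps = require('pocketsphinx').ps;

// Path to the PocketSphinx en-us model directory
var modeldir = "../../pocketsphinx/model/en-us/";

var config = ps.Decoder.defaultConfig();
config.setString("-hmm", modeldir + "en-us");               // acoustic model
config.setString("-dict", modeldir + "cmudict-en-us.dict"); // pronunciation dictionary
config.setString("-kws", "keywords.list");                  // path to the keyword-list file (example name)
var decoder = new ps.Decoder(config);

// The test file is raw 16 kHz, 16-bit mono PCM (no WAV header)
fs.readFile("../../pocketsphinx/test/data/goforward.raw", function(err, data) {
    if (err) throw err;
    decoder.startUtt();
    decoder.processRaw(data, false, false);
    decoder.endUtt();
    var hyp = decoder.hyp(); // null when no keyword was spotted
    console.log(hyp ? hyp.hypstr : "no keyword detected");
});
```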
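The example above decodes a single pre-recorded file in one call. For always-on listening, a common pattern is to keep one utterance open, feed audio chunks as they arrive, and restart the utterance whenever a keyword is spotted so the search starts fresh. This is a minimal sketch, not part of node-pocketsphinx: `createKeywordListener` and `onKeyword` are hypothetical names, the decoder is passed in so the logic can run without a microphone, and it assumes `hyp()` returns `null` or an object with a `hypstr` string, as in the SWIG-generated bindings.

```javascript
'use strict';

// Sketch of continuous keyword spotting over a stream of raw audio chunks.
// Assumption: `decoder` is a pocketsphinx Decoder configured with "-kws";
// `onKeyword` is a hypothetical callback receiving the spotted phrase.
function createKeywordListener(decoder, onKeyword) {
  decoder.startUtt(); // keep one utterance open while audio keeps arriving

  // Feed one chunk of 16-bit mono PCM (a Buffer) to the decoder.
  return function feed(chunk) {
    decoder.processRaw(chunk, false, false);
    const hyp = decoder.hyp();
    if (hyp && hyp.hypstr) {
      // A keyword was spotted: close the utterance to reset the search,
      // report the phrase, then immediately start listening again.
      decoder.endUtt();
      onKeyword(hyp.hypstr.trim());
      decoder.startUtt();
    }
  };
}
```

To connect it to a real microphone on Linux, one option (an assumption, not from the answer) is to spawn ALSA's `arecord -q -r 16000 -f S16_LE -c 1 -t raw` with `child_process.spawn` and pass each Buffer from its `stdout` `'data'` event to `feed`; the default en-us model expects 16 kHz, 16-bit mono audio. The callback can then map `"read it"` and `"archive it"` to the corresponding actions.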

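Instead of using readFile, you read the data from the microphone and pass it to the recognizer chunk by chunk. The keyword-list file passed to `-kws` should contain one phrase per line, each followed by its detection threshold between slashes, like this: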

```
read it /1e-20/
archive it /1e-20/
```
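If the command list changes at runtime, the keyword-list file can be generated rather than written by hand. A minimal sketch: `formatKeywordList` and `writeKeywordList` are hypothetical helpers, the lowercasing assumes the lowercase entries of `cmudict-en-us.dict`, and each threshold typically needs tuning per phrase.

```javascript
'use strict';
const fs = require('fs');

// Build keyword-list contents: one phrase per line, threshold between slashes.
// Assumption: thresholds are positive numbers such as 1e-20.
function formatKeywordList(entries) {
  return entries
    .map(({ phrase, threshold }) => {
      if (!(threshold > 0)) {
        throw new RangeError(`invalid threshold for "${phrase}"`);
      }
      // cmudict-en-us.dict uses lowercase words
      return `${phrase.trim().toLowerCase()} /${threshold.toExponential()}/`;
    })
    .join('\n') + '\n';
}

// Write the file that "-kws" points to (the path is chosen by the caller).
function writeKeywordList(path, entries) {
  fs.writeFileSync(path, formatKeywordList(entries));
}
```

For example, `formatKeywordList([{ phrase: 'Read it', threshold: 1e-20 }, { phrase: 'archive it', threshold: 1e-20 }])` produces exactly the two lines shown above; the decoder must be created after the file is written, since `-kws` is read when the decoder is configured.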

For more details on keyword spotting with PocketSphinx, see "Keyword Spotting in Speech" and "Recognizing multiple keywords using PocketSphinx".