Chrome Speech Synthesis with longer texts

Andrey Shchekin picture Andrey Shchekin · Feb 22, 2014 · Viewed 26.1k times · Source

I am getting a problem when trying to use Speech Synthesis API in Chrome 33. It works perfectly with a shorter text, but if I try longer text, it just stops in the middle. After it has stopped once like that, the Speech Synthesis does not work anywhere within Chrome until the browser is restarted.

Example code (http://jsfiddle.net/Mdm47/1/):

function speak(text) {
    var msg = new SpeechSynthesisUtterance();
    var voices = speechSynthesis.getVoices();
    msg.voice = voices[10];
    msg.voiceURI = 'native';
    msg.volume = 1;
    msg.rate = 1;
    msg.pitch = 2;
    msg.text = text;
    msg.lang = 'en-US';

    speechSynthesis.speak(msg);
}

speak('Short text');
speak('Collaboratively administrate empowered markets via plug-and-play networks. Dynamically procrastinate B2C users after installed base benefits. Dramatically visualize customer directed convergence without revolutionary ROI. Efficiently unleash cross-media information without cross-media value. Quickly maximize timely deliverables for real-time schemas. Dramatically maintain clicks-and-mortar solutions without functional solutions.');
speak('Another short text');

It stops speaking in the middle of the second text, and I can't get any other page to speak after that.

Is it a browser bug or some kind of security limitation?

Answer

Peter Woolley picture Peter Woolley · May 22, 2014

I've had this issue for a while now with Google Chrome Speech Synthesis. After some investigation, I discovered the following:

  • The breaking of the utterances only happens when the voice is not a native voice,
  • The cutting out usually occurs between 200-300 characters,
  • When it does break you can un-freeze it by doing speechSynthesis.cancel();
  • The 'onend' event sometimes decides not to fire. A quirky work-around to this is to console.log() out the utterance object before speaking it. Also I found wrapping the speak invocation in a setTimeout callback helps smooth these issues out.

In response to these problems, I have written a function that overcomes the character limit, by chunking the text up into smaller utterances, and playing them one after another. Obviously you'll get some odd sounds sometimes as sentences might be chunked into two separate utterances with a small time delay in between each, however the code will try and split these points at punctuation marks as to make the breaks in sound less obvious.

Update

I've made this work-around publicly available at https://gist.github.com/woollsta/2d146f13878a301b36d7#file-chunkify-js. Many thanks to Brett Zamir for his contributions.

The function:

var speechUtteranceChunker = function (utt, settings, callback) {
    settings = settings || {};
    var newUtt;
    var txt = (settings && settings.offset !== undefined ? utt.text.substring(settings.offset) : utt.text);
    if (utt.voice && utt.voice.voiceURI === 'native') { // Not part of the spec
        newUtt = utt;
        newUtt.text = txt;
        newUtt.addEventListener('end', function () {
            if (speechUtteranceChunker.cancel) {
                speechUtteranceChunker.cancel = false;
            }
            if (callback !== undefined) {
                callback();
            }
        });
    }
    else {
        var chunkLength = (settings && settings.chunkLength) || 160;
        var pattRegex = new RegExp('^[\\s\\S]{' + Math.floor(chunkLength / 2) + ',' + chunkLength + '}[.!?,]{1}|^[\\s\\S]{1,' + chunkLength + '}$|^[\\s\\S]{1,' + chunkLength + '} ');
        var chunkArr = txt.match(pattRegex);

        if (chunkArr[0] === undefined || chunkArr[0].length <= 2) {
            //call once all text has been spoken...
            if (callback !== undefined) {
                callback();
            }
            return;
        }
        var chunk = chunkArr[0];
        newUtt = new SpeechSynthesisUtterance(chunk);
        var x;
        for (x in utt) {
            if (utt.hasOwnProperty(x) && x !== 'text') {
                newUtt[x] = utt[x];
            }
        }
        newUtt.addEventListener('end', function () {
            if (speechUtteranceChunker.cancel) {
                speechUtteranceChunker.cancel = false;
                return;
            }
            settings.offset = settings.offset || 0;
            settings.offset += chunk.length - 1;
            speechUtteranceChunker(utt, settings, callback);
        });
    }

    if (settings.modifier) {
        settings.modifier(newUtt);
    }
    console.log(newUtt); //IMPORTANT!! Do not remove: Logging the object out fixes some onend firing issues.
    //placing the speak invocation inside a callback fixes ordering and onend issues.
    setTimeout(function () {
        speechSynthesis.speak(newUtt);
    }, 0);
};

How to use it...

//create an utterance as you normally would...
var myLongText = "This is some long text, oh my goodness look how long I'm getting, wooooohooo!";

var utterance = new SpeechSynthesisUtterance(myLongText);

//modify it as you normally would
var voiceArr = speechSynthesis.getVoices();
utterance.voice = voiceArr[2];

//pass it into the chunking function to have it played out.
//you can set the max number of characters by changing the chunkLength property below.
//a callback function can also be added that will fire once the entire text has been spoken.
speechUtteranceChunker(utterance, {
    chunkLength: 120
}, function () {
    //some code to execute when done
    console.log('done');
});

Hope people find this as useful.