How do I play audio returned from an XMLHttpRequest using the HTML5 Audio API

exiquio · May 19, 2015

I can't play audio when making an "AJAX" request to my server-side API.

I have backend Node.js code that's using IBM's Watson Text-to-Speech service to serve audio from text:

var render = function(request, response) {
    var options = {
        text: request.params.text,
        voice: 'VoiceEnUsMichael',
        accept: 'audio/ogg; codecs=opus'
    };

    synthesizeAndRender(options, request, response);
};

var synthesizeAndRender = function(options, request, response) {
    var synthesizedSpeech = textToSpeech.synthesize(options);

    synthesizedSpeech.on('response', function(eventResponse) {
        if(request.params.text.download) {
            var contentDisposition = 'attachment; filename=transcript.ogg';

            eventResponse.headers['content-disposition'] = contentDisposition;
        }
    });

    synthesizedSpeech.pipe(response);
};

I have client side code to handle that:

var xhr = new XMLHttpRequest(),
    audioContext = new AudioContext(),
    source = audioContext.createBufferSource();

module.controllers.TextToSpeechController = {
    fetch: function() {
        xhr.onload = function() {
            var playAudio = function(buffer) {
                source.buffer = buffer;
                source.connect(audioContext.destination);

                source.start(0);
            };

            // TODO: Handle properly (exiquio)
            // NOTE: error is being received
            var handleError = function(error) {
                console.log('An audio decoding error occurred');
            };

            audioContext
                .decodeAudioData(xhr.response, playAudio, handleError);
        };
        xhr.onerror = function() { console.log('An error occurred'); };

        var urlBase = 'http://localhost:3001/api/v1/text_to_speech/';
        var url = [
            urlBase,
            'test',
        ].join('');

        xhr.open('GET', encodeURI(url), true);
        xhr.setRequestHeader('x-access-token', Application.token);
        xhr.responseType = 'arraybuffer';
        xhr.send();
    }
}

The backend returns the audio I expect, but my success callback, playAudio, is never called. Instead, handleError is always called, and the error object it receives is always null.

Could anyone explain what I'm doing wrong and how to correct this? It would be greatly appreciated.

Thanks.

NOTE: The string "test" in the URL becomes a text param on the backend and ends up in the options variable in synthesizeAndRender.

Answer

Eric S. Bullington · May 25, 2015

Unfortunately, unlike Chrome's HTML5 Audio implementation, Chrome's Web Audio implementation doesn't support audio/ogg;codecs=opus, which is what your request uses here. You need to set the format to audio/wav for Web Audio to work. To be sure the format is passed through to the server, I suggest putting it in the query string (accept=audio/wav, URL-encoded).
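A sketch of that query-string suggestion (the "accept" parameter name, and the server reading it from request.query, are assumptions, not part of your code):

```javascript
// Sketch: put the desired audio format in the query string so the
// server can pass it through to Watson. The parameter name 'accept'
// is an assumption; the server would then need to read it, e.g.:
//   options.accept = request.query.accept || 'audio/ogg; codecs=opus';
var urlBase = 'http://localhost:3001/api/v1/text_to_speech/';
var url = urlBase + 'test' + '?accept=' + encodeURIComponent('audio/wav');
console.log(url);
// http://localhost:3001/api/v1/text_to_speech/test?accept=audio%2Fwav
```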

Are you just looking to play the audio, or do you need the Web Audio API for audio transformation? If you only need playback, I can show you how to play it easily with the HTML5 Audio API (not the Web Audio one). With HTML5 Audio you can stream the response using the technique below, and you can keep the optimal audio/ogg;codecs=opus format.

It's as simple as dynamically setting the source of your audio element, queried from the DOM via something like this:

(in HTML)

<audio id="myAudioElement"></audio>

(in your JS)

var audio = document.getElementById('myAudioElement') || new Audio();
audio.src = yourUrl;
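One caveat when building that URL for your API: an audio element's src request can't carry the x-access-token header your XHR version sets, so the token would have to be accepted some other way, for example as a query parameter. A sketch, where the "token" parameter name is an assumption about your server:

```javascript
// Sketch: build a streaming URL for an HTML5 audio element. Since an
// <audio> src request cannot send custom headers, this assumes the
// server also accepts the access token as a query parameter
// (the 'token' name is an assumption).
function buildSpeechUrl(text, token) {
  return 'http://localhost:3001/api/v1/text_to_speech/' +
      encodeURIComponent(text) +
      '?token=' + encodeURIComponent(token);
}

// In the browser:
// var audio = document.getElementById('myAudioElement') || new Audio();
// audio.src = buildSpeechUrl('test', Application.token);
// audio.play();
```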

You can also set the audio element's source via an XMLHttpRequest, but you won't get streaming. The upside is that you can use a POST request, so you're not limited by the text length of a GET request (for this API, about 6 KB). To set the source from an XHR, you create an object URL from a blob response:

    xhr.open('POST', encodeURI(url), true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.responseType = 'blob';
    xhr.onload = function(evt) {
      var blob = new Blob([xhr.response], {type: 'audio/ogg'});
      var objectUrl = URL.createObjectURL(blob);
      audio.src = objectUrl;
      // Release the resource when playback finishes (audio elements
      // don't fire a 'load' event, so revoke on 'ended' instead)
      audio.onended = function(evt) {
        URL.revokeObjectURL(objectUrl);
      };
      audio.play();
    };
    var data = JSON.stringify({text: yourTextToSynthesize});
    xhr.send(data);

As you can see, with XMLHttpRequest you have to wait until the data is fully loaded before playing. There may be a way to stream from XMLHttpRequest using the very new Media Source Extensions API, which is currently available only in Chrome and IE (not Firefox or Safari). It's an approach I'm currently experimenting with; I'll update here if I'm successful.
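For what it's worth, here is a rough, untested sketch of what that MSE experiment might look like. Two assumptions to flag: MSE implementations generally expect WebM-contained Opus ('audio/webm; codecs="opus"') rather than the Ogg container Watson returns, and this version still buffers the whole XHR response before appending, so it isn't yet truly progressive:

```javascript
// Rough sketch of the Media Source Extensions approach (untested).
// The mime type is an assumption: MSE typically accepts Opus in a
// WebM container, not 'audio/ogg; codecs=opus'.
var mimeType = 'audio/webm; codecs="opus"';

function playStreaming(url, audio) {
  if (!window.MediaSource || !MediaSource.isTypeSupported(mimeType)) {
    console.log('MSE or codec unsupported; fall back to the blob approach');
    return;
  }
  var mediaSource = new MediaSource();
  audio.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', function() {
    var sourceBuffer = mediaSource.addSourceBuffer(mimeType);
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url, true);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function() {
      // Append the (fully buffered) response, then close the stream
      // once the source buffer has finished updating.
      sourceBuffer.addEventListener('updateend', function() {
        if (mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
      });
      sourceBuffer.appendBuffer(xhr.response);
      audio.play();
    };
    xhr.send();
  });
}
```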