Asynchronous Request Within a ForEach in node.js

Bannerman picture Bannerman · Nov 25, 2015 · Viewed 7.2k times · Source

I am new to node.js (and to request.js). I'd like to get the body of a website back from a specific url with different paths (in the example below http://www.example.com/path1, http://www.example.com/path2, etc.) and log this data in an object with a key/value mapping (siteData[path] below).

var request = require('request'),
    paths = ['path1','path2','path3'],
    siteData = {},
    pathLength = paths.length,
    pathIndex = 0;

paths.forEach((path) => {
    var url="http://www.example.com/"+path;
    request(url, function(error, response, html){
        if(!error){
            siteData[path] = response.body;
            pathIndex++;
            if(pathIndex===pathLength){
                someFunction(siteData);
            }
        }
});

function someFunction(data){
    //manipulate data
}

My questions are:

  • The if statement (index === length) doesn't look like the right way to determine if the asynchronous requests are finished. How should I properly check if the requests are finished?
  • When I execute the code above I get an error (node) warning: possible EventEmitter memory leak detected. 11 unpipe listeners added. Use emitter.setMaxListeners() to increase limit. I tried chaining request(url, function(...){}).setMaxListeners(100); but that didn't work.

Thanks for your help!

Answer

SomeKittens picture SomeKittens · Nov 25, 2015

Looks like Promises are the right tool to get the job done here. Instead of a callback, we'll create a new Promise object that will resolve when the job is done. We can say "once you're done, do some more stuff" with the .then operator:

var rp = require('request-promise');

rp('http://www.google.com')
  .then((htmlString) => {
    // Process html... 
  });

(if anything goes wrong, the promise rejects and goes straight to .catch)

someFunctionThatErrors('Yikes!')
  .then((data) => {
    // won't be called
  })
.catch((err) => {
  // Will be called, we handle the error here
});

We've got lots of async tasks to do, so just one promise won't work. One option is to string them all together in series, like so:

rp('http://www.google.com')
  .then((htmlString) => rp('http://someOtherUrl.com'))
  .then((otherHtmlString) => {
    // and so forth...

But that loses some of the awesome of async - we can do all of these tasks in parallel.

var myRequests = [];
myRequests.push(rp('http://www.google.com').then(processStuff).catch(handleErr));
myRequests.push(rp('http://someOtherUrl.com').then(processStuff).catch(handleErr));

...boy does that look ugly. There's a better way with all of this - Promise.all() (You're using arrow functions so I assume native Promise will work for you too). It takes an array of promises and returns a promise that resolves when all of the array's promises have finished executing. (If any of them error, it immediately rejects). The .then function will be given an array representing the value each promise resolved to.

var myRequests = [];
myRequests.push(rp('http://www.google.com'));
myRequests.push(rp('http://someOtherUrl.com'));
Promise.all(myRequests)
  .then((arrayOfHtml) => {
    // arrayOfHtml[0] is the results from google,
    // arrayOfHtml[1] is the results from someOtherUrl
    // ...etc
    arrayOfHtml.forEach(processStuff);
  })
  .catch(/* handle error */);

Still, we have to manually call .push for every link we want to hit. That won't do! Let's pull a nifty trick using Array.prototype.map which will iterate over our array, manipulating each value in turn and return a new array comprised of the new values:

var arrayOfPromises = paths.map((path) => rp(`http://www.example.com/${path}`));
Promise.all(arrayOfPromises)
  .then((arrayOfHtml) => arrayOfHtml.forEach(processStuff))
  .catch(function (err) { console.log('agh!'); });

Much cleaner and easier error handling.