Crawling multiple URLs in a loop using puppeteer

ahhmarr · Sep 19, 2017 · Viewed 14.4k times

I have

urls = ['url','url','url'...]

this is what I'm doing

urls.map(async (url)=>{
  await page.goto(url);
  await page.waitForNavigation({ waitUntil: 'networkidle' });
})

This does not seem to wait for the page to load; it visits all the URLs in quick succession (I even tried using page.waitFor).

I just wanted to know: am I doing something fundamentally wrong, or is this type of functionality not advised/supported?

Answer

tomahaug · Sep 19, 2017

map, forEach, reduce, etc. do not wait for the asynchronous operation inside their callback before proceeding to the next element of the array they are iterating over.
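You can see this without puppeteer at all. In this sketch (the `delay` helper and the item names are just for illustration), every async callback passed to map starts immediately, so all of the "start" entries are recorded before any "end" entry:

```javascript
// Demonstrates that Array.prototype.map does not wait between iterations:
// map synchronously invokes each async callback, collecting pending
// promises, so all "start" entries are pushed before any "end" entry.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const order = [];

async function run() {
  const items = ['a', 'b', 'c'];
  // map returns an array of pending promises; the loop itself never awaits
  const pending = items.map(async (item) => {
    order.push(`start ${item}`);
    await delay(10); // stands in for page.goto(...)
    order.push(`end ${item}`);
  });
  await Promise.all(pending);
  return order;
}

run().then((o) => console.log(o.join(', ')));
// → start a, start b, start c, end a, end b, end c
```

This is exactly why the crawler above races through all the URLs: by the time the first page has loaded, every other `page.goto` call has already been issued against the same page.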

There are multiple ways of going through each item of an iterable sequentially while performing an asynchronous operation, but the easiest in this case, I think, is to simply use a normal for loop, which does wait for each awaited operation to finish.

const urls = [...]

for (let i = 0; i < urls.length; i++) {
    const url = urls[i];
    await page.goto(url);
    await page.waitForNavigation({ waitUntil: 'networkidle2' });
}

This will visit one URL after another, as you expect. If you are curious about iterating serially with async/await, you can have a peek at this answer: https://stackoverflow.com/a/24586168/791691
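A for...of loop works the same way and reads a little cleaner. Here is a plain-JS sketch of the serial pattern, where the hypothetical `visit` function stands in for the `page.goto` call (the `delay` helper simulates the page load):

```javascript
// Serial iteration with for...of: each visit() finishes before the next
// one starts, unlike the map version in the question.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const visited = [];

async function visit(url) {
  await delay(10); // simulate the page load; stands in for page.goto(url)
  visited.push(url);
}

async function crawl(urls) {
  for (const url of urls) {
    await visit(url); // waits here before moving on to the next URL
  }
  return visited;
}

crawl(['url1', 'url2', 'url3']).then((v) => console.log(v.join(', ')));
// → url1, url2, url3 (in order, one at a time)
```

Because each iteration awaits before the next begins, the URLs are always processed in array order; total time is the sum of the individual loads rather than the maximum, which is the trade-off you accept for reusing a single page.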