Is it safe/supported to run multiple instances of Puppeteer at the same time, either (1) as separate processes (several node screenshot.js runs at the same time) or (2) within one script (several puppeteer.launch() calls at the same time)?
What are the recommended settings/limits on parallel processes?
(In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2) but that seems less likely to work out.)
It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network, disk, and memory, and on how your tasks are set up.
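Since the practical limit depends on your machine, one way to find it is to cap how many jobs run at once and tune that number. Below is a minimal sketch of such a cap. `runLimited` is a hypothetical helper written for this answer, not part of Puppeteer, and the commented usage assumes the `puppeteer` package is installed.

```javascript
// Run async task functions with at most `limit` of them in flight at once.
// `runLimited` is a hypothetical helper, not a Puppeteer API.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps claiming the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `tasks`
}

// Possible use with Puppeteer (assumption: one browser per task):
// const puppeteer = require('puppeteer');
// const tasks = urls.map((url) => async () => {
//   const browser = await puppeteer.launch();
//   try {
//     const page = await browser.newPage();
//     await page.goto(url);
//     return await page.title();
//   } finally {
//     await browser.close();
//   }
// });
// const titles = await runLimited(tasks, 4);
```

Start with a small limit and raise it while watching memory and CPU. Note that if one task throws, the returned promise rejects, so wrap the task body in try/catch if you want the remaining results.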
I have crawled a few million pages, and from time to time (in my setup, roughly every 10,000 pages) Puppeteer crashes. Therefore, you should have a way to automatically restart the browser and retry the job.
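To illustrate the restart-and-retry idea, here is a minimal sketch that starts a fresh browser for every attempt and retries a failed job a few times. The names `runWithRestart`, `launchBrowser`, and `maxAttempts` are hypothetical. The only assumption about the browser object is that it has an async `close()` method, as the object returned by `puppeteer.launch()` does.

```javascript
// Run `job` against a freshly launched browser, retrying on failure.
// `runWithRestart` is a hypothetical helper, not a Puppeteer API.
async function runWithRestart(launchBrowser, job, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let browser;
    try {
      browser = await launchBrowser(); // new browser for every attempt
      return await job(browser);
    } catch (err) {
      lastError = err; // remember the failure and try again
    } finally {
      if (browser) {
        // The browser may already be dead; ignore errors while closing it.
        await browser.close().catch(() => {});
      }
    }
  }
  throw lastError; // every attempt failed
}

// Possible use with Puppeteer (assumes the `puppeteer` package):
// const puppeteer = require('puppeteer');
// await runWithRestart(() => puppeteer.launch(), async (browser) => {
//   const page = await browser.newPage();
//   await page.goto('http://www.wikipedia.org');
//   await page.screenshot({ path: 'wikipedia.png' });
// });
```

Launching a browser per attempt is the simplest design but also the slowest. For a long crawl you would more likely keep one browser alive across many jobs and relaunch it only after a failure.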
You might want to check out puppeteer-cluster, which takes care of pooling the browser instances, detecting crashes, and restarting browsers. (Disclaimer: I'm the author.)
Here is an example of creating a cluster:
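const { Cluster } = require('puppeteer-cluster');

(async () => {
  // Create a cluster that runs up to 10 browsers in parallel
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 10,
  });

  // Queue your jobs (one example)
  cluster.queue(async ({ page }) => {
    await page.goto('http://www.wikipedia.org');
    await page.screenshot({ path: 'wikipedia.png' });
  });

  // Wait for all queued jobs to finish, then shut the cluster down
  await cluster.idle();
  await cluster.close();
})();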
This is just a minimal example. There are many more ways to use the cluster.