How to scrape JSON from puppeteer?

Amy Coin · Jan 29, 2018 · Viewed 14.3k times

I log in to a site and it sets a browser cookie.

I then go to a URL that returns a JSON response.

How do I scrape the page after calling await page.goto('blahblahblah.json');?

Answer

Rippo · Jan 30, 2018

Another approach, which avoids intermittent issues, is to evaluate the page body once it is available and parse it as JSON, e.g.

const puppeteer = require('puppeteer'); 

async function run() {

    const browser = await puppeteer.launch( {
        headless: false  //change to true in prod!
    }); 

    const page = await browser.newPage(); 

    await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json');

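    // Runs in the page context: the browser shows a raw JSON response
    // as text inside <body>, so parse that text and return the result.
    const innerText = await page.evaluate(() =>  {
        return JSON.parse(document.querySelector("body").innerText); 
    }); 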

    console.log("innerText now contains the JSON");
    console.log(innerText);

    //I will leave this as an exercise for you to
    //  write out to FS...

    await browser.close(); 

}

run();
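
For the "write out to FS" step, and as a possible alternative to parsing the body text, here is a minimal sketch. It is not part of the answer above: fetchJson and saveJson are hypothetical helper names, and it assumes that the response object returned by page.goto() is non-null and that its json() method can parse the body, which should hold for a plain JSON endpoint.

```javascript
const fs = require('fs').promises;

// Hypothetical helper: navigate to `url` and parse the response body as JSON.
// `page` is expected to be a Puppeteer Page. page.goto() resolves to the
// response of the main resource (it can be null in some cases), and
// response.json() parses that response's body.
async function fetchJson(page, url) {
    const response = await page.goto(url);
    if (!response || !response.ok()) {
        const status = response ? response.status() : 'no response';
        throw new Error(`Request for ${url} failed: ${status}`);
    }
    return response.json();
}

// Hypothetical helper: write any JSON-serializable value to disk,
// pretty-printed with two-space indentation.
async function saveJson(filePath, data) {
    await fs.writeFile(filePath, JSON.stringify(data, null, 2) + '\n', 'utf8');
}
```

Inside run() this could be used as: const data = await fetchJson(page, url); await saveJson('output.json', data); where 'output.json' is only a placeholder path. Reusing the same page (or at least the same browser) after logging in should mean the login cookie is sent with the request.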