Programmatically capturing AJAX traffic with headless Chrome

Andrei · Sep 6, 2017

Chrome officially supports running the browser in headless mode (including programmatic control via the Puppeteer API and/or the CRI library).

I've searched through the documentation, but I haven't found how to programmatically capture the AJAX traffic from these instances, i.e. start an instance of Chrome from code, navigate to a page, and access the background request/response calls and raw data, all from code rather than through the developer tools or extensions.

Do you have any suggestions or examples detailing how this could be achieved? Thanks!

Answer

ebidel · Sep 7, 2017

Update

As @Alejandro pointed out in the comments, resourceType is a function and its return value is lowercase:

page.on('request', request => {
  // resourceType() returns a lowercase string such as 'xhr'
  if (request.resourceType() === 'xhr') {
    // do something
  }
});
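
Since the question also asks for responses and raw data, here is a minimal end-to-end sketch rather than a definitive implementation. It assumes a recent Puppeteer installed from npm; the helper names isAjax, describeResponse and captureAjax are made up for illustration. It also treats 'fetch' as AJAX, since calls made with fetch() are reported under that type rather than 'xhr'.

```javascript
// Resource types treated as AJAX here: XMLHttpRequest and fetch() calls.
const AJAX_TYPES = new Set(['xhr', 'fetch']);

// True when the request was issued via XMLHttpRequest or fetch().
function isAjax(request) {
  return AJAX_TYPES.has(request.resourceType());
}

// Turns a Puppeteer response into a plain record, including the raw body text.
async function describeResponse(response) {
  const request = response.request();
  let body = null;
  try {
    body = await response.text();
  } catch (err) {
    // Some responses (redirects, for example) have no readable body.
  }
  return {
    url: request.url(),
    method: request.method(),
    postData: request.postData(),
    status: response.status(),
    body,
  };
}

// Launches headless Chrome, loads `url`, and resolves to the captured AJAX records.
async function captureAjax(url) {
  // Required lazily so the helpers above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const pending = [];
    page.on('response', response => {
      if (isAjax(response.request())) {
        pending.push(describeResponse(response));
      }
    });
    // 'networkidle0' waits until no network connections remain for 500 ms.
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await Promise.all(pending);
  } finally {
    await browser.close();
  }
}
```

Calling captureAjax with a page URL and logging the result should give one record per AJAX response seen while the page loads. Requests fired later (after a click, say) won't show up unless you keep the page open and interact with it before collecting.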

Original answer

Puppeteer's API makes this really easy:

page.on('request', request => {
  // Superseded: see the Update above
  if (request.resourceType === 'XHR') {
    // do something
  }
});

You can also intercept requests with setRequestInterception, but it's not needed in this example if you're not going to modify the requests.

There's an example of intercepting image requests that you can adapt.
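
For completeness, here is a rough sketch of what interception can look like when you do want to alter traffic. This is not the linked example itself; it assumes you already have a Puppeteer page object, and shouldBlock and loadWithoutImages are hypothetical helper names.

```javascript
// Decides whether an intercepted request should be dropped; here, every image.
function shouldBlock(request) {
  return request.resourceType() === 'image';
}

// Enables interception on an existing Puppeteer `page`, then navigates to `url`.
async function loadWithoutImages(page, url) {
  await page.setRequestInterception(true);
  page.on('request', request => {
    // Once interception is on, every request must be aborted or continued,
    // otherwise it stalls.
    if (shouldBlock(request)) {
      request.abort();
    } else {
      request.continue();
    }
  });
  await page.goto(url);
}
```

To adapt it for AJAX, swap the check in shouldBlock for the resource types you care about.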

resourceTypes are defined here.
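
If you're unsure which type string your page's background calls report, one option is to tally the types actually observed during a load. A small sketch, assuming an existing Puppeteer page object; countByType and tallyRequests are illustrative names, not part of Puppeteer.

```javascript
// Counts requests per resource type, keyed by the string resourceType() returns.
function countByType(requests) {
  const counts = {};
  for (const request of requests) {
    const type = request.resourceType();
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Collects every request made while `url` loads on an existing Puppeteer `page`.
async function tallyRequests(page, url) {
  const seen = [];
  page.on('request', request => seen.push(request));
  await page.goto(url);
  return countByType(seen);
}
```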