How can I capture all network requests and full response data when loading a page in Chrome?

Matt Zeunert · Oct 24, 2018

Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:

  • request URL
  • request headers
  • request post data
  • response headers text (including duplicate headers like set-cookie)
  • transferred response size (i.e. compressed size)
  • full response body

Capturing the full response body is what causes the problems for me.

Things I've tried:

  • Getting response content with response.buffer() - this does not work if there are redirects at any point, since buffers are wiped on navigation
  • Intercepting requests and using getResponseBodyForInterception - this means I can no longer access the encodedLength, and I also had problems getting the correct request and response headers in some cases
  • Using a local proxy - this works, but it slowed down page load times significantly (and also changed some behavior, e.g. for certificate errors)
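
For reference, the first attempt can be sketched roughly as follows. This is a minimal illustration, not a fix: `collectBodies` is a hypothetical helper name, and `page` is assumed to be a Puppeteer Page. It records each response and notes when `response.buffer()` fails, which is what happens for redirect responses and for bodies that are gone after a navigation.

```javascript
'use strict';

// Sketch of the first attempt: read each body via response.buffer().
// `page` is assumed to be a Puppeteer Page; `collectBodies` is a
// hypothetical helper name, not part of Puppeteer.
function collectBodies(page) {
  const records = [];
  const pending = [];

  page.on('response', response => {
    const record = {
      url: response.url(),
      status: response.status(),
      headers: response.headers(),
      body: null,
      error: null,
    };
    records.push(record);

    // buffer() rejects when the body is not available, e.g. for
    // redirect responses or once a navigation has discarded it.
    pending.push(
      response.buffer().then(
        body => { record.body = body; },
        error => { record.error = error.message; }
      )
    );
  });

  return {
    records,
    // Resolves once every body read started so far has settled.
    settled: () => Promise.all(pending),
  };
}
```

The helper would be called before `page.goto(...)`, with `await capture.settled()` afterwards. Records whose `error` is set are the ones where the body could not be read.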

Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.

Answer

Grant Miller · Oct 27, 2018

You can enable request interception with page.setRequestInterception(true). Then, inside the page.on('request') handler, you can use the request-promise-native module as a middleman: it fetches the same URL from Node and gathers the response data before the request is continued with request.continue() in Puppeteer.

Here's a full working example:

'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  await page.setRequestInterception(true);

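  page.on('request', request => {
    // Re-issue the request from Node so the full response can be read.
    // Note: request.continue() below makes Chrome fetch the URL a second time.
    request_client({
      uri: request.url(),
      method: request.method(),     // forward the method so POSTs are not replayed as GETs
      headers: request.headers(),   // forward the headers Chrome would send
      body: request.postData(),     // undefined for requests without a body
      gzip: true,                   // decode compressed bodies if the server sends them
      simple: false,                // resolve on non-2xx responses instead of rejecting
      resolveWithFullResponse: true,
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      // content-length may be missing, e.g. for chunked responses
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      request.continue();
    }).catch(error => {
      // Network-level failure in the Node client: log it and drop the request
      console.error(error);
      request.abort();
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  // Print everything once, after the network has gone idle
  console.log(result);

  await browser.close();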
})();
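
The example keeps only the parsed header object and reads the size from content-length. As a rough sketch of how the remaining two items from the question could be approximated, the helpers below rebuild header text with duplicates intact and fall back to the body's byte length when content-length is absent. The names `headersToText` and `responseSize` are hypothetical, and the sketch assumes the client's response exposes Node's `rawHeaders` array, as `http.IncomingMessage` does.

```javascript
'use strict';

// Rebuild header text from a flat [name, value, name, value, ...] array,
// the shape of `rawHeaders` on Node's http.IncomingMessage.
// Duplicate headers such as set-cookie stay on separate lines.
function headersToText(rawHeaders) {
  const lines = [];
  for (let i = 0; i + 1 < rawHeaders.length; i += 2) {
    lines.push(`${rawHeaders[i]}: ${rawHeaders[i + 1]}`);
  }
  return lines.join('\r\n');
}

// Prefer the declared content-length. Otherwise fall back to the byte
// length of the body as received by the client. The fallback is the
// decoded size, which can differ from the compressed size on the wire.
function responseSize(headers, body) {
  const declared = Number.parseInt(headers['content-length'], 10);
  if (Number.isFinite(declared)) {
    return declared;
  }
  return Buffer.byteLength(body || '');
}
```

Inside the `.then(response => ...)` callback these could be used as `headersToText(response.rawHeaders)` and `responseSize(response.headers, response.body)`. Neither value is guaranteed to match what Chrome itself received, since the data comes from the separate Node request.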