ChromeDriver --print-to-pdf after page load

jankovd · Nov 20, 2017

According to the docs, Chrome can be started in headless mode with --print-to-pdf in order to export a PDF of a web page. This works well for pages accessible with a GET request.

I'm trying to find a print-to-pdf solution that would allow me to export a PDF after executing multiple navigation requests from within Chrome. Example: open google.com, input a search query, click the first result link, export to PDF.

Looking at the [very limited amount of available] docs and samples, I failed to find a way to instruct Chrome to export a PDF after a page loads. I'm using the Java chrome-driver.

One possible solution not involving Chrome is to use a tool like wkhtmltopdf. Going down this path would force me to do the following before sending the HTML to the tool:

  • save the HTML in a local file
  • traverse the DOM, and download all file links (images, js, css, etc)

I'd rather avoid this path, as it would require a lot of tinkering [I assume] on my part to get the downloaded files' paths right for wkhtmltopdf to read them correctly.

Is there a way to instruct Chrome to print to PDF, but only after a page loads?

Answer

jankovd · Jan 23, 2018

As there are no answers, I will explain my workaround. Instead of trying to find out how to ask Chrome to print the current page, I went down another route.

For this example we will try to download the Google results page for the query 'example':

  1. Navigate with driver.get("https://google.com"), input the query 'example', click 'Google Search'
  2. Wait for the results page to load
  3. Retrieve the page source with driver.getPageSource()
  4. Parse the source with e.g. Jsoup in order to remap all relative links to point to an endpoint defined for this purpose (explained below), for example localhost:8080. The link './style.css' would become 'http://localhost:8080/style.css'
  5. Save the HTML to a file, e.g. named 'query-example'
  6. Run chrome --headless --print-to-pdf http://localhost:8080/search/html?id=query-example
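
Steps 4 and 6 can be sketched with nothing but the JDK. This is only an illustration under stated assumptions, not the code used in the workaround: the class PrintToPdfSketch and its methods remap and printCommand are hypothetical names, the binary name 'chrome' and the output file 'out.pdf' are placeholders, and the URL path /search/html is taken from the controller mapping shown in this answer. In a real run, remap would be applied to each href/src attribute found by the HTML parser.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class PrintToPdfSketch {
    // Assumed local endpoint; the steps above use localhost:8080 as the example.
    static final URI LOCAL_ENDPOINT = URI.create("http://localhost:8080/");

    // Step 4: point a relative link at the local endpoint.
    // Links that already name a scheme or a host are returned unchanged.
    static String remap(String link) {
        URI uri = URI.create(link); // throws IllegalArgumentException on malformed links
        if (uri.isAbsolute() || uri.getRawAuthority() != null) {
            return link;
        }
        return LOCAL_ENDPOINT.resolve(uri).toString();
    }

    // Step 6: build (but do not run) the headless Chrome command line.
    static List<String> printCommand(String chromeBinary, String htmlId, String pdfPath) {
        String url = LOCAL_ENDPOINT.resolve("search/html")
                + "?id=" + URLEncoder.encode(htmlId, StandardCharsets.UTF_8);
        return List.of(chromeBinary, "--headless", "--print-to-pdf=" + pdfPath, url);
    }

    public static void main(String[] args) {
        System.out.println(remap("./style.css"));
        List<String> command = printCommand("chrome", "query-example", "out.pdf");
        System.out.println(String.join(" ", command));
        // To actually launch Chrome: new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

Building the command as a list rather than a single string keeps it ready for ProcessBuilder and avoids shell quoting issues with the '?' in the URL.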

What happens is that Chrome requests the HTML from our controller. For every resource referenced in that HTML it again goes to our controller, since we remapped the relative links, and the controller in turn redirects the request to the real location of the resource on google.com. Below is an example Spring controller. Note that the example is incomplete and is here only as guidance.

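import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http on Spring 6+
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.io.IOUtils;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping
public class InternationalOffloadRestController {
  private static final String HTML_ENCODING = "UTF-8"; // assumed value, not given in the original

  // Serves the HTML file saved in step 5
  @RequestMapping(method = RequestMethod.GET, value = "/search/html")
  public String getHtml(@RequestParam("id") String id) throws IOException {
    File file = new File("location of the HTML file", id);
    try (FileInputStream input = new FileInputStream(file)) {
      return IOUtils.toString(input, HTML_ENCODING);
    }
  }

  @RequestMapping("/**") // forward all remapped links to google.com
  public void forward(HttpServletRequest request, HttpServletResponse httpServletResponse)
      throws URISyntaxException {
    // Caveat: this URI constructor re-quotes '%' in already-encoded input
    URI uri = new URI("https", null, "google.com", -1,
      request.getRequestURI(), request.getQueryString(), null);
    httpServletResponse.setHeader("Location", uri.toString());
    httpServletResponse.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
  }
}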
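
One detail of the redirect is worth checking. getRequestURI() and getQueryString() return the path and query still percent-encoded, while the multi-argument java.net.URI constructors always quote the '%' character, so an incoming '%20' would leave as '%2520'. The sketch below shows one possible alternative that concatenates the raw values instead. It is not part of the original workaround, and the class RedirectTarget, the method locationFor and the constant UPSTREAM are hypothetical names.

```java
import java.net.URI;

final class RedirectTarget {
    // Real location of the remapped resources, as in the controller above.
    static final String UPSTREAM = "https://google.com";

    // Maps the raw (still percent-encoded) path and query of a local request
    // to the matching upstream location. rawQuery may be null.
    static URI locationFor(String rawPath, String rawQuery) {
        String target = UPSTREAM + rawPath + (rawQuery == null ? "" : "?" + rawQuery);
        return URI.create(target); // parses without re-encoding existing escapes
    }

    public static void main(String[] args) {
        System.out.println(locationFor("/style.css", null));
        System.out.println(locationFor("/search", "q=a%20b"));
    }
}
```

Inside the controller, the result of locationFor(request.getRequestURI(), request.getQueryString()).toString() would be used as the Location header value.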