How to use a browser's (Chrome/Firefox) HTML/CSS/JS rendering engine to produce a PDF?

David Hofmann · Aug 29, 2014

There are some nice projects that generate PDF from HTML/CSS/JS files:

  1. http://wkhtmltopdf.org/ (open source)
  2. https://code.google.com/p/flying-saucer/ (open source)
  3. http://cssbox.sourceforge.net/ (not necessarily straight pdf generation)
  4. http://phantomjs.org/ (open source allows for pdf output)
  5. http://www.princexml.com/ (commercial, but hands down the best one out there)
  6. https://thepdfapi.com/ (a Chrome modification that outputs PDF from HTML)

I want to programmatically control Chrome or Firefox (because both are cross-platform) so that the browser loads a web page, runs its scripts, applies its styles, and generates a PDF file for printing.

But how do I start controlling the browser in an automated way, so that I can run something like:

render-to-pdf file-to-render.html out.pdf

I can easily do this manually by browsing to the page and printing it to PDF, and I get an accurate, 100% spec-compliant rendering of the HTML/CSS/JS page in a PDF file. Even the URL headers can be omitted from the PDF through the browser's configuration options. But again, how do I start automating this process?

I want to automate, on the server side, opening the browser, navigating to a page, and generating the PDF from the browser-rendered page.

I have done a lot of research; I just don't know how to ask the right question. I want to programmatically control the browser, maybe like Selenium does, but to the point where I can export a web page as PDF (hence using the browser's rendering capabilities to produce good PDFs).

Answer

crodas · Aug 30, 2014

I'm not an expert, but PhantomJS seems to be the right tool for the job. It is a headless browser built on WebKit (through QtWebKit) rather than on Chrome/Chromium itself, so the rendering should be close to, but not necessarily identical to, what Chrome produces.

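var page = require('webpage').create();

page.open('http://github.com/', function (status) {
    // Stop early if the page could not be loaded.
    if (status !== 'success') {
        console.log('Unable to load the page');
        phantom.exit(1);
        return;
    }

    // Measure the full size of the rendered document inside the page context.
    var s = page.evaluate(function () {
        var body = document.body,
            html = document.documentElement;

        var height = Math.max(body.scrollHeight, body.offsetHeight,
            html.clientHeight, html.scrollHeight, html.offsetHeight);
        var width = Math.max(body.scrollWidth, body.offsetWidth,
            html.clientWidth, html.scrollWidth, html.offsetWidth);
        return {width: width, height: height};
    });

    console.log(JSON.stringify(s));

    // Make the paper as tall as the document so that it fits on a single page.
    // The width is fixed at 1980px here; the measured s.width is only logged.
    page.paperSize = {
        width: '1980px',
        height: s.height + 'px',
        margin: {
            top: '50px',
            left: '20px'
        }
    };

    page.render('github.pdf');
    phantom.exit();
});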
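The same idea can be wrapped into something close to the `render-to-pdf file-to-render.html out.pdf` command from the question. This is only a minimal sketch under a few assumptions: the script name `render-to-pdf.js` and the helpers `parseArgs` and `buildPaperSize` are made up for illustration, and it relies on PhantomJS's `system` and `webpage` modules. The browser-driving part runs only when the global `phantom` object exists, so the two helpers can also be tried on their own.

```javascript
// render-to-pdf.js (hypothetical name); intended to be run as:
//   phantomjs render-to-pdf.js file-to-render.html out.pdf

// Extract the input and output paths from the argument list.
// args[0] is the script name, as in PhantomJS's system.args.
function parseArgs(args) {
    if (args.length !== 3) {
        return null;
    }
    return {input: args[1], output: args[2]};
}

// Build a paperSize object matching the measured document size,
// so that the whole page ends up on a single PDF page.
function buildPaperSize(dims) {
    return {
        width: dims.width + 'px',
        height: dims.height + 'px',
        margin: '0px'
    };
}

// Only drive the browser when running inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var opts = parseArgs(system.args);
    if (!opts) {
        console.log('Usage: phantomjs render-to-pdf.js <input> <output.pdf>');
        phantom.exit(1);
    } else {
        var page = require('webpage').create();
        page.open(opts.input, function (status) {
            if (status !== 'success') {
                console.log('Unable to load ' + opts.input);
                phantom.exit(1);
                return;
            }
            // Measure the rendered document inside the page context.
            var dims = page.evaluate(function () {
                var html = document.documentElement;
                return {width: html.scrollWidth, height: html.scrollHeight};
            });
            page.paperSize = buildPaperSize(dims);
            page.render(opts.output);
            phantom.exit();
        });
    }
}
```

Sizing the paper to the document gives one long page, as in the script above. If paginated output for printing is preferred, `page.paperSize` also accepts a named format, for example `{format: 'A4', orientation: 'portrait', margin: '1cm'}`.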

Hope it helps.