Performance of wkhtmltopdf

rajadhi · Jul 24, 2012 · Viewed 22.7k times

We intend to use wkhtmltopdf to convert HTML to PDF, but we are concerned about its scalability. Does anyone have any idea how it scales? Our web app could potentially attempt to convert hundreds of thousands of (relatively complex) HTML files, so it's important for us to have some idea. Has anyone got any information on this?

Answer

Alistair Ronan · Jul 25, 2012

First of all, your question is quite general; there are many variables to consider when asking about the scalability of any project. Obviously there is a difference between converting "hundreds of thousands" of HTML files over a week and expecting to do that in a day or an hour. On top of that, "relatively complex" HTML can mean different things to different people.

That being said, since I have done something similar, converting approximately 450,000 HTML files with wkhtmltopdf, I figured I'd share my experience.

Here was my scenario:

  • 450,000 HTML files
    • 95% of the files were one page in length
    • generally containing 2 images (relative path, local system)
    • tabular data (sometimes contained nested tables)
    • simple markup elsewhere (strong, italic, underline, etc)
  • A spare desktop PC
    • 8GB RAM
    • 2.4GHz Dual Core Processor
    • 7200RPM HD

I used a simple single-threaded script written in PHP to iterate over the folders and pass each HTML file path to wkhtmltopdf. The process took about 2.5 days to convert all the files, with very few errors.
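To put those figures in per-file terms, here is a small back-of-the-envelope calculation in Python. It only uses the two numbers quoted above (450,000 files, about 2.5 days); since the duration is approximate, treat the result as a rough estimate rather than a benchmark. The function name `throughput` is just an illustrative helper.

```python
def throughput(files, days):
    """Return (files per second, seconds per file) for a batch run."""
    seconds = days * 24 * 60 * 60
    return files / seconds, seconds / files


# Figures from the run described above: 450,000 files in roughly 2.5 days.
files_per_second, seconds_per_file = throughput(450_000, 2.5)
print(f"{files_per_second:.2f} files/s, {seconds_per_file:.2f} s/file")
# prints "2.08 files/s, 0.48 s/file"
```

So on that hardware, single-threaded, each mostly one-page document took roughly half a second, including the cost of starting a new wkhtmltopdf process for each file.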

I hope this gives you some insight into what you can expect from using wkhtmltopdf in your web application. Some obvious improvements would come from running this on better hardware, but mainly from using a multi-threaded application to process files simultaneously.
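As a rough illustration of the "process files simultaneously" idea, here is a minimal sketch in Python (not the PHP script I actually used). It assumes `wkhtmltopdf` is on the PATH and is called as `wkhtmltopdf --quiet <input> <output>`. The function names, the 120-second timeout, the default worker count, and writing each PDF next to its HTML file are all assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(html_path, pdf_path, binary="wkhtmltopdf"):
    """Return the argv list for one conversion."""
    return [binary, "--quiet", str(html_path), str(pdf_path)]


def convert_one(html_path, binary="wkhtmltopdf"):
    """Convert one HTML file; return (path, ok) instead of raising."""
    pdf_path = Path(html_path).with_suffix(".pdf")
    try:
        result = subprocess.run(
            build_command(html_path, pdf_path, binary),
            capture_output=True,
            timeout=120,  # assumed limit so one bad file cannot stall the batch
        )
        return html_path, result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or the conversion hung.
        return html_path, False


def convert_tree(root, workers=2, convert=convert_one):
    """Convert every *.html file under root using a pool of worker threads."""
    html_files = sorted(Path(root).rglob("*.html"))
    # Threads suffice here: each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(convert, html_files))
    failed = [path for path, ok in results if not ok]
    return len(results), failed
```

Calling `convert_tree("/path/to/html", workers=2)` returns the number of files attempted and the list of paths that failed, which can then be retried or inspected. A sensible starting point for `workers` is the number of CPU cores; past that, disk I/O is likely to become the bottleneck, so it is worth measuring before going higher.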