How to convert an HTML file to PDF using wkhtmltopdf in Java

Surajeet Bharati · Feb 10, 2015

I want to convert an HTML file into a PDF file using wkhtmltopdf. wkhtmltopdf is the best option for me because it renders the HTML file using WebKit. The problem is that I want to do this from Java, and wkhtmltopdf does not provide any Java API.

I can use Runtime.exec() or ProcessBuilder to fork a new process from Java and create the PDF output using wkhtmltopdf in that process. But since I am developing a web-based application, I am not allowed to create that many new processes on the server.

Is there any other way I can use wkhtmltopdf? I really want to use it because it gives me exactly the output I need.

Or, is there any other open source browser engine that provides a Java API that can render my HTML page just like wkhtmltopdf?

Answer

njfife · Jun 22, 2017

Remember that the system running your Java code must have wkhtmltopdf installed for anything I'm saying here to work. Go to www.wkhtmltopdf.org and download the version you need.
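If you want to fail fast when wkhtmltopdf is missing, you can try starting it before doing any real work. This is a minimal sketch, assuming the binary is on the PATH of the Java process and that wkhtmltopdf --version exits with status 0. The class and method names are hypothetical. It relies on ProcessBuilder.start() throwing an IOException when the executable cannot be found.

```java
import java.io.IOException;
import java.util.List;

final class WkhtmltopdfCheck {

    // Hypothetical helper: true if the command can be started and exits with 0.
    // All output is discarded, so no stream needs to be drained.
    static boolean canRun(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD); // Java 9+: throw the output away
        try {
            return pb.start().waitFor() == 0;
        } catch (IOException e) {
            return false; // executable not found or could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    // Assumption: wkhtmltopdf is on the PATH and supports --version.
    static boolean isWkhtmltopdfInstalled() {
        return canRun(List.of("wkhtmltopdf", "--version"));
    }

    public static void main(String[] args) {
        System.out.println("wkhtmltopdf available: " + isWkhtmltopdfInstalled());
    }
}
```

Calling this once at startup, rather than before every conversion, avoids spawning an extra process per request.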

I know this is old and by now you've certainly figured it out, but if you don't want to use JNI or JNA, you can do it pretty simply through Runtime.exec() calls on your system.

Here is a class that does exactly what you want without having to fuss with JNI or JNA:

import java.io.IOException;

import org.apache.commons.io.IOUtils; // from Apache Commons IO

public class MegaSimplePdfGenerator {

    public void makeAPdf() throws InterruptedException, IOException {
        // Desired command, one array element per argument so nothing is re-split on spaces
        String[] command = {"wkhtmltopdf", "http://www.google.com", "/Users/Shared/output.pdf"};

        Process wkhtml = Runtime.getRuntime().exec(command); // Start process
        IOUtils.copy(wkhtml.getErrorStream(), System.err); // Drain wkhtmltopdf's stderr (progress and errors) to the console

        wkhtml.waitFor(); // Block until wkhtmltopdf exits
    }
}

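You MUST drain the process's output for it to run to completion: the child's stdout and stderr are OS pipes with limited buffers, and if nothing reads them, wkhtmltopdf can block once a buffer fills. That means reading the inputStream (its stdout) or the errorStream (its stderr), whichever it actually writes to. In this case, since wkhtmltopdf is writing the PDF to a file, I just connected System.err to the errorStream of the wkhtml process.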
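If you don't need wkhtmltopdf's output inside Java at all, a ProcessBuilder can hand the child's stdout and stderr straight to the JVM's own console, so no pipe is left undrained and no IOUtils copy is needed. This is a sketch, not the code above: the class name is hypothetical and the URL and output path are just the ones from the previous example.

```java
import java.io.IOException;
import java.util.List;

final class InheritIoPdfGenerator {

    // Runs a command with its stdout and stderr connected to this JVM's console,
    // so the child never blocks on a full pipe. Returns the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Each argument is its own list element, so paths with spaces are safe.
        int exitCode = run(List.of("wkhtmltopdf", "http://www.google.com", "/Users/Shared/output.pdf"));
        System.out.println("wkhtmltopdf exited with " + exitCode);
    }
}
```

Passing the arguments as a list also sidesteps Runtime.exec(String), which splits the command on whitespace.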

How to use only streams!

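If you want the source HTML to come from a stream and/or the destination PDF to be written to a stream, pass '-' instead of the input URI or the output file path. For the input, '-' means stdin; for the output, it means stdout.

Examples: wkhtmltopdf - - (HTML from stdin, PDF to stdout) or wkhtmltopdf /Users/Shared/somefile.html - (HTML from a file, PDF to stdout)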

You can then write the HTML to the process's getOutputStream() (wkhtmltopdf's stdin) and read the PDF from its getInputStream() (wkhtmltopdf's stdout), as needed.

If you are only connecting to a single stream then you don't need to use threads and you won't get a scenario where the streams are waiting on each other endlessly.
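For the single-stream case, such as wkhtmltopdf /Users/Shared/somefile.html -, the PDF can be read on the calling thread with no extra threads. This is a minimal sketch, assuming stderr is sent to the console with Redirect.INHERIT so it never needs its own reader. readStdout is a hypothetical helper, and InputStream.transferTo needs Java 9 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

final class StdoutOnlyPdfGenerator {

    // Copies the command's stdout into `out` on the calling thread.
    // stderr goes to this JVM's console, so it never needs its own reader.
    static int readStdout(List<String> command, OutputStream out)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        process.getOutputStream().close(); // nothing to send on stdin
        try (InputStream pdf = process.getInputStream()) {
            pdf.transferTo(out); // returns once the process closes its stdout
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        ByteArrayOutputStream pdfBytes = new ByteArrayOutputStream();
        int exitCode = readStdout(
                List.of("wkhtmltopdf", "/Users/Shared/somefile.html", "-"), pdfBytes);
        System.out.println("exit " + exitCode + ", " + pdfBytes.size() + " PDF bytes");
    }
}
```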

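However, if you are using a stream for BOTH the HTML source AND the PDF destination, then you MUST use threads for the process to ever complete; otherwise each side can end up blocked waiting for the other (for example, on a pipe buffer that nobody is draining).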

NOTE: Remember that the process's OutputStream (wkhtmltopdf's stdin) must be flushed and closed before wkhtmltopdf starts building the PDF and streaming the results!
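Here is the stdin side on its own. HTML held in memory is written to wkhtmltopdf's stdin, and stdout is redirected by the OS straight into a file, so no reader thread is needed even with wkhtmltopdf - -. This is a sketch under assumptions: the class and method names are hypothetical and the HTML is encoded as UTF-8. Leaving the try-with-resources block closes stdin, which is what tells the process the input is complete.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

final class StdinToFilePdfGenerator {

    // Writes `html` to the command's stdin, then flushes and closes it.
    // stdout is written to `pdfFile` by the OS; stderr goes to the console.
    static int htmlToPdfFile(List<String> command, byte[] html, File pdfFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectOutput(pdfFile)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write(html);
            stdin.flush(); // closing stdin at the end of this block signals end of input
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        byte[] html = "<html><body><h1>Hello</h1></body></html>".getBytes(StandardCharsets.UTF_8);
        int exitCode = htmlToPdfFile(List.of("wkhtmltopdf", "-", "-"), html,
                new File("/Users/Shared/output.pdf"));
        System.out.println("wkhtmltopdf exited with " + exitCode);
    }
}
```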

Example:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils; // from Apache Commons IO

public class StreamBasedPdfGenerator {
    public void makeAPdfWithStreams() throws InterruptedException, IOException {
        // Start by setting up file streams
        File destinationFile = new File("/Users/Shared/output.pdf");
        File sourceFile = new File("/Users/Shared/pdfPrintExample.html");

        try (InputStream fis = new FileInputStream(sourceFile);
             OutputStream fos = new FileOutputStream(destinationFile)) {

            // "-" "-": read the HTML from stdin, write the PDF to stdout
            Process wkhtml = Runtime.getRuntime().exec(new String[] {"wkhtmltopdf", "-", "-"});

            // Drain stderr (progress and errors) to the console
            Thread errThread = new Thread(() -> {
                try {
                    IOUtils.copy(wkhtml.getErrorStream(), System.err);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            // Connect the HTML source stream to wkhtmltopdf's stdin
            Thread htmlReadThread = new Thread(() -> {
                try (OutputStream stdin = wkhtml.getOutputStream()) {
                    IOUtils.copy(fis, stdin);
                    stdin.flush(); // closing stdin (end of this block) lets wkhtmltopdf start rendering
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            // Connect wkhtmltopdf's stdout (the PDF) to the destination file stream
            Thread pdfWriteThread = new Thread(() -> {
                try {
                    IOUtils.copy(wkhtml.getInputStream(), fos);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });

            // Use start(), NOT run(): run() executes each copy on the current thread,
            // one after another, which blocks forever. All three must run at the same time.
            errThread.start();
            pdfWriteThread.start();
            htmlReadThread.start();

            wkhtml.waitFor(); // Wait for wkhtmltopdf to exit
            // Let every copy finish before the file streams are closed
            htmlReadThread.join();
            pdfWriteThread.join();
            errThread.join();
        }
    }
}
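For comparison, the same idea can be sketched with only the JDK (Java 9+ for transferTo): one background thread writes stdin while the calling thread reads stdout, and Future.get() surfaces a failed write as an exception instead of a stack trace on a dying thread. pipeThrough is a hypothetical helper, not part of any library, and it sends stderr to the console rather than copying it on a third thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PipeThrough {

    // Feeds `in` to the command's stdin on a background thread while this
    // thread copies the command's stdout into `out`. Returns the exit code.
    static int pipeThrough(List<String> command, InputStream in, OutputStream out)
            throws IOException, InterruptedException, ExecutionException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr to the console
                .start();
        ExecutorService writerThread = Executors.newSingleThreadExecutor();
        try {
            // Writer: copy the input, then close stdin so the process sees EOF.
            Future<Long> writer = writerThread.submit(() -> {
                try (OutputStream stdin = process.getOutputStream()) {
                    return in.transferTo(stdin);
                }
            });
            // Reader: runs until the process closes its stdout.
            try (InputStream stdout = process.getInputStream()) {
                stdout.transferTo(out);
            }
            writer.get(); // rethrows a writer failure as an ExecutionException
            return process.waitFor();
        } finally {
            writerThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream html = new ByteArrayInputStream(
                "<html><body>Hi</body></html>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        int exitCode = pipeThrough(List.of("wkhtmltopdf", "-", "-"), html, pdf);
        System.out.println("exit " + exitCode + ", " + pdf.size() + " PDF bytes");
    }
}
```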

Streams are great when you're running this on a web server and want to avoid creating temporary HTML or PDF files: you can stream the response back by copying wkhtmltopdf's output straight into the HTTP response stream.
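A minimal sketch of that idea, using the JDK's built-in com.sun.net.httpserver server as a stand-in for whatever web framework you actually use. The /pdf path, the port, the fixed URL and the class name are all hypothetical. The command's stdout is copied into the response body as it is produced, so no temporary file exists on either side.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.List;

final class PdfHttpServer {

    // Hypothetical server: each request to /pdf runs `command` and streams its
    // stdout (the PDF, when wkhtmltopdf's output argument is "-") to the client.
    static HttpServer create(int port, List<String> command) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/pdf", exchange -> {
            try {
                streamCommand(exchange, command);
            } finally {
                exchange.close();
            }
        });
        return server;
    }

    private static void streamCommand(HttpExchange exchange, List<String> command)
            throws IOException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // progress/errors to the server log
                .start();
        process.getOutputStream().close(); // nothing to send on stdin
        exchange.getResponseHeaders().set("Content-Type", "application/pdf");
        exchange.sendResponseHeaders(200, 0); // 0 = chunked; the size isn't known up front
        try (InputStream pdf = process.getInputStream();
             OutputStream body = exchange.getResponseBody()) {
            pdf.transferTo(body);
        }
        try {
            process.waitFor(); // reap the finished process
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = create(8080,
                List.of("wkhtmltopdf", "http://www.google.com", "-"));
        server.start();
        System.out.println("Try http://localhost:8080/pdf");
    }
}
```

Because the headers go out before the conversion finishes, a failure halfway through can no longer change the 200 status; if you need proper error codes, you have to buffer the PDF (or write it to a file) first.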

I hope this helps somebody!