I want to convert an HTML file into a PDF file using wkhtmltopdf. wkhtmltopdf is the best option for me as it renders the HTML file using WebKit. The problem is that I want to do the same using Java, but wkhtmltopdf does not provide any Java API.
I can use Runtime.exec() or ProcessBuilder to fork a new process from Java and create the PDF output using wkhtmltopdf in that process. But as I am developing a web-based application, I am not allowed to create so many new processes on the server.
Is there any other way I can use wkhtmltopdf? I really want to use it, as it gives me exactly the output I need.
Or is there any other open source browser engine that provides a Java API and can render my HTML page just like wkhtmltopdf?
Remember that the system running your Java code must have wkhtmltopdf installed for anything I'm saying here to work. Go to www.wkhtmltopdf.org and download the version you need.
I know this is old and by now you've certainly figured this out, but if you don't want to use JNI or JNA to do this, you can do it pretty simply through Runtime.exec() calls on your system.
Here is a class that does exactly what you want without having to fuss with JNI or JNA:
import java.io.IOException;
import org.apache.commons.io.IOUtils; // Apache Commons IO

public class MegaSimplePdfGenerator {
    public void makeAPdf() throws InterruptedException, IOException {
        Process wkhtml; // Create uninitialized process
        String command = "wkhtmltopdf http://www.google.com /Users/Shared/output.pdf"; // Desired command
        wkhtml = Runtime.getRuntime().exec(command); // Start process
        IOUtils.copy(wkhtml.getErrorStream(), System.err); // Forward wkhtmltopdf's stderr to the console
        wkhtml.waitFor(); // Wait for the process to finish
    }
}
You MUST somehow bind to one of the process's output streams for the process to run. From the Java side that is either getInputStream() (the process's standard output) or getErrorStream() (its standard error). In this case, since I'm just writing to a file, I went ahead and connected System.err to the error stream of the wkhtml process.
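If you would rather not copy the stream yourself, ProcessBuilder can hand the child process the JVM's own standard streams via inheritIO(). This is only a sketch of that alternative, not something from the code above: the class and method names are made up for illustration, and it assumes wkhtmltopdf is on the PATH.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class ProcessBuilderPdfGenerator {

    // Builds the argument list. Passing arguments as a list avoids the
    // whitespace splitting that Runtime.exec(String) applies to a command line.
    static List<String> command(String source, String destination) {
        return List.of("wkhtmltopdf", source, destination);
    }

    // Runs wkhtmltopdf and returns its exit code.
    static int makeAPdf(String source, String destination)
            throws IOException, InterruptedException {
        Process wkhtml = new ProcessBuilder(command(source, destination))
                .inheritIO() // child shares this JVM's stdin/stdout/stderr, so nothing is drained by hand
                .start();
        return wkhtml.waitFor(); // Wait for the process to finish
    }
}
```

Something like ProcessBuilderPdfGenerator.makeAPdf("http://www.google.com", "/Users/Shared/output.pdf") would then behave much like the class above, with the added benefit that you get the exit code back and can check it.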
How to use only streams!
If you want the source HTML to come from a stream and/or the destination PDF to be written to a stream then you would use a '-' for the "URI" instead of a regular string.
Example: wkhtmltopdf - -
or wkhtmltopdf /Users/Shared/somefile.html -
You can then capture the input and output streams and write and read as needed.
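For the case where only the PDF comes back over a stream, a rough sketch could look like the following. The class and method names are hypothetical, stderr is passed through to the console so it can never fill up unread, and a non-zero exit code is treated as a failure, which is a choice of this sketch rather than anything wkhtmltopdf requires. It uses InputStream.readAllBytes(), so it assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class PdfToBytes {

    // Runs the command and returns everything it writes to standard output.
    static byte[] capture(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to the console
                .start();
        process.getOutputStream().close(); // nothing to send on stdin
        byte[] output;
        try (InputStream stdout = process.getInputStream()) {
            output = stdout.readAllBytes(); // blocks until the process closes its stdout
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
        return output;
    }

    // '-' as the destination tells wkhtmltopdf to write the PDF to stdout.
    static byte[] renderFile(String htmlPath) throws IOException, InterruptedException {
        return capture(List.of("wkhtmltopdf", htmlPath, "-"));
    }
}
```

PdfToBytes.renderFile("/Users/Shared/somefile.html") would then hand back the PDF as a byte array, assuming wkhtmltopdf is installed and on the PATH.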
If you are only connecting to a single stream then you don't need to use threads and you won't get a scenario where the streams are waiting on each other endlessly.
However, if you are using a stream for BOTH the HTML source AND the PDF destination, then you MUST use threads for the process to ever complete.
NOTE: Remember that the OutputStream must be flushed and closed for wkhtmltopdf to start building the PDF and streaming the results!
Example:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils; // Apache Commons IO

public class StreamBasedPdfGenerator {
    public void makeAPdfWithStreams() throws InterruptedException, IOException {
        // Start by setting up file streams
        File destinationFile = new File("/Users/Shared/output.pdf");
        File sourceFile = new File("/Users/Shared/pdfPrintExample.html");
        FileInputStream fis = new FileInputStream(sourceFile);
        FileOutputStream fos = new FileOutputStream(destinationFile);
        String command = "wkhtmltopdf - -"; // Read HTML from stdin, write PDF to stdout
        Process wkhtml = Runtime.getRuntime().exec(command); // Start process
        // Forward wkhtmltopdf's stderr to the console
        Thread errThread = new Thread(() -> {
            try {
                IOUtils.copy(wkhtml.getErrorStream(), System.err);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        // Connect HTML source stream to wkhtmltopdf
        Thread htmlReadThread = new Thread(() -> {
            try {
                IOUtils.copy(fis, wkhtml.getOutputStream());
                wkhtml.getOutputStream().flush();
                wkhtml.getOutputStream().close(); // Closing stdin tells wkhtmltopdf the HTML is complete
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        // Connect PDF stream from wkhtmltopdf to the destination file stream
        Thread pdfWriteThread = new Thread(() -> {
            try {
                IOUtils.copy(wkhtml.getInputStream(), fos);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        // Use start(), NOT run(): run() would execute the copies one after
        // another on this thread, and you want them all going at the same time.
        errThread.start();
        pdfWriteThread.start();
        htmlReadThread.start();
        wkhtml.waitFor(); // Wait for the process to finish
        // Let the copy threads drain whatever is left, then release the files
        errThread.join();
        pdfWriteThread.join();
        htmlReadThread.join();
        fis.close();
        fos.close();
    }
}
Streams are great when you're running this on a web server and want to avoid creating temporary HTML or PDF files. You can simply stream the result back by capturing the PDF output and writing it to the HTTP response stream.
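To give an idea of what that might look like without temporary files, here is a sketch that takes the HTML as a String and writes the PDF to whatever OutputStream you give it. The names PdfStreamer, pipe and renderHtml are made up for this example. It sticks to the standard library (InputStream.transferTo, Java 9 or later) instead of IOUtils, and it assumes wkhtmltopdf is on the PATH and that sending the HTML as UTF-8 bytes suits your page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, named for illustration only.
final class PdfStreamer {

    // Feeds `source` to the command's stdin and copies its stdout to `destination`.
    static void pipe(List<String> command, InputStream source, OutputStream destination)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to the console
                .start();
        // Write the input on a second thread so reading and writing overlap.
        Thread writer = new Thread(() -> {
            // try-with-resources closes stdin, which signals the end of the input
            try (OutputStream stdin = process.getOutputStream()) {
                source.transferTo(stdin);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(destination); // copy the result on the calling thread
        }
        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode);
        }
    }

    // "- -" makes wkhtmltopdf read HTML from stdin and write the PDF to stdout.
    static void renderHtml(String html, OutputStream destination)
            throws IOException, InterruptedException {
        InputStream source = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        pipe(List.of("wkhtmltopdf", "-", "-"), source, destination);
    }
}
```

In a servlet you could pass response.getOutputStream() as the destination after setting the content type to application/pdf. Note that this still starts one wkhtmltopdf process per call, so on a busy server you would probably want to limit how many run at once.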
I hope this helps somebody!