Edit existing PDF in a browser

javascript pdf html5-canvas pdf.js

neilsimp1 · May 19, 2017 · Viewed 31.1k times · Source

I have a web application that is currently getting a base64 representation of a PDF from the server. I'm able to use Mozilla's pdf.js to display this on a <canvas> and toggle through the pages with a dropdown.

According to everything I've been able to find, including the question "Can Mozilla's pdf.js modify PDFs?", it's not possible to edit the PDF with pdf.js.

I've found jsPDF, and I'm able to take the canvas, call .toDataURL() on it for each page and build a new PDF document from the results, but there are two issues:

The newly generated PDF will just be a series of images on each page, so any text in the original PDF will just be an image after I'm done with it.
I generate a new PDF with jsPDF and then send the base64 of it back to pdf.js to display it on the canvas. Something happens between these steps where the images of the pages get scaled incorrectly, so each page takes up about 3/4 of the canvas after each new PDF change. I've been unable to get it to retain the same size/scale.

jsPDF doesn't look like it has a way to load an existing PDF, it only creates new ones. pdfmake and PDFKit also look like they only create new PDF files.

So my question:

Is there anything that will allow for both viewing a pdf (from base64) and for making changes to it? Ideally I'd watch for changes to the canvas, then draw that change onto the pdf page. When done, export that to a base64 string to send back to the server.
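A minimal sketch of the base64 round trip described here, assuming the server sends plain base64 without a data: URL prefix. The helper names base64ToBytes and bytesToBase64 are hypothetical, not part of pdf.js or jsPDF. The decoded bytes are what pdf.js accepts through getDocument({ data: bytes }); that call is only mentioned in a comment and is not run here.

```javascript
// Decode a base64 string into raw bytes.
function base64ToBytes(b64) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Encode raw bytes back to base64. Works in chunks so that large
// files do not exceed the argument limit of String.fromCharCode.
function bytesToBase64(bytes) {
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// "JVBERi0xLjQ=" is the base64 form of the header "%PDF-1.4".
const pdfBytes = base64ToBytes("JVBERi0xLjQ=");
// In the browser these bytes could be passed to pdf.js, e.g.
// pdfjsLib.getDocument({ data: pdfBytes })
const roundTripped = bytesToBase64(pdfBytes);
```

This only covers transport; it does nothing to make the PDF content editable.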

Answer

Quick answer - no and it is quite unlikely you will find a cross-browser solution. It is very unlikely that you will find a PDF-perfect solution. Better to think about having the users edit HTML and generate the PDF at the server.

[Edit June 3rd 2020 - given this question is from 2017 you may think it is outdated and discount it. Well, as far as I am aware the answer is still relevant, and every other week someone passes through and gives it an up-vote. But if you do find a good lib or util on your travels please come back and list it. Thanks.]

Long answer - the PDF format is both brilliant and fiendish at the same time. Brilliant because of its portability, but fiendish because of the internal structure and storage mechanisms. There is no friendly 'DOM' like with HTML. If we were starting out afresh to develop a portable document format it would not be PDF that we would choose. But PDF currently has too much momentum to be thrown away, period.

Younger viewers might be wondering how the hell this manic format got into its market leading position and where it came from. Well, when the founding fathers of PDF were laying down the design, before XML, JSON, HTML and even the Internet, they weren't working with today's document sharing in mind. They were working on a better way to encode printing instructions - the PostScript printer driver concept. These were never expected to be edited before the printer consumed them, and they were worthless for any other purpose. Then someone noticed that you could interpret the PostScript drawing instructions to a screen, and subsequently someone spotted the fantastic potential to employ this as a transportable, cross-device display concept. And here we are.

Back to the question - to edit a PDF in any meaningful GUI way, you would need to unpack the PDF and render the components (images, formatted text, pages) to the display device; then allow folks to mess with the layout; then re-pack the PDF. You would have to do this perfectly in line with the PDF standards otherwise you may find the downstream consumers of your edited PDF file crash or are unable to render it. You would have to cater for the various Acrobat standard levels, and the shortcuts and bloats that the editing package (Word, Illustrator, InDesign) vendors chuck into the PDF file; layers, thumbnails, etc.

Then we come to colors. Have a read of the PDF spec and you will see that there is an array of colorspace options that the original PDF producer can decide to use. You would have to interpret these to a reasonable device color on the screen and back, etc.
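To give a feel for the "and back" problem, here is a rough sketch of the simplest possible conversion between CMYK and screen RGB. It assumes the naive textbook formula with no ICC profiles, which is not how a colour-managed viewer would do it, and the function names are made up for illustration.

```javascript
// Naive CMYK -> RGB conversion (no ICC profile, no colour management).
// Inputs are in the 0..1 range; output channels are 0..255.
function cmykToRgb(c, m, y, k) {
  const channel = (ink) => Math.round(255 * (1 - ink) * (1 - k));
  return [channel(c), channel(m), channel(y)];
}

// Naive inverse: RGB (0..255) -> CMYK (0..1), pushing as much as
// possible into the black channel.
function rgbToCmyk(r, g, b) {
  const rf = r / 255;
  const gf = g / 255;
  const bf = b / 255;
  const k = 1 - Math.max(rf, gf, bf);
  if (k === 1) {
    return [0, 0, 0, 1]; // pure black: avoid dividing by zero
  }
  return [
    (1 - rf - k) / (1 - k),
    (1 - gf - k) / (1 - k),
    (1 - bf - k) / (1 - k),
    k,
  ];
}

// A grey built from all four inks in the source file...
const onScreen = cmykToRgb(0.2, 0.2, 0.2, 0.5);
// ...does not come back as the same ink mix after a trip through RGB.
const writtenBack = rgbToCmyk(onScreen[0], onScreen[1], onScreen[2]);
```

Even with this toy formula the values you write back, roughly (0, 0, 0, 0.6), are not the (0.2, 0.2, 0.2, 0.5) you read, so an editor that goes through screen colours changes the file's colours unless it keeps the original values alongside.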

And then fonts. Fonts might be embedded subset, or not. To keep fidelity with the PDF you will need to realise the glyphs as vector graphics on your drawing surface at the scale defined in the PDF. This mostly means utilising some kind of platform-dependent type library - tricky cross-platform. Plus the fact that you will need to license the fonts for appropriate use, which can be pricey for the fonts most people want to use to look hip and professional.

Given the layering, scaling and rotating features in PDF, you would likely be looking at an html canvas as the drawing surface. Anyone who knows will tell you that in the world of canvas you are pretty much on your own for word-processing type functions.

Not impossible but hard.

Components that render PDF to a display are largely acting as print drivers, slavishly obeying the PDF drawing instructions, and usually generating a raster or sometimes an SVG graphic. This is a one-way street - they read and draw, but there is no sense of 'handles' to the objects drawn. No handles means no manipulation, and these guys certainly have little intention of letting you modify and write back.

You will find many 'save to pdf' products. When client-side they will be leaning toward grabbing a set of pixels and dumping a raster graphic into a file with the thinnest veneer of 'PDF' definition wrapped around it. Where they are server-based they can be quite powerful - there are plenty of tools like Aspose and ABCPDF that truly offer some PDF wrangling server side - but this is not what you are looking for in your OP.

Summary - very complicated subject. If anything ever emerges as a potential it will likely have many constraints in terms of the PDF features covered and thus restrictions on what it can safely edit.

If you are looking for online editing of documents that are ultimately exported as PDF, then a way forward is to keep an html version of the document source and have the user edit this with TinyMCE, CKEditor, etc, then use one of the server-side tools to take the saved source HTML and render out to PDF. Tools like ABCPDF render HTML faithfully and let you add images, headers and footers, page numbers, etc.

This is a pragmatic answer to your (assumed) need, though it still has some trade-offs in terms of the font (licensing) issues, clunkiness of browser-based editors, all-round weirdness of the HTML laid down by some HTML editing components, etc. But it IS viable.

Final thoughts - rethink the scope of what you need. If HTML editing and convert to PDF at server is usable for you it is a well-trodden path and you will find both free and commercial components for client and server to support it.

Edit: If you need to annotate the PDF then things are much easier. On the server, you need to generate images of the pages of the document, send those to the client, display them to the user, let the user mark them up, capture the co-ordinates of the annotations back to the server and use a server-side PDF library to render the annotations into the PDF. It is achievable, though requires various skillsets for server-side PDF to image manipulation and client side presentation and annotation capture.
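The coordinate capture step is where most people trip up, so here is a minimal sketch of it. It assumes an unrotated page whose visible box starts at (0, 0), a page image that fills the canvas exactly, and a payload shape invented for this example; the function names are hypothetical and your server-side library may expect something different. The one fixed fact is that PDF user space is measured in points with the origin at the bottom-left and y increasing upwards, whereas canvas pixels start at the top-left with y increasing downwards.

```javascript
// Convert a point captured on the page image (pixels, origin top-left,
// y down) into PDF user space (points, origin bottom-left, y up).
function canvasToPdfPoint(point, canvasSize, pageSizePt) {
  const scaleX = pageSizePt.width / canvasSize.width;
  const scaleY = pageSizePt.height / canvasSize.height;
  return {
    x: point.x * scaleX,
    y: pageSizePt.height - point.y * scaleY,
  };
}

// Build the data to post back for one rectangular annotation.
// The field names here are made up for illustration.
function buildAnnotationPayload(pageNumber, rectPx, canvasSize, pageSizePt, text) {
  const topLeft = canvasToPdfPoint(
    { x: rectPx.x, y: rectPx.y }, canvasSize, pageSizePt);
  const bottomRight = canvasToPdfPoint(
    { x: rectPx.x + rectPx.width, y: rectPx.y + rectPx.height },
    canvasSize, pageSizePt);
  return {
    page: pageNumber,
    // PDF rectangles are [lowerLeftX, lowerLeftY, upperRightX, upperRightY]
    rect: [topLeft.x, bottomRight.y, bottomRight.x, topLeft.y],
    text: text,
  };
}

// A US Letter page (612 x 792 pt) shown as a 1224 x 1584 px image.
const payload = buildAnnotationPayload(
  1,
  { x: 100, y: 200, width: 300, height: 50 },
  { width: 1224, height: 1584 },
  { width: 612, height: 792 },
  "Check this figure"
);
console.log(JSON.stringify(payload));
```

Sending points rather than pixels means the server does not need to know what zoom level the user happened to be viewing at.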

Edit: Readers may be interested in knowing if the picture I painted above has changed. As of Jan 2019 I stand by what I wrote. Suppliers are coming to the market with better tools and libraries that can do more than previously. However, you still need to assess your needs and confirm their restrictions - it is likely that there will be some. No vendor I am aware of yet has a client-side, cross-browser, cross-device, full capability PDF editing lib for any PDF file - there is always some limitation. But I am happy to be corrected.
