How to check if a PDF has any kind of digital signature

Sampisa picture Sampisa · Dec 23, 2016 · Viewed 15.1k times · Source

I need to understand if a PDF has any kind of digital signature. I have to manage huge PDFs, e.g. 500MB each, so I just need to find a way to separate non-signed from signed (so I can send just signed PDFs to a method that manages them). Any procedure found until now involves attempt to extract certificate via e.g. Bouncycastle libs (in my case, for Java): if it is present, pdf is signed, if it not present or a exception is raised, is it not (sic!). But this is obviously time/memory consuming, other than an example of resource-wastings implementation.

Is there any quick language-independent way, e.g. opening PDF file, and reading first bytes and finding an info telling that file is signed? Alternatively, is there any reference manual telling in detail how is made internally a PDF?

Thank you in advance

Answer

Patrick Gallot picture Patrick Gallot · Dec 27, 2016

You are going to want to use a PDF Library rather than trying to implement this all yourself, otherwise you will get bogged down with handling the variations of Linearized documents, Filters, Incremental updates, object streams, cross-reference streams, and more.

With regards to reference material; per my cursory search, it looks like Adobe is no longer providing its version of the ISO 32000:2008 specification to any and all, though that specification is mainly a translation of the PDF v1.7 Reference manual to ISO-conforming language.

So assuming the PDF v1.7 Reference, the most relevant sections are going to be 8.7 (Digital Signatures), 3.6.1 (Document Catalog), and 8.6 (Interactive Forms).

The basic process is going to be:

  1. Read the Document Catalog for 'Perms' and 'AcroForm' entries.
  2. Read the 'Perms' dictionary for 'DocMDP','UR', or 'UR3' entries. If these entries exist, In all likelyhood, you have either a certified document or a Reader-enabled document.
  3. Read the 'AcroForm' entry; (make sure that you do not have an 'XFA' entry, because in the words of Fraizer from Porgy and Bess: Dat's a complication!). You basically want to first check if there is an (optional) 'SigFlags' entry, in which case a non-zero value would indicate that there is a signature in the Fields Array. Otherwise, you need to walk each entry of the 'Fields' Array looking for a field dictionary with an 'FT' (Field Type) entry set to 'Sig' (signature), with a 'V' (Value) entry that is not null.

Using a PDF library that can use the document's cross-reference table to navigate you to the right indirect objects should be faster and less resource-intensive than a brute-force search of the document for a certificate.