Does iTextSharp Handle PDF Compression?

J-man · May 19, 2016

Can iTextSharp compress PDF files? I am looking for a PDF library I can use to compress PDF files. I have a set of folders containing many PDF files, each between 1 MB and 10 MB, and the number of folders grows every day. To save disk space, I would like to read in each PDF file once it has been processed, compress it, and save it to the designated folder.

If iTextSharp does not support compression, can anyone suggest other .NET PDF libraries that do? Purchasing a library wouldn't be a problem. I looked at many of the free ones, such as PDFsharp, which in my opinion is very good at creating PDFs but cannot render or compress them.

There is a great answer I read on Stack Overflow from Chris Haas:

PdfStamper is a helper class that ultimately uses another class called PdfStamperImp to do most of the work. PdfStamperImp is derived from PdfWriter and when you use stamper.Writer you are actually getting back this implementation class. Many of the properties on PdfStamper also pass directly through to the implementation class. So these two calls actually do the same thing.

stamper.SetFullCompression();
stamper.Writer.SetFullCompression();

Another point of confusion is that SetFullCompression and the CompressionLevel aren't actually related at all. "Full compression" represents a feature that was added in PDF 1.5 called "Object Streams" that allows grouping PDF objects together to potentially allow for greater compression. There's actually no requirement that what we think of as "compression" actually occurs but in reality I think it would always happen. (Possibly a super simple document might get larger with this enabled, not sure and don't feel like testing.)

The CompressionLevel is actually what you normally think of as compression, a number from 0 to 9 or -1 to mean default (which currently equals six I think). This property is actually part of the PdfStream class which many classes ultimately derive from. This setting doesn't "trickle down", however. Since you are importing a stream from another location via GetPageContent() and SetPageContent() that specific stream has its own compression settings unrelated to the Writer's compression settings. There's actually a third parameter that you can pass to SetPageContent() to set your specific compression level if you want.

reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);


https://stackoverflow.com/a/22028008/2063134

Any help or suggestions will be greatly appreciated.

Thank you.

Answer

Bruno Lowagie · May 19, 2016

Yes, iText and iTextSharp support compression.

  • From PDF 1.0 (1993) to PDF 1.1 (1994), PDF syntax stored in content streams wasn't compressed.
  • From PDF 1.2 (1996) on, PDF syntax stored in content streams could be compressed. The standard filter is /FlateDecode. It is based on the same deflate algorithm that ZIP uses, and you can set different levels of compression (from 0 to 9; choosing -1 uses whatever your programming language considers to be the default).
  • From PDF 1.5 (2003) on, the indirect objects can be stored in a compressed object stream. Additionally, the cross-reference table can be compressed and stored in a stream. Before PDF 1.5, this wasn't possible (viewers that only support PDF 1.4 and earlier can't open "fully compressed" PDFs).
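To get a feel for what the Flate levels mean, here is a minimal sketch using Python's standard zlib module rather than iTextSharp. The sample `content` bytes are a made-up stand-in for an uncompressed content stream, not data from a real PDF. In zlib, the default level (-1) currently corresponds to level 6.

```python
import zlib

# Made-up stand-in for an uncompressed PDF content stream
# (repetitive drawing operators compress well).
content = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET\n" * 200

# Compress at a few levels; -1 asks zlib for its default level.
sizes = {level: len(zlib.compress(content, level)) for level in (0, 1, 6, 9, -1)}

for level, size in sizes.items():
    print(f"level {level:>2}: {size} bytes (input: {len(content)} bytes)")

# Whatever the level, decompression restores the original bytes.
assert zlib.decompress(zlib.compress(content, 9)) == content
```

Level 0 only stores the data (adding a few bytes of framing), while levels 1 to 9 trade CPU time for size. On already-compressed streams, recompressing at a higher level typically gains little.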

iText supports all of the above, and Chris' answer already fully answers your question. Since PDF 1.1 dates from a really long time ago (1994), practically every PDF you'll encounter already has compressed content streams, so I wouldn't worry about changing their compression levels. You can safely forget about:

reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);

Using this line won't reduce the file size much.

Using "full compression" (which stores indirect objects in compressed object streams and compresses the cross-reference table) should have an effect on the file size for PDFs with many indirect objects. A minimal "Hello World" file could increase in file size when you use "full compression".
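The effect of grouping objects can be approximated with a rough sketch in plain Python (zlib only, not iTextSharp). The annotation dictionaries below are hypothetical, and the byte counts ignore the real bookkeeping of an object stream (its dictionary and offset table), so treat this as an analogy rather than an exact size calculation.

```python
import zlib

# Hypothetical small indirect objects (dictionaries, not streams),
# for example link annotations.
objects = [
    f"{n} 0 obj\n<< /Type /Annot /Subtype /Link /Rect [72 {n} 200 {n + 12}] >>\nendobj\n".encode("ascii")
    for n in range(1, 501)
]

# Without object streams, such objects sit in the file as plain, uncompressed text.
plain_size = sum(len(obj) for obj in objects)

# With object streams, many objects share one Flate-compressed stream.
grouped_size = len(zlib.compress(b"".join(objects), 9))

# A tiny input shows why a minimal file can grow:
# the fixed overhead of the compressed container outweighs the saving.
tiny = b"<< >>"
tiny_compressed_size = len(zlib.compress(tiny, 9))

print(f"500 objects: {plain_size} bytes plain, {grouped_size} bytes grouped")
print(f"tiny object: {len(tiny)} bytes plain, {tiny_compressed_size} bytes compressed")
```

Many similar objects compress very well together, while a single tiny object ends up larger than it started.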

All of the above won't help you much, because good PDF creators already compress whatever can be compressed. Bad PDF creators, however (or people using good PDF creators incorrectly), can produce files that contain redundant objects. For instance, there are people who don't know how to add a logo as an image to each page in a PDF using iTextSharp, so they add the image as many times as there are pages. PDF compression won't help you in this case. But if you pass such a "bad" PDF through iTextSharp's PdfSmartCopy, it will detect the redundant objects and reorganize the file so that objects repeated over and over again (for instance: every page refers to a different object with the same image bytes) are reused (for instance: every page refers to the same object with the image bytes).
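The idea of reusing identical objects can be illustrated with a small conceptual sketch in plain Python. It is not how PdfSmartCopy is implemented. The `deduplicate` function, the object ids, and the sample bytes are all hypothetical, and comparing objects by a SHA-256 digest is an assumption made for the illustration.

```python
import hashlib

def deduplicate(objects, page_refs):
    """Merge objects with identical bytes and repoint page references.

    objects:   dict mapping object id -> raw bytes
    page_refs: list of object ids, one referenced image per page
    """
    first_id_for_digest = {}  # content digest -> id of the first object seen with it
    remap = {}                # old object id -> surviving object id
    kept = {}                 # objects that remain in the output
    for obj_id, data in objects.items():
        digest = hashlib.sha256(data).digest()
        survivor = first_id_for_digest.setdefault(digest, obj_id)
        remap[obj_id] = survivor
        if survivor == obj_id:
            kept[obj_id] = data
    return kept, [remap[ref] for ref in page_refs]

logo = b"logo image bytes"  # the same logo was added once per page
objects = {10: logo, 11: logo, 12: logo, 13: b"a different image"}
pages = [10, 11, 12, 13]

kept, new_pages = deduplicate(objects, pages)
print(sorted(kept), new_pages)  # [10, 13] [10, 10, 10, 13]
```

Three copies of the logo collapse into one object, and the first three pages all point at it. That is a saving no Flate level can achieve.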

Depending on the version of iTextSharp you're using, calling reader.RemoveUnusedObjects() will also help (recent versions remove unused objects by default).
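Conceptually, removing unused objects amounts to keeping only what can be reached from the document root. The sketch below shows that idea in plain Python on a hypothetical object graph. The `remove_unused` function and the object numbers are illustrative assumptions, not iTextSharp's actual implementation.

```python
def remove_unused(objects, root):
    """Keep only the objects reachable from the root object.

    objects: dict mapping object id -> list of object ids it references
    """
    reachable = set()
    stack = [root]
    while stack:
        obj_id = stack.pop()
        if obj_id in reachable or obj_id not in objects:
            continue
        reachable.add(obj_id)
        stack.extend(objects[obj_id])
    return {obj_id: refs for obj_id, refs in objects.items() if obj_id in reachable}

graph = {
    1: [2],     # catalog -> page tree
    2: [3, 4],  # page tree -> pages
    3: [5],     # page 1 -> image
    4: [5],     # page 2 -> same image
    5: [],
    6: [7],     # orphaned objects left behind by an earlier edit
    7: [],
}

cleaned = remove_unused(graph, root=1)
print(sorted(cleaned))  # [1, 2, 3, 4, 5]
```

Objects 6 and 7 are never referenced from the catalog, so they can be dropped without changing what a viewer displays.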