Processing PDFs to reduce file size and/or complexity

Tyler Eaves · Dec 31, 2010 · Viewed 7.1k times

I have PDF files that I need to prepare for viewing on mobile devices. The worst case would be ~50 pages with lots of full-color images and vector art, and a file size of approx. 40 MB. This is acceptable for PC viewing on broadband, but not great for mobile viewing because of long download times and very laggy scrolling (at least on my overclocked Droid). Are there any tools or libraries for processing the files to simplify the vector art, downsample or recompress the images, that sort of thing?

Output in PDF format is not absolutely essential, but it needs to be something readable on Android and iOS devices without software downloads.

Answer

Kurt Pfeifle · Jan 2, 2011

There are a few main things that can blow up the size of a PDF on mobile devices:

  • high-resolution pictures (where low-resolution ones would suffice)
  • embedded fonts (where content would still be readable "good enough" without them)
  • PDF content no longer required for the current version/view (older versions of certain objects)
  • embedded ICC profiles
  • embedded third-party files (using the PDF as a container)
  • embedded job tickets (for printing)
  • embedded JavaScript
  • and a few more

FOSS software: Ghostscript can try to size down your PDFs, mainly by re-sampling the pictures used and by removing older versions ("generations") of PDF objects that were replaced by newer ones:

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" ^
  -f blown-up.pdf
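
If you need to run this over many files, or on something other than Windows, the same call can be wrapped in a small script. The following is only a minimal sketch in Python: the function names `build_gs_command` and `shrink_pdf` are made up for illustration, the executable names are assumptions about a typical install, and the `-c` PostScript snippet from the command above is left out for brevity.

```python
import shutil
import subprocess
import sys


def build_gs_command(src, dst, preset="ebook", gs_exe=None):
    """Return the Ghostscript argument list for a pdfwrite size-down run."""
    if gs_exe is None:
        # Assumption: console binary name by platform; adjust to your install
        # (some Windows builds ship gswin64c instead of gswin32c).
        gs_exe = "gswin32c" if sys.platform.startswith("win") else "gs"
    return [
        gs_exe,
        "-o", dst,                    # output file, as in the command above
        "-sDEVICE=pdfwrite",          # write a new PDF
        f"-dPDFSETTINGS=/{preset}",   # preset used above: /ebook
        "-dEmbedAllFonts=false",
        src,
    ]


def shrink_pdf(src, dst, preset="ebook"):
    """Run Ghostscript on src and write the sized-down PDF to dst."""
    cmd = build_gs_command(src, dst, preset)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {cmd[0]}")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `shrink_pdf("blown-up.pdf", "sized-down.pdf")` would then correspond to the command line above, minus the `-c` part.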

You can add more parameters to the above command line to size down certain PDFs even more (e.g. by setting a lower maximum resolution). Here is an example that enforces downsampling of color and grayscale images to 72 dpi:

gswin32c.exe ^
  -o sized-down.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/ebook ^
  -dEmbedAllFonts=false ^
  -dColorImageDownsampleThreshold=1.0 ^
  -dColorImageDownsampleType=/Average ^
  -dColorImageResolution=72 ^
  -dGrayImageDownsampleThreshold=1.0 ^
  -dGrayImageDownsampleType=/Average ^
  -dGrayImageResolution=72 ^
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" ^
  -f blown-up.pdf
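
The six downsampling switches follow one pattern per image type, so they are easy to generate for any target resolution. This is a small sketch in Python; `downsample_flags` is a hypothetical helper name, and the parameter names it emits are exactly the ones used in the command above. As far as I understand the threshold parameter, a value of 1.0 means that any image above the target resolution gets downsampled, rather than only images well above it.

```python
def downsample_flags(dpi, kinds=("Color", "Gray"), method="Average", threshold=1.0):
    """Build the -d...Image... switches for each image kind at the given dpi."""
    flags = []
    for kind in kinds:
        flags += [
            f"-d{kind}ImageDownsampleThreshold={threshold}",
            f"-d{kind}ImageDownsampleType=/{method}",
            f"-d{kind}ImageResolution={dpi}",
        ]
    return flags


# Print the switches for 72 dpi, ready to paste into a Ghostscript command line.
print(" ".join(downsample_flags(72)))
```

Trying a few values (say 72, 100 and 150) and comparing the resulting file size against on-screen quality is usually the quickest way to find an acceptable setting for a given set of documents.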

Commercial, closed-source software: callas pdfToolbox4 can reduce file sizes even more by applying a custom profile to the PDF downsizing process (it can even un-embed fonts and ICC profiles).


Update 2: See also the following (new) question with the answer:

It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged. This is useful in cases where you do not want the (raster) images, but only the text parts in order to reduce file size.
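
The PostScript code from that answer is not reproduced here. As a different route to a similar result, newer Ghostscript releases (9.23 and later, as far as I know) have a `-dFILTERIMAGE` switch that drops raster images while processing the input. The sketch below only builds such a command in Python; `build_strip_images_command` is a hypothetical helper name and the default executable name `gs` is an assumption about your install.

```python
def build_strip_images_command(src, dst, gs_exe="gs"):
    """Return a Ghostscript argument list that rewrites src without raster images."""
    return [
        gs_exe,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dFILTERIMAGE",  # assumption: requires a Ghostscript version that supports it
        src,
    ]


# Show the command line that would be executed.
print(" ".join(build_strip_images_command("blown-up.pdf", "text-only.pdf")))
```

Pass the returned list to `subprocess.run(..., check=True)` to execute it once you have confirmed that your Ghostscript version accepts the switch.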