I have PDF files I need to prepare for viewing on mobile devices. The worst case would be ~50 pages with lots of full-color images and vector art, and a file size of approx. 40MB. This is acceptable for PC viewing on broadband, but not great for mobile viewing because of long download times and very laggy scrolling (at least on my overclocked Droid). Are there any tools or libraries for processing the files to simplify the vector art, downsample/recompress the images, that sort of thing?
Output in PDF format is not absolutely essential, but it needs to be something readable on Android and iOS devices without software downloads.
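A few main things can blow up the size of a PDF, among them high-resolution images, embedded fonts, embedded ICC profiles, and superseded versions of PDF objects that are still stored in the file.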
FOSS software: Ghostscript can try to shrink your PDFs, mainly by resampling the embedded images and by removing older versions ("generations") of PDF objects which were replaced by newer ones:
gswin32c.exe ^
-o sized-down.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/ebook ^
-dEmbedAllFonts=false ^
-c ".setpdfwrite <</AlwaysEmbed [ ]>>" ^
-f blown-up.pdf
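If the conversion has to run on more than one machine, the same call can be scripted. The following is only a minimal Python sketch of the command above, not a definitive implementation. The function names `find_ghostscript`, `build_ebook_args` and `shrink` are made up for illustration, and the list of executable names (`gs`, `gswin64c`, `gswin32c`) is an assumption about typical installs. Newer Ghostscript releases have deprecated `.setpdfwrite`, so the sketch lets you leave that part out.

```python
import shutil
import subprocess

# Executable names differ per platform; this candidate list is an assumption.
GS_CANDIDATES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(candidates=GS_CANDIDATES):
    """Return the path of the first Ghostscript executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_ebook_args(src, dst, exe="gs", use_setpdfwrite=True):
    """Build an argument list equivalent to the /ebook command line above."""
    args = [
        exe,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",
        "-dEmbedAllFonts=false",
    ]
    if use_setpdfwrite:
        # Same PostScript snippet as in the original command; set
        # use_setpdfwrite=False if your Ghostscript version rejects it.
        args += ["-c", ".setpdfwrite <</AlwaysEmbed [ ]>>"]
    args += ["-f", src]
    return args


def shrink(src, dst):
    """Run Ghostscript on src and write the reduced PDF to dst."""
    exe = find_ghostscript()
    if exe is None:
        raise RuntimeError("Ghostscript executable not found on PATH")
    subprocess.run(build_ebook_args(src, dst, exe), check=True)
```

Calling `shrink("blown-up.pdf", "sized-down.pdf")` would then do the same job as the batch command. Passing the arguments as a list avoids the shell-specific `^` line continuations and quoting.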
You can add more parameters to the command line above to shrink certain PDFs even more (for example, by setting a lower maximum resolution). Here is an example that enforces downsampling of color and grayscale images to 72dpi:
gswin32c.exe ^
-o sized-down.pdf ^
-sDEVICE=pdfwrite ^
-dPDFSETTINGS=/ebook ^
-dEmbedAllFonts=false ^
-dColorImageDownsampleThreshold=1.0 ^
-dColorImageDownsampleType=/Average ^
-dColorImageResolution=72 ^
-dGrayImageDownsampleThreshold=1.0 ^
-dGrayImageDownsampleType=/Average ^
-dGrayImageResolution=72 ^
-c ".setpdfwrite <</AlwaysEmbed [ ]>>" ^
-f blown-up.pdf
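To get a feel for what the resolution switches buy you, here is a small Python sketch. It generates the six downsampling switches from the command above for an arbitrary target resolution and estimates how many fewer pixels an image keeps afterwards. The helper names `downsample_args` and `pixel_reduction` are hypothetical, and the 300dpi source resolution in the usage line is only an assumed example value. Actual file size savings also depend on the compression used, so treat the factor as a rough guide.

```python
def downsample_args(dpi):
    """Return the color and grayscale downsampling switches for a target dpi."""
    flags = []
    for kind in ("Color", "Gray"):
        flags += [
            # Threshold 1.0: downsample every image above the target resolution.
            f"-d{kind}ImageDownsampleThreshold=1.0",
            f"-d{kind}ImageDownsampleType=/Average",
            f"-d{kind}ImageResolution={dpi}",
        ]
    return flags


def pixel_reduction(source_dpi, target_dpi):
    """Factor by which the pixel count shrinks (resolution scales in two dimensions)."""
    return (source_dpi / target_dpi) ** 2


if __name__ == "__main__":
    print(" ".join(downsample_args(72)))
    # Assumed example: an image stored at 300dpi, downsampled to 72dpi.
    print(round(pixel_reduction(300, 72), 1))  # prints 17.4
```

Because the pixel count scales with the square of the resolution, going from an assumed 300dpi down to 72dpi leaves roughly one seventeenth of the original pixels, which is why the image switches usually matter most for image-heavy files.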
Commercial, closed-source software: callas pdfToolbox4 can reduce file sizes even further by applying a custom profile to the PDF downsizing process (it can even un-embed fonts and ICC profiles).
Update 2: See also the following (new) question with the answer:
It provides some sample PostScript code which completely removes all (raster) images from the PDF, leaving the rest of the page layout unchanged. This is useful in cases where you do not want the (raster) images, but only the text parts in order to reduce file size.
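As an alternative to custom PostScript code, recent Ghostscript releases offer a `-dFILTERIMAGE` switch that drops raster images while writing the output. This is a different technique from the PostScript approach mentioned above, and it assumes your installed Ghostscript is new enough to support the switch. A minimal Python sketch, with the hypothetical helper name `build_no_image_args`:

```python
def build_no_image_args(src, dst, exe="gs"):
    """Build a Ghostscript argument list that omits raster images from the output.

    Assumes a Ghostscript version that supports -dFILTERIMAGE.
    """
    return [
        exe,
        "-o", dst,
        "-sDEVICE=pdfwrite",
        "-dFILTERIMAGE",  # skip bitmap images; text and vector art are kept
        src,
    ]


if __name__ == "__main__":
    print(" ".join(build_no_image_args("blown-up.pdf", "text-only.pdf")))
```

The resulting list can be passed to `subprocess.run(..., check=True)`; on Windows, replace `gs` with `gswin32c.exe` or whichever executable name your install uses.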