I have been searching a lot on Google about how to compress existing pdf
(size).
My problem is
I can't use any application, because it needs to be done by a C# program.
I can't use any paid library as my clients don't want to go out of Budget. So a PAID library is certainly a NO
I did my home-work for last 2 days and came upon a solution using iTextSharp, BitMiracle but to no avail as the former decrease just 1% of a file and later one is a paid.
I also came across PDFcompressNET and pdftk but i wasn't able to find their .dll.
Actually the pdf is insurance policy with 2-3 images (black and white) and around 70 pages accounting to size of 5 MB.
I need the output in pdf only(can't be in any other format)
Here's an approach to do this (and this should work without regard to the toolkit you use):
If you have a 24-bit rgb or 32 bit cmyk image do the following:
That said, if you do can do all of this well in an unsupervised manner, you have a commercial product in its own right.
I will say that you can do most of this with Atalasoft dotImage (disclaimers: it's not free; I work there; I've written nearly all the PDF tools; I used to work on Acrobat).
One particular way to that with dotImage is to pull out all the pages that are image only, recompress them and save them out to a new PDF then build a new PDF by taking all the pages from the original document and replacing them the recompressed pages, then saving again. It's not that hard.
List<int> pagesToReplace = new List<int>();
PdfImageCollection pagesToEncode = new PdfImageCollection();
using (Document doc = new Document(sourceStream, password)) {
for (int i=0; i < doc.Pages.Count; i++) {
Page page = doc.Pages[i];
if (page.SingleImageOnly) {
pagesToReplace.Add(i);
// a PDF image encapsulates an image an compression parameters
PdfImage image = ProcessImage(sourceStream, doc, page, i);
pagesToEncode.Add(i);
}
}
PdfEncoder encoder = new PdfEncoder();
encoder.Save(tempOutStream, pagesToEncode, null); // re-encoded pages
tempOutStream.Seek(0, SeekOrigin.Begin);
sourceStream.Seek(0, SeekOrigin.Begin);
PdfDocument finalDoc = new PdfDocument(sourceStream, password);
PdfDocument replacementPages = new PdfDocument(tempOutStream);
for (int i=0; i < pagesToReplace.Count; i++) {
finalDoc.Pages[pagesToReplace[i]] = replacementPages.Pages[i];
}
finalDoc.Save(finalOutputStream);
What's missing here is ProcessImage(). ProcessImage will rasterize the page (and you wouldn't need to understand that the image might have been scaled to be on the PDF) or extract the image (and track the transformation matrix on the image), and go through the steps listed above. This is non-trivial, but it's doable.