I have a PDF that was generated by scanning software. Each page contains one TIFF image, and I want to extract the image from each page.
I am using iTextSharp. I have successfully found the images and can get their raw bytes back from the PdfReader.GetStreamBytesRaw method. The problem, as many before me have discovered, is that iTextSharp does not contain a PdfReader.CCITTFaxDecode method.
What else do I know? Even without iTextSharp, I can open the PDF in Notepad and find the streams with /Filter /CCITTFaxDecode, and I know from the /DecodeParms entry that they use CCITTFaxDecode Group 4.
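For anyone who wants to reproduce the lookup described above, here is a minimal sketch, assuming iTextSharp 5.x. The path "scan.pdf" and the class name are placeholders, and the sketch assumes each image stream has a single /Filter name rather than a filter array. In /DecodeParms, a negative /K value indicates Group 4 encoding.

```csharp
using System;
using iTextSharp.text.pdf;

class ListFaxImages
{
    static void Main()
    {
        // "scan.pdf" is a placeholder path, not a file from the original post.
        PdfReader reader = new PdfReader("scan.pdf");
        try
        {
            // Walk every object in the cross-reference table.
            for (int i = 0; i < reader.XrefSize; i++)
            {
                PdfObject obj = reader.GetPdfObject(i);
                if (obj == null || !obj.IsStream()) continue;

                // A stream is also a dictionary, so its entries can be read directly.
                PdfDictionary pd = (PdfDictionary)obj;
                if (!PdfName.IMAGE.Equals(pd.Get(PdfName.SUBTYPE))) continue;

                // Assumption: /Filter is a single name, not an array of filters.
                if (!PdfName.CCITTFAXDECODE.Equals(pd.Get(PdfName.FILTER))) continue;

                // /K < 0 is Group 4, 0 is Group 3 1-D, > 0 is Group 3 2-D (default 0).
                PdfDictionary parms = pd.GetAsDict(PdfName.DECODEPARMS);
                PdfNumber k = parms == null ? null : parms.GetAsNumber(PdfName.K);
                int kValue = k == null ? 0 : k.IntValue;

                // The bytes come back still CCITT-encoded; no filter is applied.
                byte[] raw = PdfReader.GetStreamBytesRaw((PRStream)obj);
                Console.WriteLine("object {0}: K={1}, {2} raw bytes", i, kValue, raw.Length);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```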
Does anyone know how I can extract the CCITTFaxDecode-filtered images from my PDF?
Cheers, Kahu
vbcrlfuser's answer did help me, but the code was not quite correct for the version of BitMiracle.LibTiff.NET that I downloaded. With the current version, the equivalent code looks like this:
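using System;
using iTextSharp.text.pdf;
using BitMiracle.LibTiff.Classic;
...
// pd is the image stream's PdfDictionary; raw is its still-encoded byte[]
// (for example, the result of PdfReader.GetStreamBytesRaw).
Tiff tiff = Tiff.Open("C:\\test.tif", "w");
// Copy the image geometry from the PDF image dictionary into the TIFF tags.
tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
// The data is already Group 4 encoded, so write it as one strip without re-encoding.
tiff.WriteRawStrip(0, raw, raw.Length);
tiff.Close();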
Using the above code, I finally got a valid Tiff file in C:\test.tif. Thank you, vbcrlfuser!
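If pulling in a TIFF library is not an option, a different technique is to write the TIFF header by hand and append the raw stream after it. The following is a minimal sketch of that idea, not the approach used above. It uses only the .NET base class library, and the class and method names (CcittTiff, Wrap) are made up for illustration. It assumes a 1-bit, single-strip, Group 4 image and writes PhotometricInterpretation as WhiteIsZero, so the result may look inverted if the PDF stream sets /BlackIs1 to true.

```csharp
using System;
using System.IO;

static class CcittTiff
{
    // Wraps an undecoded CCITT Group 4 stream in a single-strip, little-endian TIFF.
    public static byte[] Wrap(byte[] raw, uint width, uint height)
    {
        const ushort Short = 3, Long = 4;   // TIFF field types
        const ushort EntryCount = 9;        // number of IFD entries written below
        // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
        uint dataOffset = 8 + 2 + EntryCount * 12 + 4;

        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))   // BinaryWriter always writes little-endian
        {
            w.Write((byte)'I'); w.Write((byte)'I');   // byte order mark: little-endian
            w.Write((ushort)42);                      // TIFF magic number
            w.Write((uint)8);                         // offset of the first IFD

            w.Write(EntryCount);
            WriteEntry(w, 256, Long, width);              // ImageWidth
            WriteEntry(w, 257, Long, height);             // ImageLength
            WriteEntry(w, 258, Short, 1);                 // BitsPerSample
            WriteEntry(w, 259, Short, 4);                 // Compression: CCITT Group 4
            WriteEntry(w, 262, Short, 0);                 // Photometric: WhiteIsZero (assumed)
            WriteEntry(w, 273, Long, dataOffset);         // StripOffsets
            WriteEntry(w, 277, Short, 1);                 // SamplesPerPixel
            WriteEntry(w, 278, Long, height);             // RowsPerStrip: whole image
            WriteEntry(w, 279, Long, (uint)raw.Length);   // StripByteCounts
            w.Write((uint)0);                             // no further IFDs

            w.Write(raw);                                 // the still-encoded image data
            w.Flush();
            return ms.ToArray();
        }
    }

    // Writes one 12-byte IFD entry holding a single value.
    static void WriteEntry(BinaryWriter w, ushort tag, ushort type, uint value)
    {
        w.Write(tag);
        w.Write(type);
        w.Write((uint)1);   // value count
        w.Write(value);     // a SHORT occupies the low two bytes of this field
    }
}
```

With the width, height, and raw bytes taken from the image dictionary, something like File.WriteAllBytes("page1.tif", CcittTiff.Wrap(raw, width, height)) would then produce the file; the output name here is only a placeholder.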