Extracting image from PDF with /CCITTFaxDecode filter

Kahu picture Kahu · Apr 15, 2010 · Viewed 16.9k times · Source

I have a pdf that was generated from scanning software. The pdf has 1 TIFF image per page. I want to extract the TIFF image from each page.

I am using iTextSharp and I have successfully found the images and can get back the raw bytes from the PdfReader.GetStreamBytesRaw method. The problem is, as many before me have discovered, iTextSharp does not contain a PdfReader.CCITTFaxDecode method.

What else do I know? Even without iTextSharp I can open the pdf in notepad and find the streams with /Filter /CCITTFaxDecode and I know from the /DecodeParams that it is using CCITTFaxDecode group 4.

Does anyone out there know how I can get the CCITTFaxDecode filter images out of my pdf?

Cheers, Kahu

Answer

Berend Engelbrecht picture Berend Engelbrecht · Aug 29, 2010

Actually, vbcrlfuser's answer did help me, but the code was not quite correct for the current version of BitMiracle.LibTiff.NET, as I could download it. In the current version, equivalent code looks like this:

using iTextSharp.text.pdf;
using BitMiracle.LibTiff.Classic;

...
      Tiff tiff = Tiff.Open("C:\\test.tif", "w");
      tiff.SetField(TiffTag.IMAGEWIDTH, UInt32.Parse(pd.Get(PdfName.WIDTH).ToString()));
      tiff.SetField(TiffTag.IMAGELENGTH, UInt32.Parse(pd.Get(PdfName.HEIGHT).ToString()));
      tiff.SetField(TiffTag.COMPRESSION, Compression.CCITTFAX4);
      tiff.SetField(TiffTag.BITSPERSAMPLE, UInt32.Parse(pd.Get(PdfName.BITSPERCOMPONENT).ToString()));
      tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);
      tiff.WriteRawStrip(0, raw, raw.Length);
      tiff.Close();

Using the above code, I finally got a valid Tiff file in C:\test.tif. Thank you, vbcrlfuser!