How to extract FlateDecoded Images from PDF with PDFSharp

der_chirurg picture der_chirurg · Apr 5, 2012 · Viewed 7.9k times · Source

how do I extract Images, which are FlateDecoded (such like PNG) out of a PDF-Document with PDFSharp?

I found that comment in a Sample of PDFSharp:

// TODO: You can put the code here that converts vom PDF internal image format to a
// Windows bitmap
// and use GDI+ to save it in PNG format.
// [...]
// Take a look at the file
// PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.

Does anyone have a solution for this problem?

Thanks for your replies.

EDIT: Because I'm not able to answer on my own Question within 8 hours, I do it on that way:

Thanks for your very fast reply.

I added some Code to the Method "ExportAsPngImage", but I didn't get the wanted results. It is just extracting a few more Images (png) and they don't have the right colors and are distorted.

Here's my actual Code:

PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
        byte[] decodedBytes = flate.Decode(bytes);

        System.Drawing.Imaging.PixelFormat pixelFormat;

        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = PixelFormat.Format8bppIndexed;
                break;
            case 24:
                pixelFormat = PixelFormat.Format24bppRgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);
        int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        using (FileStream fs = new FileStream(@"C:\Export\PdfSharp\" + String.Format("Image{0}.png", count), FileMode.Create, FileAccess.Write))
        {
            bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
        }

Is that the right way? Or should I choose another way? Thanks a lot!

Answer

New Tartarus picture New Tartarus · Aug 1, 2019

I know this answer might be a few years to late, but maybe it will help others.

The disortion occurs in my case because image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent) seems to not return the correct value. As Vive la déraison pointed out under your question, you get the BGR Format for using Marshal.Copy. So reversing the Bytes and rotating the Bitmap after executing Marshal.Copy will do the job.

The resulting code looks like this:

private static void ExportAsPngImage(PdfDictionary image, ref int count)
    {
        int width = image.Elements.GetInteger(PdfImage.Keys.Width);
        int height = image.Elements.GetInteger(PdfImage.Keys.Height);

        var canUnfilter = image.Stream.TryUnfilter();
        byte[] decodedBytes;

        if (canUnfilter)
        {
            decodedBytes = image.Stream.Value;
        }
        else
        {
            PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
            decodedBytes = flate.Decode(image.Stream.Value);
        }

        int bitsPerComponent = 0;
        while (decodedBytes.Length - ((width * height) * bitsPerComponent / 8) != 0)
        {
            bitsPerComponent++;
        }

        System.Drawing.Imaging.PixelFormat pixelFormat;
        switch (bitsPerComponent)
        {
            case 1:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format1bppIndexed;
                break;
            case 8:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format8bppIndexed;
                break;
            case 16:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format16bppArgb1555;
                break;
            case 24:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format24bppRgb;
                break;
            case 32:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format32bppArgb;
                break;
            case 64:
                pixelFormat = System.Drawing.Imaging.PixelFormat.Format64bppArgb;
                break;
            default:
                throw new Exception("Unknown pixel format " + bitsPerComponent);
        }

        decodedBytes = decodedBytes.Reverse().ToArray();

        Bitmap bmp = new Bitmap(width, height, pixelFormat);
        BitmapData bmpData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.WriteOnly, bmp.PixelFormat);
        int length = (int)Math.Ceiling(width * (bitsPerComponent / 8.0));
        for (int i = 0; i < height; i++)
        {
            int offset = i * length;
            int scanOffset = i * bmpData.Stride;
            Marshal.Copy(decodedBytes, offset, new IntPtr(bmpData.Scan0.ToInt32() + scanOffset), length);
        }
        bmp.UnlockBits(bmpData);
        bmp.RotateFlip(RotateFlipType.Rotate180FlipNone);
        bmp.Save(String.Format("exported_Images\\Image{0}.png", count++), System.Drawing.Imaging.ImageFormat.Png);
    }

The code might need some optimisation, but it did export FlateDecoded Images correctly in my case.