Detect if PDF file is correct (header PDF)

Kiquenet · Jun 24, 2010

I have a Windows .NET application that manages many PDF files. Some of the files are corrupt.

I have two questions (sorry for my imperfect English):

1.)

How can I detect whether a PDF file is valid?

I want to read the header of the PDF and check that it is correct.

var okPDF = PDFCorrect(@"C:\temp\pdfile1.pdf");

2.)

How can I tell whether a byte[] (byte array) holding a file's contents is a PDF file or not?

For example, for ZIP files, you could examine the first four bytes and see if they match the local header signature, i.e. in hex

50 4b 03 04

if (buffer[0] == 0x50 && buffer[1] == 0x4b && buffer[2] == 0x03 && buffer[3] == 0x04)

If you are loading it into a long, this is 0x04034b50. (ZIP example by David Pierson.)

I want the same for PDF files.

byte[] dataPDF = ...

var okPDF = PDFCorrect(dataPDF);

Any sample source code in .NET?

Answer

Kiquenet · Jul 15, 2010

I check the PDF header like this:

    // Requires: using System.IO;
    public bool IsPDFHeader(string fileName)
    {
        byte[] buffer;

        // Dispose the stream and reader even if reading throws.
        using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var br = new BinaryReader(fs))
        {
            // Only the first five bytes are needed; ReadBytes returns
            // fewer if the file is shorter than that.
            buffer = br.ReadBytes(5);
        }

        // A file shorter than the signature cannot be a PDF.
        if (buffer.Length < 5)
        {
            return false;
        }

        // The header is "%PDF-" followed by the version, e.g. "%PDF-1.0".
        // In hex: 25 50 44 46 2D.
        // This verifies the signature only, not that the rest of the file is intact.
        return buffer[0] == 0x25 && buffer[1] == 0x50
            && buffer[2] == 0x44 && buffer[3] == 0x46
            && buffer[4] == 0x2D;
    }
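
For the second question (a byte[] already in memory), the same signature check can be applied directly to the array. The following is only a minimal sketch: the class name PdfSignature and the method IsPdfHeader(byte[]) are hypothetical names chosen for illustration and were not part of my original code. It assumes the data starts at the beginning of the file, and like the method above it checks the header only.

```csharp
public static class PdfSignature
{
    // ASCII bytes of "%PDF-": 25 50 44 46 2D.
    private static readonly byte[] Signature = { 0x25, 0x50, 0x44, 0x46, 0x2D };

    // Hypothetical helper: returns true when the buffer begins with the
    // PDF signature. It says nothing about whether the rest of the data
    // is a well-formed PDF.
    public static bool IsPdfHeader(byte[] data)
    {
        // Null or too-short input cannot contain the full signature.
        if (data == null || data.Length < Signature.Length)
        {
            return false;
        }

        // Compare byte by byte against the expected signature.
        for (int i = 0; i < Signature.Length; i++)
        {
            if (data[i] != Signature[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With that in place, the call from the question would look like var okPDF = PdfSignature.IsPdfHeader(dataPDF);. Comparing raw bytes avoids decoding the buffer to a string first.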