Original file bytes from StreamReader, magic number detection

Tom Hunter · Feb 10, 2013 · Viewed 7.4k times

I'm trying to differentiate between "text files" and "binary" files, as I would effectively like to ignore files with "unreadable" contents.

I have a file that I believe is a GZIP archive. I'm trying to ignore this kind of file by detecting its magic number / file signature. If I open the file with the Hex editor plugin in Notepad++, I can see that the first three bytes are 1f 8b 08.

However, if I read the file using a StreamReader, I'm not sure how to get at the original bytes.

using (var streamReader = new StreamReader(@"C:\file"))
{
    char[] buffer = new char[10];
    streamReader.Read(buffer, 0, 10);
    var s = new String(buffer);

    byte[] bytes = new byte[6];
    System.Buffer.BlockCopy(s.ToCharArray(), 0, bytes, 0, 6);
    var hex = BitConverter.ToString(bytes);

    var otherhex = BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s.ToCharArray()));
}

At the end of the using statement I have the following variable values:

hex: "1F-00-FD-FF-08-00"
otherhex: "1F-EF-BF-BD-08-00-EF-BF-BD-EF-BF-BD-0A-51-02-03"

Neither of these starts with the hex values shown in Notepad++.

Is it possible to get the original bytes from the result of reading a file via StreamReader?

Answer

Steve · Feb 10, 2013

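Your code decodes binary data as text. StreamReader interprets the file's bytes using a text encoding (UTF-8 by default) and produces .NET strings, which are sequences of two-byte UTF-16 chars. Any byte that is not valid in that encoding, such as 8B here, is replaced with the Unicode replacement character U+FFFD, so the original bytes cannot be recovered from the string. That is why you see FD-FF and EF-BF-BD where you expected 8B.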
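To make the loss concrete, here is a minimal sketch that decodes the three signature bytes from the question as UTF-8 and encodes the resulting string back to bytes. The class and method names (`DecodeDemo`, `RoundTripHex`) are hypothetical, and the sketch assumes the default replacement behaviour of `Encoding.UTF8`.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Decodes the bytes as UTF-8 text, then encodes the text back to bytes.
    // Invalid input bytes become U+FFFD, which re-encodes as EF-BF-BD.
    public static string RoundTripHex(byte[] original)
    {
        string decoded = Encoding.UTF8.GetString(original);
        byte[] roundTripped = Encoding.UTF8.GetBytes(decoded);
        return BitConverter.ToString(roundTripped);
    }

    static void Main()
    {
        // First three bytes of the file, as shown in the hex editor.
        byte[] signature = { 0x1F, 0x8B, 0x08 };
        Console.WriteLine(RoundTripHex(signature)); // prints 1F-EF-BF-BD-08
    }
}
```

The output begins the same way as the `otherhex` value in the question: the lone 8B byte has already been replaced by the time the text reaches your code.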

Instead, read the raw bytes with a BinaryReader and its ReadBytes method:

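// Requires: using System.IO;
using (var fs = new FileStream(@"C:\file", FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(fs))
{
    // ReadBytes returns fewer bytes than requested if the file is shorter.
    byte[] buffer = reader.ReadBytes(10);

    // GZIP signature: 0x1F 0x8B, followed by 0x08 (deflate compression method).
    if (buffer.Length >= 3 && buffer[0] == 0x1F && buffer[1] == 0x8B && buffer[2] == 0x08)
    {
        // you have a signature match....
    }
}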
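If several files need checking, the same idea can be wrapped in a small helper. The following is only a sketch: `FileSignature`, `StartsWith`, and `IsGzip` are hypothetical names, not part of the .NET libraries. It compares only the two-byte GZIP magic number 1F 8B and reads straight from the stream, without any text decoding.

```csharp
using System;
using System.IO;

static class FileSignature
{
    // GZIP magic number; the third byte (08) is the compression method.
    static readonly byte[] GzipMagic = { 0x1F, 0x8B };

    // Returns true if the stream's next bytes equal the given signature.
    public static bool StartsWith(Stream stream, byte[] signature)
    {
        var header = new byte[signature.Length];
        int total = 0;
        // Stream.Read may return fewer bytes than requested, so loop.
        while (total < header.Length)
        {
            int n = stream.Read(header, total, header.Length - total);
            if (n == 0) return false; // stream ended before the signature did
            total += n;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i]) return false;
        }
        return true;
    }

    // Convenience wrapper for a file on disk.
    public static bool IsGzip(string path)
    {
        using (var fs = File.OpenRead(path))
        {
            return StartsWith(fs, GzipMagic);
        }
    }
}
```

A call such as `FileSignature.IsGzip(@"C:\file")` would then decide whether to skip the file. Other signatures can be checked by passing a different byte array to `StartsWith`.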