C#.net identify zip file

Sean Dunford picture Sean Dunford · Aug 17, 2012 · Viewed 21.3k times · Source

I am currently using the SharpZip api to handle my zip file entries. It works splendid for zipping and unzipping. Though, I am having trouble identifying if a file is a zip or not. I need to know if there is a way to detect if a file stream can be decompressed. Originally I used

FileStream lFileStreamIn = File.OpenRead(mSourceFile);
lZipFile = new ZipFile(lFileStreamIn);
ZipInputStream lZipStreamTester = new ZipInputStream(lFileStreamIn, mBufferSize);// not working
lZipStreamTester.Read(lBuffer, 0, 0);
if (lZipStreamTester.CanDecompressEntry)
{

The LZipStreamTester becomes null every time and the if statement fails. I tried it with/without a buffer. Can anybody give any insight as to why? I am aware that i can check for file extension. I need something that is more definitive than that. I am also aware that zip has a magic #(PK something), but it isn't a guarantee that it will always be there because it isn't a requirement of the format.

Also i read about .net 4.5 having native zip support so my project may migrate to that instead of sharpzip but I still need didn't see a method/param similar to CanDecompressEntry here: http://msdn.microsoft.com/en-us/library/3z72378a%28v=vs.110%29

My last resort will be to use a try catch and attempt an unzip on the file.

Answer

dkackman picture dkackman · Aug 17, 2012

This is a base class for a component that needs to handle data that is either uncompressed, PKZIP compressed (sharpziplib) or GZip compressed (built in .net). Perhaps a bit more than you need but should get you going. This is an example of using @PhonicUK's suggestion to parse the header of the data stream. The derived classes you see in the little factory method handled the specifics of PKZip and GZip decompression.

abstract class Expander
{
    private const int ZIP_LEAD_BYTES = 0x04034b50;
    private const ushort GZIP_LEAD_BYTES = 0x8b1f;

    public abstract MemoryStream Expand(Stream stream); 
    
    internal static bool IsPkZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 4);
        // if the first 4 bytes of the array are the ZIP signature then it is compressed data
        return (BitConverter.ToInt32(data, 0) == ZIP_LEAD_BYTES);
    }

    internal static bool IsGZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 2);
        // if the first 2 bytes of the array are theG ZIP signature then it is compressed data;
        return (BitConverter.ToUInt16(data, 0) == GZIP_LEAD_BYTES);
    }

    public static bool IsCompressedData(byte[] data)
    {
        return IsPkZipCompressedData(data) || IsGZipCompressedData(data);
    }

    public static Expander GetExpander(Stream stream)
    {
        Debug.Assert(stream != null);
        Debug.Assert(stream.CanSeek);
        stream.Seek(0, 0);

        try
        {
            byte[] bytes = new byte[4];

            stream.Read(bytes, 0, 4);

            if (IsGZipCompressedData(bytes))
                return new GZipExpander();

            if (IsPkZipCompressedData(bytes))
                return new ZipExpander();

            return new NullExpander();
        }
        finally
        {
            stream.Seek(0, 0);  // set the stream back to the begining
        }
    }
}