What is a good way to test a file to see if it's a zip file?

Phil Hannent · Dec 11, 2009 · Viewed 10.2k times

I am looking at a new file format specification, and the specification says the file can be either XML-based or a zip file containing an XML file and other files.

The file extension is the same in both cases. How can I test the file to decide whether it needs decompressing or can simply be read?

Answer

Simon P Stevens · Dec 11, 2009

The zip file format is defined by PKWARE. You can find the details in their published file specification (APPNOTE.TXT).

Near the top you will find the header specification:

A. Local file header:

    local file header signature     4 bytes  (0x04034b50)
    version needed to extract       2 bytes
    general purpose bit flag        2 bytes
    compression method              2 bytes
    last mod file time              2 bytes
    last mod file date              2 bytes
    crc-32                          4 bytes
    compressed size                 4 bytes
    uncompressed size               4 bytes
    file name length                2 bytes
    extra field length              2 bytes

    file name (variable size)
    extra field (variable size)

From this you can see that the first 4 bytes of the header are the file signature, which has the hex value 0x04034b50. The bytes appear in the file in the reverse order, because PKWARE specify that "All values are stored in little-endian byte order unless otherwise specified." So if you view the file in a hex editor, you will see 50 4b 03 04 as the first 4 bytes.
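To make the byte order concrete, here is a minimal Python sketch that unpacks the fixed part of the header listed above. It is not part of the PKWARE specification: the helper name `first_local_header` and the field names are made up for this illustration, and the archive is built in memory with the standard `zipfile` module so that no file on disk is needed.

```python
import io
import struct
import zipfile

# Fixed-size part of the local file header: 30 bytes, little-endian ("<").
# I = 4-byte unsigned int, H = 2-byte unsigned int, in the order listed above.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")

# Field names chosen for this sketch only; they mirror the listing above.
FIELD_NAMES = (
    "signature", "version_needed", "flags", "compression_method",
    "mod_time", "mod_date", "crc32", "compressed_size",
    "uncompressed_size", "file_name_length", "extra_field_length",
)

def first_local_header(data: bytes) -> dict:
    """Unpack the fixed part of the first local file header (hypothetical helper)."""
    return dict(zip(FIELD_NAMES, LOCAL_HEADER.unpack_from(data, 0)))

# Build a tiny archive in memory to have something to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.xml", "<root/>")
data = buf.getvalue()

header = first_local_header(data)
print(hex(header["signature"]))  # 0x4034b50
print(data[:4].hex(" "))         # 50 4b 03 04
```

The `<` in the format string is what tells `struct` to read the values as little-endian, which is why the bytes 50 4b 03 04 come back as the number 0x04034b50.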

You can use this to check whether your file is a zip file: read the first 4 bytes and compare them with 50 4b 03 04. If you open the file in Notepad, you will notice that the first two bytes (50 and 4b) are the ASCII characters PK.
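As a rough sketch of that check in Python (the function names `starts_with_zip_signature` and `looks_like_zip` and the `.dat` file names are invented for this example), you could compare the first 4 bytes against the signature and fall back to reading the file as XML otherwise:

```python
import os
import tempfile
import zipfile

# Local file header signature as it appears on disk: 50 4b 03 04.
ZIP_MAGIC = b"PK\x03\x04"

def starts_with_zip_signature(head: bytes) -> bool:
    """True if the given bytes begin with the local file header signature."""
    return head[:4] == ZIP_MAGIC

def looks_like_zip(path: str) -> bool:
    """Read only the first 4 bytes of the file and test them (hypothetical helper)."""
    with open(path, "rb") as f:
        return starts_with_zip_signature(f.read(4))

# Demo: two files with the same extension, one plain XML and one zipped.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "plain.dat")
    zip_path = os.path.join(tmp, "packed.dat")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0"?><root/>')
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("content.xml", "<root/>")
    xml_result = looks_like_zip(xml_path)
    zip_result = looks_like_zip(zip_path)

print(xml_result, zip_result)  # False True
```

This only tests the signature of the first local file header. An archive with no entries starts with a different signature (50 4b 05 06), so if that case matters for your format, Python's standard `zipfile.is_zipfile()` may be a better fit, since it performs its own check of the archive structure.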