How to detect type of compression used on the file? (if no file extension is specified)

22332112 · Oct 1, 2013 · Viewed 60.6k times

How can one detect the type of compression used on the file? (assuming that .zip, .gz, .xz or any other extension is not specified).

Is this information stored somewhere in the header of that file?

Answer

Mark Adler · Oct 2, 2013

You can determine that it is likely to be one of those formats by looking at the first few bytes. You should then test to see if it really is one of those, using an integrity check from the associated utility for that format, or by actually proceeding to decompress.
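The "actually proceeding to decompress" check could look roughly like the sketch below. It is only an illustration using Python's standard library; the helper name `really_is` and the `DECOMPRESSORS` table are made up for this example, and it reads the whole input into memory, which may not suit large files.

```python
import bz2
import gzip
import lzma
import zlib

# Hypothetical table: format name -> function that decompresses a whole buffer.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bzip2": bz2.decompress,
    "xz": lambda data: lzma.decompress(data, format=lzma.FORMAT_XZ),
    "zlib": zlib.decompress,
}


def really_is(fmt, data):
    """Return True if `data` decompresses without error as format `fmt`."""
    try:
        DECOMPRESSORS[fmt](data)
        return True
    except (OSError, EOFError, ValueError, zlib.error, lzma.LZMAError):
        # Bad header, corrupt stream, failed check value, or truncated input.
        return False
```

A successful decompression also exercises each format's built-in check value (for example the CRC-32 in gzip), which is why it is a stronger test than looking at the first bytes alone.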

You can find the header formats in the descriptions:

Others:

  • zlib (.zz) format description, starts with two bytes (in bits) 0aaa1000 bbbccccc, where ccccc is chosen so that the first byte, viewed as an unsigned integer (0..255), times 256 plus the second byte, viewed the same way, is a multiple of 31. For example: 01111000 (bits) = 120, 10011100 (bits) = 156, and 120 * 256 + 156 = 30876 = 31 * 996, which is a multiple of 31
  • compress (.Z) starts with 0x1f, 0x9d
  • bzip2 (.bz2) starts with 0x42, 0x5a, 0x68
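
As a rough sketch of the "look at the first few bytes" step, the prefixes above can be checked in Python as follows. The function names `guess_format` and `looks_like_zlib` are invented for this example. The gzip, zip, and xz signatures do not appear in the list above; they are filled in here from those formats' published specifications, so treat them as an addition. A match only means "probably this format" and should still be confirmed by an integrity check or by decompressing.

```python
# (format name, leading bytes). The zip entry matches a non-empty archive;
# the gzip, zip, and xz entries are additions not taken from the list above.
MAGIC = [
    ("xz", b"\xfd\x37\x7a\x58\x5a\x00"),
    ("zip", b"\x50\x4b\x03\x04"),
    ("bzip2", b"\x42\x5a\x68"),
    ("gzip", b"\x1f\x8b"),
    ("compress", b"\x1f\x9d"),
]


def looks_like_zlib(head):
    """Check the two-byte zlib header pattern 0aaa1000 bbbccccc."""
    if len(head) < 2:
        return False
    first, second = head[0], head[1]
    # Top bit must be 0 and the low four bits must be 1000 (binary).
    if first & 0x8F != 0x08:
        return False
    # The two bytes, read as one big-endian number, must be a multiple of 31.
    return (first * 256 + second) % 31 == 0


def guess_format(head):
    """Return a likely format name for the leading bytes, or None."""
    for name, magic in MAGIC:
        if head.startswith(magic):
            return name
    if looks_like_zlib(head):
        return "zlib"
    return None
```

To use it on a file, read the first six bytes in binary mode, for instance `guess_format(open(path, "rb").read(6))` where `path` is whatever file is being inspected. The zlib test is deliberately run last: it has no fixed magic number, so it is the weakest of these checks and the most likely to match unrelated data by chance.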