zlib compressing byte array?

Guapo picture Guapo · Jun 8, 2011 · Viewed 27.2k times · Source

I have this uncompressed byte array:

0E 7C BD 03 6E 65 67 6C 65 63 74 00 00 00 00 00 00 00 00 00 42 52 00 00 01 02 01
00 BB 14 8D 37 0A 00 00 01 00 00 00 00 05 E9 05 E9 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 00 00 81 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 05 00 00 01 00 00 00

And I need to compress it using the deflate algorithm (implemented in zlib), from what I searched the equivalent in C# would be using GZipStream but I can't match the compressed resulted at all.

Here is the compressing code:

public byte[] compress(byte[] input)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream deflateStream = new GZipStream(ms, CompressionMode.Compress))
        {
            deflateStream.Write(input, 0, input.Length);
        }
        return ms.ToArray();
    }
}

Here is the result of the above compressing code:

1F 8B 08 00 00 00 00 00 04 00 ED BD 07 60 1C 49 96 25 26 2F 6D CA 7B 7F 4A F5 4A
D7 E0 74 A1 08 80 60 13 24 D8 90 40 10 EC C1 88 CD E6 92 EC 1D 69 47 23 29 AB 2A
81 CA 65 56 65 5D 66 16 40 CC ED 9D BC F7 DE 7B EF BD F7 DE 7B EF BD F7 BA 3B 9D
4E 27 F7 DF FF 3F 5C 66 64 01 6C F6 CE 4A DA C9 9E 21 80 AA C8 1F 3F 7E 7C 1F 3F
22 7E 93 9F F9 FB 7F ED 65 7E 51 E6 D3 F6 D7 30 CF 93 57 BF C6 AF F1 6B FE 5A BF
E6 AF F1 F7 FE 56 7F FC 03 F3 D9 AF FB 5F DB AF 83 E7 0F FE 35 23 1F FE BA F4 FE
AF F1 6B FC 1A FF 0F 26 EC 38 82 5C 00 00 00

Here is the result I am expecting:

78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03 4E 41 0C 0C 8C 4C 8C 0C BB 45 7A
CD B9 80 4C 90 18 EB 4B D6 97 0C 28 00 2C CC D0 C8 C8 80 09 58 21 B2 00 65 6B 08
C8

What I am doing wrong, could some one help me out there ?

Answer

Cheeso picture Cheeso · Jun 8, 2011

First, some information: DEFLATE is the compression algorithm, it is defined in RFC 1951. DEFLATE is used in the ZLIB and GZIP formats, defined in RFC 1950 and 1952 respectively, which essentially are thin wrappers around DEFLATE bytestreams. The wrappers provide metadata such as, the name of the file, timestamps, CRCs or Adlers, and so on.

.NET's base class library implements a DeflateStream that produces a raw DEFLATE bytestream, when used for compression. When used in decompression it consumes a raw DEFLATE bytestream. .NET also provides a GZipStream, which is just a GZIP wrapper around that base. There is no ZlibStream in the .NET base class library - nothing that produces or consumes ZLIB. There are some tricks to doing it, you can search around.

The deflate logic in .NET exhibits a behavioral anomaly, where previously compressed data can actually be inflated, significantly, when "compressed". This was the source of a Connect bug raised with Microsoft, and has been discussed here on SO. This may be what you are seeing, as far as ineffective compression. Microsoft have rejected the bug, because while it is ineffective for saving space, the compressed stream is not invalid, in other words it can be "decompressed" by any compliant DEFLATE engine.

In any case, as someone else posted, the compressed bytestream produced by different compressors may not necessarily be the same. It depends on their default settings, and the application-specified settings for the compressor. Even though the compressed bytestreams are different, they may still decompress to the same original bytestream. On the other hand the thing you used to compress was GZIP, while it appears what you want is ZLIB. While they are related, they are not the same; you cannot use GZipStream to produce a ZLIB bytestream. This is the primary source of the difference you see.


I think you want a ZLIB stream.

The free managed Zlib in the DotNetZip project implements compressing streams for all of the three formats (DEFLATE, ZLIB, GZIP). The DeflateStream and GZipStream work the same way as the .NET builtin classes, and there's a ZlibStream class in there, that does what you think it does. None of these classes exhibit the behavior anomaly I described above.


In code it looks like this:

    byte[] original = new byte[] {
        0x0E, 0x7C, 0xBD, 0x03, 0x6E, 0x65, 0x67, 0x6C,
        0x65, 0x63, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x42, 0x52, 0x00, 0x00,
        0x01, 0x02, 0x01, 0x00, 0xBB, 0x14, 0x8D, 0x37,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x05, 0xE9, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x81, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00,
        0x01, 0x00, 0x00, 0x00
    };

    var compressed = Ionic.Zlib.ZlibStream.CompressBuffer(original);

The output is like this:

0000    78 DA E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .

To decompress,

    var uncompressed = Ionic.Zlib.ZlibStream.UncompressBuffer(compressed);

You can see the documentation on the static CompressBuffer method.


EDIT

The question is raised, why is DotNetZip producing 78 DA for the first two bytes instead of 78 9C? The difference is immaterial. 78 DA encodes "max compression", while 78 9C encodes "default compression". As you can see in the data, for this small sample, the actual compressed bytes are exactly the same whether using BEST or DEFAULT. Also, the compression level information is not used during decompression. It has no effect in your application.

If you don't want "max" compression, in other words if you are very set on getting 78 9C as the first two bytes, even though it doesn't matter, then you cannot use the CompressBuffer convenience function, which uses the best compression level under the covers. Instead you can do this:

  var compress = new Func<byte[], byte[]>( a => {
        using (var ms = new System.IO.MemoryStream())
        {
            using (var compressor =
                   new Ionic.Zlib.ZlibStream( ms, 
                                              CompressionMode.Compress,
                                              CompressionLevel.Default )) 
            {
                compressor.Write(a,0,a.Length);
            }

            return ms.ToArray();
        }
    });

  var original = new byte[] { .... };
  var compressed = compress(original);

The result is:

0000    78 9C E3 AB D9 CB 9C 97 9A 9E 93 9A 5C C2 00 03     x...........\...
0010    4E 41 0C 0C 8C 4C 8C 0C BB 45 7A CD 61 62 AC 2F     NA...L...Ez.ab./
0020    19 B0 82 46 46 2C 82 AC 40 FD 40 0A 00 35 25 07     ...FF,..@[email protected]%.
0030    CE                                                  .