Python: Inflate and Deflate implementations

Demi picture Demi · Jul 7, 2009 · Viewed 50.8k times · Source

I am interfacing with a server that requires that data sent to it is compressed with Deflate algorithm (Huffman encoding + LZ77) and also sends data that I need to Inflate.

I know that Python includes Zlib, and that the C libraries in Zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python Zlib module. It does provide Compress and Decompress, but when I make a call such as the following:

result_data = zlib.decompress( base64_decoded_compressed_string )

I receive the following error:

Error -3 while decompressing data: incorrect header check

Gzip does no better; when making a call such as:

result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()

I receive the error:

IOError: Not a gzipped file

which makes sense as the data is a Deflated file not a true Gzipped file.

Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.

It seems that there are a few options:

  1. Find an existing implementation (ideal) of Inflate and Deflate in Python
  2. Write my own Python extension to the zlib c library that includes Inflate and Deflate
  3. Call something else that can be executed from the command line (such as a Ruby script, since Inflate/Deflate calls in zlib are fully wrapped in Ruby)
  4. ?

I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.

Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:

public static string DeflateAndEncodeBase64(byte[] data)
{
    if (null == data || data.Length < 1) return null;
    string compressedBase64 = "";

    //write into a new memory stream wrapped by a deflate stream
    using (MemoryStream ms = new MemoryStream())
    {
        using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
        {
            //write byte buffer into memorystream
            deflateStream.Write(data, 0, data.Length);
            deflateStream.Close();

            //rewind memory stream and write to base 64 string
            byte[] compressedBytes = new byte[ms.Length];
            ms.Seek(0, SeekOrigin.Begin);
            ms.Read(compressedBytes, 0, (int)ms.Length);
            compressedBase64 = Convert.ToBase64String(compressedBytes);
        }
    }
    return compressedBase64;
}

Running this .NET code for the string "deflate and encode me" gives the result

7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw==

When "deflate and encode me" is run through the Python Zlib.compress() and then base64 encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".

It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.

More Information:

The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.

The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.

SOLVED

To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:

On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).

On inflate/decompress: there is a second argument for window size. If this value is negative it suppresses headers. here are my methods currently, including the base64 encoding/decoding - and working properly:

import zlib
import base64

def decode_base64_and_inflate( b64string ):
    decoded_data = base64.b64decode( b64string )
    return zlib.decompress( decoded_data , -15)

def deflate_and_base64_encode( string_val ):
    zlibbed_str = zlib.compress( string_val )
    compressed_string = zlibbed_str[2:-4]
    return base64.b64encode( compressed_string )

Answer

Markus Jarderot picture Markus Jarderot · Jul 7, 2009

You can still use the zlib module to inflate/deflate data. The gzip module uses it internally, but adds a file-header to make it into a gzip-file. Looking at the gzip.py file, something like this could work:

import zlib

def deflate(data, compresslevel=9):
    compress = zlib.compressobj(
            compresslevel,        # level: 0-9
            zlib.DEFLATED,        # method: must be DEFLATED
            -zlib.MAX_WBITS,      # window size in bits:
                                  #   -15..-8: negate, suppress header
                                  #   8..15: normal
                                  #   16..30: subtract 16, gzip header
            zlib.DEF_MEM_LEVEL,   # mem level: 1..8/9
            0                     # strategy:
                                  #   0 = Z_DEFAULT_STRATEGY
                                  #   1 = Z_FILTERED
                                  #   2 = Z_HUFFMAN_ONLY
                                  #   3 = Z_RLE
                                  #   4 = Z_FIXED
    )
    deflated = compress.compress(data)
    deflated += compress.flush()
    return deflated

def inflate(data):
    decompress = zlib.decompressobj(
            -zlib.MAX_WBITS  # see above
    )
    inflated = decompress.decompress(data)
    inflated += decompress.flush()
    return inflated

I don't know if this corresponds exactly to whatever your server requires, but those two functions are able to round-trip any data I tried.

The parameters maps directly to what is passed to the zlib library functions.

PythonC
zlib.compressobj(...)deflateInit(...)
compressobj.compress(...)deflate(...)
zlib.decompressobj(...)inflateInit(...)
decompressobj.decompress(...)inflate(...)

The constructors create the structure and populate it with default values, and pass it along to the init-functions. The compress/decompress methods update the structure and pass it to inflate/deflate.