I am interfacing with a server that requires that data sent to it is compressed with Deflate algorithm (Huffman encoding + LZ77) and also sends data that I need to Inflate.
I know that Python includes Zlib, and that the C libraries in Zlib support calls to Inflate and Deflate, but these apparently are not provided by the Python Zlib module. It does provide Compress and Decompress, but when I make a call such as the following:
result_data = zlib.decompress( base64_decoded_compressed_string )
I receive the following error:
Error -3 while decompressing data: incorrect header check
Gzip does no better; when making a call such as:
result_data = gzip.GzipFile( fileobj = StringIO.StringIO( base64_decoded_compressed_string ) ).read()
I receive the error:
IOError: Not a gzipped file
which makes sense as the data is a Deflated file not a true Gzipped file.
Now I know that there is a Deflate implementation available (Pyflate), but I do not know of an Inflate implementation.
It seems that there are a few options:
I am seeking a solution, but lacking a solution I will be thankful for insights, constructive opinions, and ideas.
Additional information: The result of deflating (and encoding) a string should, for the purposes I need, give the same result as the following snippet of C# code, where the input parameter is an array of UTF bytes corresponding to the data to compress:
public static string DeflateAndEncodeBase64(byte[] data)
{
if (null == data || data.Length < 1) return null;
string compressedBase64 = "";
//write into a new memory stream wrapped by a deflate stream
using (MemoryStream ms = new MemoryStream())
{
using (DeflateStream deflateStream = new DeflateStream(ms, CompressionMode.Compress, true))
{
//write byte buffer into memorystream
deflateStream.Write(data, 0, data.Length);
deflateStream.Close();
//rewind memory stream and write to base 64 string
byte[] compressedBytes = new byte[ms.Length];
ms.Seek(0, SeekOrigin.Begin);
ms.Read(compressedBytes, 0, (int)ms.Length);
compressedBase64 = Convert.ToBase64String(compressedBytes);
}
}
return compressedBase64;
}
Running this .NET code for the string "deflate and encode me" gives the result
7b0HYBxJliUmL23Ke39K9UrX4HShCIBgEyTYkEAQ7MGIzeaS7B1pRyMpqyqBymVWZV1mFkDM7Z28995777333nvvvfe6O51OJ/ff/z9cZmQBbPbOStrJniGAqsgfP358Hz8iZvl5mbV5mi1nab6cVrM8XeT/Dw==
When "deflate and encode me" is run through the Python Zlib.compress() and then base64 encoded, the result is "eJxLSU3LSSxJVUjMS1FIzUvOT0lVyE0FAFXHB6k=".
It is clear that zlib.compress() is not an implementation of the same algorithm as the standard Deflate algorithm.
More Information:
The first 2 bytes of the .NET deflate data ("7b0HY..."), after b64 decoding are 0xEDBD, which does not correspond to Gzip data (0x1f8b), BZip2 (0x425A) data, or Zlib (0x789C) data.
The first 2 bytes of the Python compressed data ("eJxLS..."), after b64 decoding are 0x789C. This is a Zlib header.
SOLVED
To handle the raw deflate and inflate, without header and checksum, the following things needed to happen:
On deflate/compress: strip the first two bytes (header) and the last four bytes (checksum).
On inflate/decompress: there is a second argument for window size. If this value is negative it suppresses headers. here are my methods currently, including the base64 encoding/decoding - and working properly:
import zlib
import base64
def decode_base64_and_inflate( b64string ):
decoded_data = base64.b64decode( b64string )
return zlib.decompress( decoded_data , -15)
def deflate_and_base64_encode( string_val ):
zlibbed_str = zlib.compress( string_val )
compressed_string = zlibbed_str[2:-4]
return base64.b64encode( compressed_string )
You can still use the zlib
module to inflate/deflate data. The gzip
module uses it internally, but adds a file-header to make it into a gzip-file. Looking at the gzip.py
file, something like this could work:
import zlib
def deflate(data, compresslevel=9):
compress = zlib.compressobj(
compresslevel, # level: 0-9
zlib.DEFLATED, # method: must be DEFLATED
-zlib.MAX_WBITS, # window size in bits:
# -15..-8: negate, suppress header
# 8..15: normal
# 16..30: subtract 16, gzip header
zlib.DEF_MEM_LEVEL, # mem level: 1..8/9
0 # strategy:
# 0 = Z_DEFAULT_STRATEGY
# 1 = Z_FILTERED
# 2 = Z_HUFFMAN_ONLY
# 3 = Z_RLE
# 4 = Z_FIXED
)
deflated = compress.compress(data)
deflated += compress.flush()
return deflated
def inflate(data):
decompress = zlib.decompressobj(
-zlib.MAX_WBITS # see above
)
inflated = decompress.decompress(data)
inflated += decompress.flush()
return inflated
I don't know if this corresponds exactly to whatever your server requires, but those two functions are able to round-trip any data I tried.
The parameters maps directly to what is passed to the zlib library functions.
Python ⇒ C
zlib.compressobj(...)
⇒ deflateInit(...)
compressobj.compress(...)
⇒ deflate(...)
zlib.decompressobj(...)
⇒ inflateInit(...)
decompressobj.decompress(...)
⇒ inflate(...)
The constructors create the structure and populate it with default values, and pass it along to the init-functions.
The compress
/decompress
methods update the structure and pass it to inflate
/deflate
.