I have a gzip file and I am trying to read it via Python as below:
import zlib
do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)
it throws this error:
zlib.error: Error -3 while decompressing: incorrect header check
How can I overcome it?
You have this error:
zlib.error: Error -3 while decompressing: incorrect header check
Which is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951
(deflate
compressed format) rather than RFC 1950
(zlib
compressed format) or RFC 1952
(gzip
compressed format).
But zlib
can decompress all those formats:
deflate
format, use wbits = -zlib.MAX_WBITS
zlib
format, use wbits = zlib.MAX_WBITS
gzip
format, use wbits = zlib.MAX_WBITS | 16
See documentation in http://www.zlib.net/manual.html#Advanced (section inflateInit2
)
test data:
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = '''test'''
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>
obvious test for zlib
:
>>> zlib.decompress(zlib_data)
'test'
test for deflate
:
>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'
test for gzip
:
>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'
the data is also compatible with gzip
module:
>>> import gzip
>>> import StringIO
>>> fio = StringIO.StringIO(gzip_data) # io.BytesIO for Python 3
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()
adding 32
to windowBits
will trigger header detection
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
gzip
insteador you can ignore zlib
and use gzip
module directly; but please remember that under the hood, gzip
uses zlib
.
fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()