Python Gzip - Appending to file on the fly

general exception · Aug 7, 2013

Is it possible to append to a gzipped text file on the fly using Python?

Basically, I am doing this:

import gzip

content = "Lots of content here"
# 'at' = append in text mode (Python 3); each open/close adds a new gzip member
with gzip.open('file.txt.gz', 'at', compresslevel=9) as f:
    f.write(content)

A line is appended (note "appended") to the file every 6 seconds or so, but the resulting file is just as big as a standard uncompressed file (roughly 1MB when done).

Explicitly specifying the compression level does not seem to make a difference either.

If I gzip an existing uncompressed file afterwards, its size comes down to roughly 80 KB.

I'm guessing it's not possible to "append" to a gzip file on the fly and have it compress?

Is this a case of writing to a StringIO buffer and then flushing to a gzip file when done?

Answer

Mark Adler · Aug 7, 2013

That works in the sense of creating and maintaining a valid gzip file, since the gzip format permits concatenated gzip streams.
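As a small illustration of the point about concatenated streams, here is a sketch that builds several independent gzip members in memory and decompresses them as one file. The sample lines are made up for the example.

```python
import gzip
import io

buf = io.BytesIO()
for line in (b"first line\n", b"second line\n", b"third line\n"):
    # Each gzip.compress() call produces a complete, independent gzip member.
    buf.write(gzip.compress(line))

# The decompressor reads every member in turn and joins the results.
restored = gzip.decompress(buf.getvalue())
print(restored.decode())
```

This is effectively what reopening the file in append mode for every line does on disk.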

However, it doesn't work in the sense that you get lousy compression, since you are giving each instance of gzip compression so little data to work with. Compression depends on taking advantage of the history of previous data, but here gzip has been given essentially none.
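The effect can be measured with a rough sketch like the following. The log lines are invented for illustration, and the exact sizes will vary with the data, but compressing each short line on its own should come out far larger than compressing everything as one stream, because every member carries its own header and trailer and starts with no history.

```python
import gzip

# Hypothetical log lines, similar in spirit to one line appended every few seconds.
lines = [
    f"2013-08-07 12:00:{i % 60:02d} sensor reading {i}\n".encode()
    for i in range(2000)
]
raw_size = sum(len(line) for line in lines)

# One gzip member per line: what per-line appending produces.
per_line_size = sum(len(gzip.compress(line, 9)) for line in lines)

# One gzip member for all of the data: what gzipping the finished file produces.
single_stream_size = len(gzip.compress(b"".join(lines), 9))

print("raw:", raw_size)
print("one member per line:", per_line_size)
print("single stream:", single_stream_size)
```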

You could either a) accumulate at least a few kilobytes of data (many of your lines) before invoking gzip to add another gzip stream to the file, or b) do something much more sophisticated that appends to a single gzip stream, leaving a valid gzip stream each time and permitting efficient compression of the data.

You can find an example of b) in C, in gzlog.h and gzlog.c, in the examples directory of the zlib distribution. I do not believe that Python has all of the interfaces to zlib needed to implement gzlog directly in Python, but you could interface to the C code from Python.
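Option a) can be sketched in pure Python along these lines. The class name `BufferedGzipAppender` and the 8192-byte threshold are assumptions made for this example, not part of any library; the idea is only to hold lines in memory and append them as one gzip member once enough data has built up.

```python
import gzip


class BufferedGzipAppender:
    """Hypothetical helper: collects lines in memory and appends them to a
    .gz file as a single gzip member once enough data has accumulated."""

    def __init__(self, path, threshold=8192):
        self.path = path
        self.threshold = threshold  # assumed flush size in bytes
        self.pending = []
        self.pending_bytes = 0

    def write_line(self, line):
        data = line.encode("utf-8")
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # Mode 'ab' appends a new gzip member holding the whole batch.
        with gzip.open(self.path, "ab", compresslevel=9) as f:
            f.write(b"".join(self.pending))
        self.pending = []
        self.pending_bytes = 0
```

Call `write_line()` for each new line and `flush()` once at shutdown. The trade-off is that lines still held in memory are lost if the process dies before a flush, which is the problem gzlog is designed to address.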