Decreasing the size of cPickle objects

ddn picture ddn · Aug 27, 2013 · Viewed 22.9k times · Source

I am running code that creates large objects, containing multiple user-defined classes, which I must then serialize for later use. From what I can tell, only pickling is versatile enough for my requirements. I've been using cPickle to store them but the objects it generates are approximately 40G in size, from code that runs in 500 mb of memory. Speed of serialization isn't an issue, but size of the object is. Are there any tips or alternate processes I can use to make the pickles smaller?

Answer

John Lyon picture John Lyon · Aug 27, 2013

You can combine your cPickle dump call with a zipfile:

import cPickle
import gzip

def save_zipped_pickle(obj, filename, protocol=-1):
    with gzip.open(filename, 'wb') as f:
        cPickle.dump(obj, f, protocol)

And to re-load a zipped pickled object:

def load_zipped_pickle(filename):
    with gzip.open(filename, 'rb') as f:
        loaded_object = cPickle.load(f)
        return loaded_object