I am running code that creates large objects, containing multiple user-defined classes, which I must then serialize for later use. From what I can tell, only pickling is versatile enough for my requirements. I've been using cPickle to store them but the objects it generates are approximately 40G in size, from code that runs in 500 mb of memory. Speed of serialization isn't an issue, but size of the object is. Are there any tips or alternate processes I can use to make the pickles smaller?
You can combine your cPickle dump
call with a zipfile:
import cPickle
import gzip
def save_zipped_pickle(obj, filename, protocol=-1):
with gzip.open(filename, 'wb') as f:
cPickle.dump(obj, f, protocol)
And to re-load a zipped pickled object:
def load_zipped_pickle(filename):
with gzip.open(filename, 'rb') as f:
loaded_object = cPickle.load(f)
return loaded_object