more efficient way to pickle a string

python numpy pickle space-efficiency

gatoatigrado · Mar 30, 2009 · Viewed 9.8k times

The pickle module seems to use string escape characters when pickling; this becomes inefficient, e.g. on numpy arrays. Consider the following:

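import cPickle
import numpy

z = numpy.zeros(1000, numpy.uint8)   # 1000 zero bytes
len(z.dumps())                       # the array pickled by numpy itself
len(cPickle.dumps(z.dumps()))        # that pickle string pickled again (default protocol 0)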

The lengths are 1133 characters and 4249 characters respectively.

z.dumps() contains something like "\x00\x00" (actual zero bytes in the string), but pickle seems to write the string using its repr(), yielding "'\x00\x00'", where each zero byte becomes the four printable characters \x00 (including ASCII '0' digits).

i.e. "0" not in z.dumps(), whereas "0" in cPickle.dumps(z.dumps())
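To see why the escaped form is roughly four times longer, here is a small sketch. It assumes Python 3 (the question itself uses Python 2's cPickle) and uses a hypothetical stand-in, payload = bytes(1000), in place of the zero-filled data inside z.dumps(). Python 3's repr() of a bytes object uses the same \xNN escapes described above.

```python
# Python 3 sketch: bytes(1000) is a hypothetical stand-in for the
# zero-filled array data inside z.dumps(); it is not a real numpy pickle.
payload = bytes(1000)  # 1000 NUL bytes

escaped = repr(payload)  # text like b'\x00\x00...' with one escape per byte

print(len(payload))      # 1000 raw bytes
print(len(escaped))      # 3 + 4 * 1000 = 4003 characters
print(b"0" in payload)   # False: the raw data has no ASCII '0'
print("0" in escaped)    # True: each \x00 escape contains two '0' digits
```

This only illustrates the cost of repr-style escaping; Python 3's pickle does not serialize bytes through repr(), so its protocol 0 output size differs from the Python 2 numbers quoted above.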

Answer

Try using a later version of the pickle protocol via the protocol parameter of pickle.dumps(). The default is protocol 0, an ASCII text format. Protocols 1 and 2 (and 3, but that one is Python 3 only) are binary and should be much more space-efficient; I suggest passing pickle.HIGHEST_PROTOCOL.
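A minimal sketch of the suggested fix, assuming Python 3 and again using the hypothetical stand-in payload = bytes(1000) rather than a real numpy pickle. It compares protocol 0 with pickle.HIGHEST_PROTOCOL; exact sizes vary between Python versions, so only the relative difference matters. Note that Python 3's default protocol is already binary, so protocol=0 has to be requested explicitly to reproduce the bloat.

```python
import pickle

# Hypothetical stand-in for the data being pickled: 1000 zero bytes,
# similar to the contents of numpy.zeros(1000, numpy.uint8).
payload = bytes(1000)

text_pickle = pickle.dumps(payload, protocol=0)  # ASCII text protocol
binary_pickle = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

print("protocol 0:", len(text_pickle), "bytes")
print("highest protocol:", len(binary_pickle), "bytes")

# The binary pickle should be the raw payload plus a small fixed overhead.
overhead = len(binary_pickle) - len(payload)
print("binary overhead:", overhead, "bytes")

# Round-trip check: both pickles restore the original bytes.
assert pickle.loads(text_pickle) == payload
assert pickle.loads(binary_pickle) == payload
```

In the question's Python 2 code the equivalent call would be cPickle.dumps(z.dumps(), pickle.HIGHEST_PROTOCOL), with pickle imported alongside cPickle.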
