Reading multiple pickled Python objects at once: buffering and newlines?

san · Apr 1, 2011 · Viewed 7.4k times

To give you context: I have a large file f, several gigabytes in size. It contains consecutive pickles of different objects that were generated by running

for obj in objs: cPickle.dump(obj, f)

I want to take advantage of buffering when reading this file. What I want is to read several pickled objects into a buffer at a time. What is the best way of doing this? I want an analogue of readlines(buffsize) for pickled data. In fact, if the pickled data is indeed newline-delimited, one could use readlines, but I am not sure whether that is true.

Another option I have in mind is to dumps() each object to a string first and then write the strings to a file, each separated by a newline. To read the file back I can use readlines() and loads(). But I fear that a pickled object may contain the "\n" character, which would throw off this file-reading scheme. Is my fear unfounded?
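The newline concern can be checked directly. The following is a minimal sketch, assuming Python 3 (where `pickle` takes the place of `cPickle`); the sample values are made up for illustration. It suggests that the fear is well founded: a pickle can contain the newline byte both because the data itself contains one and because the text-based protocol 0 uses newlines internally.

```python
import pickle

# A string with an embedded newline: the binary pickle protocols store
# the string's bytes as they are, so the newline byte appears in the pickle.
text_pickle = pickle.dumps("line one\nline two")

# Protocol 0 is text-based and terminates an integer's digits with a newline,
# so even data without any newline produces one in the pickle.
proto0_pickle = pickle.dumps(12345, protocol=0)

has_newline_in_text = b"\n" in text_pickle
has_newline_in_proto0 = b"\n" in proto0_pickle
print(has_newline_in_text, has_newline_in_proto0)
```

Splitting such a file with readlines() would therefore cut some pickles in the middle.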

One option is to pickle it out as a huge list of objects, but that will require more memory than I can afford. The setup can be sped up by multi-threading, but I do not want to go there before I get the buffering working properly. What's the "best practice" for situations like this?

EDIT: I can also read raw bytes into a buffer and invoke loads on that, but I need to know how many bytes of that buffer were consumed by loads so that I can throw the head away.
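One possible way to learn how many bytes were consumed is to wrap the raw bytes in a file-like object and ask for its position after loading. This is only a sketch, assuming Python 3 and an in-memory `io.BytesIO` buffer; the two sample objects are hypothetical.

```python
import io
import pickle

# Two pickles stored back to back in one in-memory buffer (sample data).
first = pickle.dumps({"id": 1})
second = pickle.dumps([1, 2, 3])
buf = io.BytesIO(first + second)

obj = pickle.load(buf)   # unpickles the first object only
consumed = buf.tell()    # offset at which the next pickle starts

remainder = buf.read()   # the bytes left over after the first pickle
```

Here `consumed` plays the role of the "head" length: everything before that offset belongs to the object just loaded.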

Answer

SingleNegationElimination · Apr 1, 2011

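You don't need to do anything, I think. Each call to pickle.load reads one pickled object and stops there, so consecutive pickles in the same file can be read back one at a time until EOFError is raised: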

>>> import pickle
>>> import StringIO
>>> s = StringIO.StringIO(pickle.dumps('apples') + pickle.dumps('bananas'))
>>> pickle.load(s)
'apples'
>>> pickle.load(s)
'bananas'
>>> pickle.load(s)

Traceback (most recent call last):
  File "<pyshell#25>", line 1, in <module>
    pickle.load(s)
  File "C:\Python26\lib\pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "C:\Python26\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\Python26\lib\pickle.py", line 880, in load_eof
    raise EOFError
EOFError
>>>
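
The same idea can be applied to a large file on disk. The sketch below assumes Python 3; the helper name `iter_pickles` and its `buffer_size` parameter are hypothetical, not part of the pickle module. It relies on the file object's own read buffer rather than on any pickle-specific buffering, and it treats EOFError as the end of the file, as in the session above.

```python
import pickle

def iter_pickles(path, buffer_size=1 << 20):
    """Yield each object from a file of back-to-back pickles."""
    # `buffering` sets the size, in bytes, of the file object's read buffer,
    # so the disk is read in large chunks even though the objects are
    # unpickled one at a time.
    with open(path, "rb", buffering=buffer_size) as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # No more pickles in the file.
                return
```

Because this is a generator, only one object needs to be held in memory at a time, for example `for obj in iter_pickles("data.pkl"): ...` with a file name of your own.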