To give you context: I have a large file f, several gigs in size. It contains consecutive pickles of different objects that were generated by running
for obj in objs: cPickle.dump(obj, f)
I want to take advantage of buffering when reading this file. What I want is to read several pickled objects into a buffer at a time. What is the best way of doing this? I want an analogue of readlines(buffsize)
for pickled data. In fact, if the pickled data were newline-delimited one could simply use readlines(), but I am not sure that is true.
Another option I have in mind is to dumps()
each pickled object to a string first and then write the strings to a file, each separated by a newline. To read the file back I can use readlines()
and loads()
. But I fear that a pickled object may contain the "\n"
character, which would throw off this file-reading scheme. Is my fear unfounded?
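(For what it's worth, a quick interpreter check, just a minimal sketch using the same cPickle module, suggests the fear is justified: pickle output can contain "\n" bytes both in the text protocol, which uses newlines as field terminators, and in the binary protocols, where any byte value can appear:)

>>> import cPickle
>>> '\n' in cPickle.dumps('apples')   # protocol 0 terminates fields with '\n'
True
>>> '\n' in cPickle.dumps(10, 2)      # binary protocols can emit the byte 0x0A too
True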
One option is to pickle everything out as one huge list of objects, but that would require more memory than I can afford. The setup could be sped up by multi-threading, but I do not want to go there before I get the buffering working properly. What's the "best practice" for situations like this?
EDIT: I can also read raw bytes into a buffer and invoke loads() on that, but I need to know how many bytes of that buffer were consumed by loads() so that I can throw the head away.
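(One way to get that consumed-byte count, sketched here under the assumption that the whole buffer fits in a StringIO, is to unpickle through a file-like wrapper and ask it how far it has read with tell(); iter_pickles_from_buffer is just an illustrative name:)

import cPickle
from cStringIO import StringIO

def iter_pickles_from_buffer(buf):
    # Yield each object pickled consecutively into the byte string buf,
    # together with how many bytes have been consumed so far.
    # Assumes buf ends on a pickle boundary.
    stream = StringIO(buf)
    while stream.tell() < len(buf):
        obj = cPickle.load(stream)   # load() stops right after the STOP ('.') opcode
        yield obj, stream.tell()     # tell() == total bytes consumed so far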
You don't need to do anything, I think.
>>> import pickle
>>> import StringIO
>>> s = StringIO.StringIO(pickle.dumps('apples') + pickle.dumps('bananas'))
>>> pickle.load(s)
'apples'
>>> pickle.load(s)
'bananas'
>>> pickle.load(s)
Traceback (most recent call last):
File "<pyshell#25>", line 1, in <module>
pickle.load(s)
File "C:\Python26\lib\pickle.py", line 1370, in load
return Unpickler(file).load()
File "C:\Python26\lib\pickle.py", line 858, in load
dispatch[key](self)
File "C:\Python26\lib\pickle.py", line 880, in load_eof
raise EOFError
EOFError
>>>
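So for the file in the question you can just call pickle.load() in a loop on an ordinary buffered file object until it raises EOFError. A sketch in the same spirit (the path and the buffering size are placeholders; cPickle is a drop-in for pickle here):

import cPickle

def unpickle_all(path, buffering=64 * 1024):
    # open() handles the buffered reads; each cPickle.load() pulls
    # the next object straight off the stream.
    with open(path, 'rb', buffering) as f:
        while True:
            try:
                yield cPickle.load(f)
            except EOFError:
                break

for obj in unpickle_all('objects.pkl'):   # 'objects.pkl' is a placeholder path
    pass  # process obj here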