I'm trying to store 5000 data elements in an array. These 5000 elements are stored in an existing file (so it's not empty), but I'm getting an error and I don't know what is causing it.
IN:
import io
import os
import pickle

def array():
    name = 'puntos.df4'
    m = open(name, 'rb')
    v = []
    m.seek(-5000, io.SEEK_END)
    fp = m.tell()
    sz = os.path.getsize(name)
    while fp < sz:
        pt = pickle.load(m)
        v.append(pt)
    m.close()
    return v
OUT:
line 23, in array
pt = pickle.load(m)
_pickle.UnpicklingError: invalid load key, ''.
Pickling is recursive, not sequential. Thus, to pickle a list, pickle will start to pickle the enclosing list, then pickle the first element, diving into that element and pickling its dependencies and sub-elements until the first element is serialized. It then moves on to the next element of the list, and so on, until it finally finishes the list and serializes the enclosing container. In short, it's hard to treat a recursive pickle as sequential, except in some special cases. If you want to load in a special way, it's better to use a smarter pattern for your dump.
The most common use of pickle is to pickle everything with a single dump to a file -- but then you have to load everything at once with a single load. However, if you open a file handle and do multiple dump calls (e.g. one for each element of the list, or a tuple of selected elements), then your load will mirror that: you open the file handle and do multiple load calls until you have all the list elements and can reconstruct the list. It's still not easy to selectively load only certain list elements, however. To do that, you'd probably have to store your list elements as a dict (with the index of the element or chunk as the key) using a package like klepto, which can break up a pickled dict into several files transparently, and enables easy loading of specific elements.
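The index-keyed idea behind that approach can be approximated with the standard library alone. The sketch below is not klepto's actual API; it simply stores each element under its index, one pickle file per key, which is roughly the layout such a package would manage transparently (the directory name is hypothetical):

```python
import os
import pickle

dirname = 'puntos_archive'  # hypothetical directory name
os.makedirs(dirname, exist_ok=True)

def dump_element(i, value):
    # One file per key: the element's index names its pickle file.
    with open(os.path.join(dirname, f'{i}.pkl'), 'wb') as f:
        pickle.dump(value, f)

def load_element(i):
    # Load a single element without touching any of the others.
    with open(os.path.join(dirname, f'{i}.pkl'), 'rb') as f:
        return pickle.load(f)

for i, pt in enumerate([10, 20, 30]):
    dump_element(i, pt)

print(load_element(1))  # loads only element 1 from disk
```

Because each key lives in its own file, loading one element never deserializes the rest, which is the property that makes selective access cheap.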