Pickle with custom classes

Joe · Jun 1, 2012

Suppose I have a simple Python class definition in a file, myClass.py:

class Test:
    A = []

I also have two test scripts. The first script creates an object of type Test, populates the list A, and pickles the result to a file. It then immediately unpickles it from the file, and the list is still populated. The second script only unpickles from the file, and the list is not populated (i.e., y.A == []). Why is this?

test1.py

import myClass
import pickle

x = myClass.Test()

for i in range(5):
    x.A.append(i)

# pickle writes bytes, so open the file in binary mode
with open('data', 'wb') as f:
    pickle.dump(x, f)

with open('data', 'rb') as f:
    y = pickle.load(f)

print(y.A)

and test2.py

import myClass
import pickle

with open('data', 'rb') as f:
    y = pickle.load(f)

print(y.A)

Answer

jdi · Jun 1, 2012

This happens because you are defining Test.A as a class attribute instead of an instance attribute. The object read back from the pickle file in test1.py is exactly the same as the one in test2.py; the difference is that test1.py is still using the class in memory, whose list you modified through x.A.
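A minimal sketch of that sharing behavior (Test is redefined inline here instead of being imported from myClass.py, so the snippet runs on its own): because x has no A of its own, x.A.append(...) finds and mutates the single list stored on the class, which every instance then sees.

```python
class Test:
    A = []  # class attribute: one list object stored on the class itself

x = Test()
for i in range(5):
    x.A.append(i)  # x has no attribute A of its own, so this mutates Test.A

z = Test()         # a completely separate instance
print(Test.A)      # [0, 1, 2, 3, 4]
print(z.A)         # [0, 1, 2, 3, 4]: every instance shares the same list
print(vars(x))     # {}: the instance itself holds no data
```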

When an object is unpickled, pickle creates a new instance of the class and then restores that instance's own state (its instance dictionary). But your only data lived in a class attribute, which is not part of the instance state and is never written to the pickle. Looking up y.A therefore always falls back to the class that is currently in memory: in test1.py you had modified it, while in test2.py (a fresh process) you had not.
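A self-contained sketch of the round trip, assuming that resetting Test.A to [] is a fair stand-in for starting a fresh interpreter the way test2.py does (a new process re-executes the class body and gets a new, empty list). The pickle refers to the class only by name, so nothing about A travels with it:

```python
import pickle

class Test:
    A = []  # class attribute, never part of any instance's state

x = Test()
for i in range(5):
    x.A.append(i)  # mutates Test.A, not x

print(vars(x))    # {}: there is no instance state to pickle
print(Test.A)     # [0, 1, 2, 3, 4]

data = pickle.dumps(x)
print(b'Test' in data)  # True: the class is recorded by name only

# Simulate a fresh process (like test2.py): the class body would run again
# and A would start out as a new empty list.
Test.A = []

y = pickle.loads(data)
print(y.A)        # []: y.A falls back to whatever Test.A is now
```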

Compare the differences in this example:

class Test:
    A = []  # a class attribute
    def __init__(self):
        self.a = []  # an instance attribute

You will notice that the instance attribute a is pickled and unpickled properly, while the class attribute A simply refers to whatever the class holds in memory:

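import pickle

x = Test()

for i in range(5):
    x.A.append(i)   # mutates the class attribute Test.A
    x.a.append(i)   # mutates this instance's own list

with open('data', 'wb') as f:
    pickle.dump(x, f)

with open('data', 'rb') as f:
    y = pickle.load(f)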

>>> y.A
[0, 1, 2, 3, 4]
>>> y.a
[0, 1, 2, 3, 4]
>>> Test.A
[0, 1, 2, 3, 4]
>>> Test.A = []  # resetting the class attribute
>>> y.a 
[0, 1, 2, 3, 4]
>>> y.A  # refers to the class attribute
[]
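If you want to check which state actually ends up in the file, the standard-library pickletools module can disassemble a pickle. This is a sketch using the two-attribute version of Test above (redefined so the snippet is self-contained); the exact opcode listing depends on the pickle protocol, so it only checks whether each attribute name appears:

```python
import io
import pickle
import pickletools

class Test:
    A = []  # class attribute
    def __init__(self):
        self.a = []  # instance attribute

x = Test()
for i in range(5):
    x.A.append(i)
    x.a.append(i)

data = pickle.dumps(x)

# Disassemble the pickle into a text buffer so it can be inspected.
buf = io.StringIO()
pickletools.dis(data, out=buf)
listing = buf.getvalue()
print(listing)

# The instance dict key 'a' is stored; the class attribute 'A' is not.
print("'a'" in listing)  # True
print("'A'" in listing)  # False
```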