How Python handles object instantiation in a 'for' loop

Gauthier Boaglio · Oct 11, 2012 · Viewed 7.1k times

I've got a highly complex class :

class C:
    pass

And I've got this test code :

for j in range(10):
    c = C()
    print c

Which gives :

<__main__.C instance at 0x7f7336a6cb00>
<__main__.C instance at 0x7f7336a6cab8>
<__main__.C instance at 0x7f7336a6cb00>
<__main__.C instance at 0x7f7336a6cab8>
<__main__.C instance at 0x7f7336a6cb00>
<__main__.C instance at 0x7f7336a6cab8>
<__main__.C instance at 0x7f7336a6cb00>
<__main__.C instance at 0x7f7336a6cab8>
<__main__.C instance at 0x7f7336a6cb00>
<__main__.C instance at 0x7f7336a6cab8>

One can easily see that Python alternates between two different addresses. In some cases, this can be catastrophic (for example, if we store the objects in some other complex object).

Now, if I store the objects in a list :

lst = []
for j in range(10):
    c = C()
    lst.append(c)
    print c

I get this :

<__main__.C instance at 0x7fd8f8f7eb00>
<__main__.C instance at 0x7fd8f8f7eab8>
<__main__.C instance at 0x7fd8f8f7eb48>
<__main__.C instance at 0x7fd8f8f7eb90>
<__main__.C instance at 0x7fd8f8f7ebd8>
<__main__.C instance at 0x7fd8f8f7ec20>
<__main__.C instance at 0x7fd8f8f7ec68>
<__main__.C instance at 0x7fd8f8f7ecb0>
<__main__.C instance at 0x7fd8f8f7ecf8>
<__main__.C instance at 0x7fd8f8f7ed40>

Which solves the case.

So now, I have to ask a question... Could anyone explain with complex words (I mean, in depth) how Python behaves with object references? I suppose it is a matter of optimization (to spare memory, or prevent leaks, ...).

Thanks a lot.

EDIT : Ok so, let's be more specific. I'm quite aware that Python has to collect garbage sometimes... But, in my case :

I had a list returned by a Cython-defined class: the class 'Network' manages a list of 'Node's (both the Network and Node classes are defined in a Cython extension). Each Node has a 'userdata' object [which is then cast to (void *)]. The Nodes list is populated from inside Cython, while the UserData are populated inside the Python script. So in Python, I had the following :

...
def some_python_class_method(self):
    nodes = self.netBinding.GetNetwork().get_nodes()
    ...
    for item in it:
        a_site = PyLabSiteEvent()
        #l_site.append(a_site)        # WARN : Required to get an instance of 'a_site'
                                      #        that persists - workaround...
        item.SetUserData(a_site)

Reusing this node list later on in the same Python class, using the same Cython getter :

def some_other_python_class_method(self, node):
    s_data = node.GetUserData()
    ...

So it seems that, with the objects stored only in the nodes' UserData, my Python script was completely blind to those references and was freeing/reusing the memory. It worked once I referenced each object a second time (which, apparently, was the first reference from Python's point of view), using an additional list (here : 'l_site'). This is why I wanted to know a bit more about Python itself, but it seems that the way I implemented the communication between Python and Cython is responsible for the issues I had to face.
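The effect described in this edit can be imitated in pure Python, as a rough analogy only. In the sketch below, `Node` and `SiteEvent` are hypothetical stand-ins (not the real Cython classes), and a `weakref.ref` plays the role of the bare (void *): it records where the object is without counting as a reference. It is written in Python 3 syntax and relies on CPython freeing objects as soon as their reference count reaches 0.

```python
import weakref


class SiteEvent:
    """Hypothetical stand-in for PyLabSiteEvent."""


class Node:
    """Hypothetical stand-in for the Cython Node.

    It stores user data without owning a reference to it,
    much like a bare (void *) would.
    """

    def __init__(self):
        self._userdata = None

    def set_user_data(self, obj):
        # A weak reference does not raise the object's reference count.
        self._userdata = weakref.ref(obj)

    def get_user_data(self):
        return None if self._userdata is None else self._userdata()


nodes = [Node() for _ in range(3)]

# No owner on the Python side: each SiteEvent dies as soon as the
# name "site" is rebound (or deleted, for the last one).
for node in nodes:
    site = SiteEvent()
    node.set_user_data(site)
del site
lost = [node.get_user_data() is None for node in nodes]

# With an owner: the extra list keeps every SiteEvent alive.
owners = []
for node in nodes:
    site = SiteEvent()
    owners.append(site)
    node.set_user_data(site)
del site
kept = [node.get_user_data() is not None for node in nodes]

print(lost, kept)
```

The analogy is kinder than the real situation: a dead weak reference returns None, whereas a raw pointer to a freed object simply dangles and may later point at an unrelated object that reused the memory.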

Answer

jsbueno · Oct 11, 2012

There is no need to be "complex" here: in the first example, you keep no other reference to the object referenced by the name "c". When the line "c = C()" runs on subsequent iterations of the loop, the one reference previously held in "c" is lost.

Standard Python (CPython) uses reference counting to keep track of when it should delete objects from memory. At that moment, the reference count of the object from the previous loop iteration reaches 0, so it is destroyed and its memory is made available for other objects.
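A minimal sketch of this, assuming CPython and Python 3 syntax: `sys.getrefcount` shows the count going up when a second name is bound, and a weak reference (which does not keep the object alive) shows the first object disappearing the moment "c" is rebound. The names `watcher` and `alias` are illustrative only.

```python
import sys
import weakref


class C:
    pass


c = C()
# A weak reference lets us observe the object without keeping it alive.
watcher = weakref.ref(c)

# getrefcount also counts the temporary reference held by its own argument,
# so only the difference between two calls is meaningful here.
count_with_one_name = sys.getrefcount(c)

alias = c  # a second name bound to the same object
count_with_two_names = sys.getrefcount(c)

del alias
c = C()  # rebinding "c" drops the last reference to the first object

# In CPython the first object is reclaimed immediately.
first_object_is_gone = watcher() is None
print(count_with_two_names - count_with_one_name, first_object_is_gone)
```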

Why do you have 2 alternating values? Because at the moment the object in the new iteration is created, i.e. when Python executes the expression on the right side of the = in c = C(), the object of the previous iteration still exists, referenced by the name c, so the new object is constructed at another memory location. Python then proceeds to assign the new object to c, at which point the previous object is destroyed as described above, which means that on the next (3rd) iteration, that memory will be available for a new instance of C.
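The one part of this that can be checked reliably is that two consecutive iterations never share an address, since the new object is built while the old one is still alive. The sketch below (Python 3 syntax, using `id()`, which in CPython is the object's address) records the addresses and checks that. How many distinct addresses show up in total depends on the allocator, so the count is printed without a promised value.

```python
class C:
    pass


addresses = []
for j in range(10):
    # C() is evaluated first, while the previous object is still bound to "c";
    # only then is "c" rebound and the previous object released.
    c = C()
    addresses.append(id(c))

# Neighbouring iterations can never report the same address.
consecutive_differ = all(a != b for a, b in zip(addresses, addresses[1:]))

# Often a small number in CPython, because freed slots get reused.
distinct = len(set(addresses))
print(consecutive_differ, distinct)
```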

In the second example, the newly created objects never lose their reference, and therefore their memory is not freed at all: new objects always take up a new memory location.
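As a small check of that claim (Python 3 syntax; the list name `kept` is illustrative): as long as the list holds every instance, all of them are alive at once, so no two can share an identity.

```python
class C:
    pass


kept = []
for j in range(10):
    c = C()
    kept.append(c)  # the list holds a reference, so the object survives

addresses = [id(obj) for obj in kept]
# Objects that are alive at the same time always have distinct ids.
all_distinct = len(set(addresses)) == len(kept)
print(all_distinct)  # True
```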

Most important of all: the purpose of using a high-level language such as Python is not having to worry about memory allocation. The language takes care of that for you. In this case, the CPython (standard) implementation does just the right thing, as you noticed. Other implementations such as PyPy or Jython can behave completely differently with regard to the "memory location" of each instance in the above examples, but all conforming implementations (including these 3) will behave exactly the same from the "point of view" of the Python program: (1) it has access to the instances it keeps a reference to, and (2) the data of these instances is not corrupted or mangled in any way.