How does Python's Garbage Collector Detect Circular References?

user1245262 picture user1245262 · Jun 9, 2012 · Viewed 12.1k times · Source

I'm trying to understand how Python's garbage collector detects circular references. When I look at the documentation, all I see is a statement that circular references are detected, except when the objects involved have a __del__ method.

If this happens, my understanding (possibly faulty) is that the gc module acts as a failsafe by (I assume) walking through all the allocated memory and freeing any unreachable blocks.

How does Python detect & free circular memory references before making use of the gc module?

Answer

David Wolever picture David Wolever · Jun 9, 2012

How does Python detect & free circular memory references before making use of the gc module?

It doesn't. The gc exists only to detect and free circular references. Non-circular references are handled through refcounting.

Now, to see how gc determines the set of objects referenced by any given object, take a look at the gc_get_references function in Modules/gcmodule.c. The relevant bit is:

// Where `obj` is the object who's references we want to find
traverseproc traverse;
if (! PyObject_IS_GC(obj))
    continue;
traverse = Py_TYPE(obj)->tp_traverse;
if (! traverse)
    continue;
if (traverse(obj, (visitproc)referentsvisit, result)) {
    Py_DECREF(result);
    return NULL;
}

The major function here is tp_traverse. Each C-level type defines a tp_traverse function (or in the case of objects which don't hold any references, like str, sets it to NULL). One example of tp_traverse is list_traverse, the traversal function for list:

static int
list_traverse(PyListObject *o, visitproc visit, void *arg)
{
    Py_ssize_t i;

    for (i = Py_SIZE(o); --i >= 0; )
        Py_VISIT(o->ob_item[i]);
    return 0;
}

I see is a statement that circular references are detected, except when the objects involved have a __del__() method.

You are correct — Python's cycle detector can detect and collect cycles unless they contain objects with a __del__ method, as there is no way for the interpreter to safely delete these objects (to get an intuition on why this is, imagine you've got two objects with __del__ methods that reference each other. In which order should they be freed?).

When objects with a __del__ method are involved in a cycle, the garbage collector will stick them in a separate list (accessible through gc.garbage) so that the programmer can manually "deal with" them.