Functionality of Python `in` vs. `__contains__`

joshua.r.smith · Jul 23, 2016 · Viewed 14.1k times

I implemented the `__contains__` method on a class for the first time the other day, and the behavior wasn't what I expected. I suspect there's some subtlety to the `in` operator that I don't understand, and I was hoping someone could enlighten me.

It appears to me that the `in` operator doesn't simply wrap an object's `__contains__` method; it also coerces the output of `__contains__` to a boolean. For example, consider the class:

class Dummy(object):
    def __contains__(self, val):
        # Don't perform comparison, just return a list as
        # an example.
        return [False, False]

The `in` operator and a direct call to the `__contains__` method produce very different results:

>>> dum = Dummy()
>>> 7 in dum
True
>>> dum.__contains__(7)
[False, False]

Again, it looks like `in` calls `__contains__` and then coerces the result to `bool`. I can't find this behavior documented anywhere, apart from the `__contains__` documentation saying that the method should only ever return True or False.
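
A quick way to check the coercion for more than one return value is sketched below. The `Echo` class and the `samples` list are made up for illustration; the sketch only assumes that `operator.contains` goes through the same machinery as `in`.

```python
import operator


class Echo:
    """Hypothetical container whose __contains__ returns the probe unchanged."""

    def __contains__(self, val):
        return val


samples = [0, 1, "", "text", [], [False, False], None]
echo = Echo()

for val in samples:
    raw = echo.__contains__(val)  # whatever object the method returned
    via_in = val in echo          # result produced by the operator

    # The operator yields a real bool equal to the truth value of `raw`.
    assert via_in is bool(raw)
    # operator.contains(container, item) agrees with `item in container`.
    assert operator.contains(echo, val) is via_in

    print(repr(raw), "->", via_in)
```

If the assertions hold, `in` behaves like `bool(obj.__contains__(val))` for every sample, not just for the list in the `Dummy` example.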

I'm happy to follow the convention, but can someone tell me the precise relationship between `in` and `__contains__`?

Epilogue

I decided to accept @eli-korvigo's answer, but everyone should look at @ashwini-chaudhary's comment about the bug, below.

Answer

Eli Korvigo · Jul 23, 2016

Use the source, Luke!

Let's trace the implementation of the `in` operator:

>>> import dis
>>> class test(object):
...     def __contains__(self, other):
...         return True

>>> def in_():
...     return 1 in test()

>>> dis.dis(in_)
    2           0 LOAD_CONST               1 (1)
                3 LOAD_GLOBAL              0 (test)
                6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
                9 COMPARE_OP               6 (in)
               12 RETURN_VALUE

As you can see, the `in` operator becomes the COMPARE_OP virtual machine instruction. You can find its implementation in ceval.c:

TARGET(COMPARE_OP)
    w = POP();
    v = TOP();
    x = cmp_outcome(oparg, v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x == NULL) break;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH(); 
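
The instruction name depends on the interpreter version. The disassembly above comes from an older CPython 3; as far as I know, CPython 3.9 and later emit a dedicated CONTAINS_OP instead of COMPARE_OP for `in`. A minimal sketch for checking which one your interpreter uses (the `Test` class and `membership_ops` name are only for illustration):

```python
import dis


class Test:
    def __contains__(self, other):
        return True


def in_():
    return 1 in Test()


# Collect the opcode names and keep only the ones that can implement `in`.
opnames = [ins.opname for ins in dis.get_instructions(in_)]
membership_ops = [name for name in opnames if name in ("COMPARE_OP", "CONTAINS_OP")]

print(membership_ops)
```

Either way, the instruction ends up calling PySequence_Contains, so the rest of the trace should still apply.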

Take a look at one of the cases in the switch inside cmp_outcome():

case PyCmp_IN:
    res = PySequence_Contains(w, v);
    if (res < 0)
        return NULL;
    break;
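
The `res < 0` branch is the error path: returning NULL from cmp_outcome() means an exception is pending. From the Python side this simply looks like the exception raised inside `__contains__` propagating out of the `in` expression. A small sketch, with a made-up `Broken` class:

```python
class Broken:
    """Hypothetical container whose membership test always fails."""

    def __contains__(self, val):
        raise ValueError("cannot test membership")


try:
    7 in Broken()
except ValueError as exc:
    # The exception from __contains__ surfaces unchanged.
    caught = exc
else:
    caught = None

print(type(caught).__name__, caught)
```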

Here we have the PySequence_Contains call. Its definition:

int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}

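That always returns a C int used as a boolean: 1 if the item is found, 0 if it is not, and -1 on error (which cmp_outcome() turns into NULL, i.e. a raised exception). Because the sq_contains slot itself returns an int, whatever object `__contains__` returns has already been reduced to its truth value by the time `in` produces its result. That is why `7 in dum` gives True: a non-empty list is truthy.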
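
The logic of PySequence_Contains can be modelled roughly in pure Python. This is only a sketch under my own assumptions, not the real implementation: `contains_like_in` is a hypothetical helper, and it ignores corner cases such as `__contains__` being set to None.

```python
def contains_like_in(container, item):
    """Hypothetical pure-Python model of how `item in container` is evaluated."""
    # Look the method up on the type, as the slot machinery does.
    contains = getattr(type(container), "__contains__", None)
    if contains is not None:
        # Slot path: call __contains__ and keep only the truth value.
        return bool(contains(container, item))
    # Fallback path (the _PySequence_IterSearch branch): iterate and
    # compare each element by identity first, then by equality.
    for element in container:
        if element is item or element == item:
            return True
    return False


class Dummy:
    def __contains__(self, val):
        return [False, False]


class OnlyIter:
    """Hypothetical class with no __contains__, only __iter__."""

    def __iter__(self):
        return iter([1, 2, 3])


print(contains_like_in(Dummy(), 7), 7 in Dummy())
print(contains_like_in(OnlyIter(), 2), 2 in OnlyIter())
print(contains_like_in(OnlyIter(), 5), 5 in OnlyIter())
```

In each printed pair the model and the real operator should agree, including for the `Dummy` class from the question.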

P.S.

Thanks to Martijn Pieters for showing how to find the implementation of the `in` operator.