I implemented the __contains__ method on a class for the first time the other day, and the behavior wasn't what I expected. I suspect there's some subtlety to the in operator that I don't understand, and I was hoping someone could enlighten me.
It appears to me that the in operator doesn't simply wrap an object's __contains__ method, but also attempts to coerce the output of __contains__ to a boolean. For example, consider the class:
class Dummy(object):
    def __contains__(self, val):
        # Don't perform comparison, just return a list as
        # an example.
        return [False, False]
The in operator and a direct call to the __contains__ method return very different output:
>>> dum = Dummy()
>>> 7 in dum
True
>>> dum.__contains__(7)
[False, False]
Again, it looks like in is calling __contains__ but then coercing the result to bool. I can't find this behavior documented anywhere, except that the __contains__ documentation says __contains__ should only ever return True or False.
I'm happy following the convention, but can someone tell me the precise relationship between in and __contains__?
I decided to choose @eli-korvigo's answer, but everyone should look at @ashwini-chaudhary's comment about the bug, below.
Use the source, Luke!
Let's trace down the in operator implementation:
>>> import dis
>>> class test(object):
...     def __contains__(self, other):
...         return True
...
>>> def in_():
...     return 1 in test()
...
>>> dis.dis(in_)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_GLOBAL              0 (test)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 COMPARE_OP               6 (in)
             12 RETURN_VALUE
As you can see, the in operator becomes the COMPARE_OP virtual machine instruction. You can find that in ceval.c:
TARGET(COMPARE_OP)
    w = POP();
    v = TOP();
    x = cmp_outcome(oparg, v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x == NULL) break;
    PREDICT(POP_JUMP_IF_FALSE);
    PREDICT(POP_JUMP_IF_TRUE);
    DISPATCH();
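The disassembly above comes from an older CPython. Opcode names are an implementation detail and change between releases: CPython 3.9 and later emit a dedicated CONTAINS_OP for in instead of COMPARE_OP. A minimal sketch for checking which one your own interpreter uses (the helper name in_ and the list name membership_ops are just illustrative):

```python
import dis


def in_(container):
    # The membership test whose bytecode we want to inspect.
    return 1 in container


# Collect opcode names and keep only the ones that can implement `in`.
opnames = [ins.opname for ins in dis.get_instructions(in_)]
membership_ops = [name for name in opnames if name in ("COMPARE_OP", "CONTAINS_OP")]
print(membership_ops)
```

Either way, the instruction ends up calling PySequence_Contains, so the rest of the trace still applies.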
Take a look at one of the switches in cmp_outcome()
case PyCmp_IN:
    res = PySequence_Contains(w, v);
    if (res < 0)
        return NULL;
    break;
Here we have the PySequence_Contains call:
int
PySequence_Contains(PyObject *seq, PyObject *ob)
{
    Py_ssize_t result;
    PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
    if (sqm != NULL && sqm->sq_contains != NULL)
        return (*sqm->sq_contains)(seq, ob);
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
}
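That always returns a C int (1 for true, 0 for false, -1 on error), never an arbitrary Python object, and cmp_outcome() turns that int into True or False. For a class defined in Python, the sq_contains slot is a wrapper that calls __contains__ and reduces whatever it returns to its truth value, which is why [False, False] (a non-empty list) comes out as True.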
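In Python terms, the lookup order in PySequence_Contains, together with the truth-value reduction that turns [False, False] into True, can be modelled roughly as follows. This is a sketch, not CPython's code: the function name contains and the classes Empty and IterOnly are made up for illustration, and it ignores error handling and the case where __contains__ is explicitly set to None.

```python
def contains(container, item):
    """Rough pure-Python model of what `item in container` does in CPython."""
    # Special methods are looked up on the type, not on the instance.
    method = getattr(type(container), "__contains__", None)
    if method is not None:
        # Whatever __contains__ returns is reduced to its truth value.
        return bool(method(container, item))
    # Fallback when there is no __contains__: iterate and compare,
    # checking identity first and then equality.
    for element in container:
        if element is item or element == item:
            return True
    return False


class Dummy(object):
    def __contains__(self, val):
        return [False, False]  # non-empty list, so truthy


class Empty(object):
    def __contains__(self, val):
        return []  # empty list, so falsy


class IterOnly(object):
    # No __contains__ at all, so `in` falls back to iteration.
    def __iter__(self):
        return iter([1, 2, 3])


for obj, item in [(Dummy(), 7), (Empty(), 7), (IterOnly(), 2), (IterOnly(), 7)]:
    # The model and the real operator should agree on every row.
    print(type(obj).__name__, item, item in obj, contains(obj, item))
```

The Empty case is the useful one: it shows that in is not returning True unconditionally, only the truth value of whatever __contains__ hands back.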
P.S. Thanks to Martijn Pieters for providing the way to find the implementation of the in operator.