Python string 'in' operator implementation algorithm and time complexity

mitchelllc picture mitchelllc · Aug 9, 2013 · Viewed 11.8k times · Source

I am thinking of how the in operator implement, for instance

>>> s1 = 'abcdef'
>>> s2 = 'bcd'
>>> s2 in s1
True

In CPython, which algorithm is used to implement the string match, and what is the time complexity? Is there any official document or wiki about this?

Answer

arshajii picture arshajii · Aug 9, 2013

It's a combination of Boyer-Moore and Horspool.

You can view the C code here:

Fast search/count implementation, based on a mix between Boyer-Moore and Horspool, with a few more bells and whistles on the top. For some more background, see: http://effbot.org/zone/stringlib.htm.

From the link above:

When designing the new algorithm, I used the following constraints:

  • should be faster than the current brute-force algorithm for all test cases (based on real-life code), including Jim Hugunin’s worst-case test
  • small setup overhead; no dynamic allocation in the fast path (O(m) for speed, O(1) for storage)
  • sublinear search behaviour in good cases (O(n/m))
  • no worse than the current algorithm in worst case (O(nm))
  • should work well for both 8-bit strings and 16-bit or 32-bit Unicode strings (no O(σ) dependencies)
  • many real-life searches should be good, very few should be worst case
  • reasonably simple implementation