faster membership testing in python than set()

Leszek picture Leszek · Aug 18, 2011 · Viewed 24.1k times · Source

I have to check presence of millions of elements (20-30 letters str) in the list containing 10-100k of those elements. Is there faster way of doing that in python than set() ?

import sys
#load ids
ids = set( x.strip() for x in open(idfile) )

for line in sys.stdin:
    id=line.strip()
    if id in ids:
        #print fastq
        print id
        #update ids
        ids.remove( id )

Answer

agf picture agf · Aug 18, 2011

set is as fast as it gets.

However, if you rewrite your code to create the set once, and not change it, you can use the frozenset built-in type. It's exactly the same except immutable.

If you're still having speed problems, you need to speed your program up in other ways, such as by using PyPy instead of cPython.