I have to check for the presence of millions of elements (strings of 20-30 characters) in a list containing 10-100k of those elements. Is there a faster way of doing that in Python than set()?
import sys

# load ids (idfile is the path to the id list, defined elsewhere)
ids = set(x.strip() for x in open(idfile))

for line in sys.stdin:
    id = line.strip()
    if id in ids:
        # print fastq
        print(id)
        # update ids
        ids.remove(id)
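set is as fast as it gets. A membership test on a set is a hash lookup, so it takes constant time on average no matter how many elements the set holds.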
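If you want to see the difference for yourself, here is a rough timing sketch comparing a list with a set. The data is made up (10,000 generated ids of 24 characters), and `time_lookup` is just a helper name I picked for illustration; actual numbers will vary by machine.

```python
import timeit

# Hypothetical data: 10,000 ids, each 24 characters long.
ids_list = ["id%022d" % i for i in range(10000)]
ids_set = set(ids_list)

# Probe with the last element, the worst case for a linear list scan.
probe = ids_list[-1]


def time_lookup(container, number=1000):
    """Return the seconds taken by `number` membership tests on container."""
    return timeit.timeit(lambda: probe in container, number=number)


list_seconds = time_lookup(ids_list)
set_seconds = time_lookup(ids_set)
print("list: %.6f s, set: %.6f s" % (list_seconds, set_seconds))
```

The list has to be scanned element by element, while the set goes more or less straight to the right hash bucket, so the gap widens as the collection grows.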
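However, if you rewrite your code to create the set once and never change it, you can use the frozenset built-in type. It behaves exactly like set except that it is immutable. In CPython the two types share the same implementation, so the gain is mainly safety rather than lookup speed.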
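A minimal sketch of the build-once approach might look like this. The sample ids and the `filter_known` helper are made up for illustration; in your script the ids would come from the id file and the lines from sys.stdin.

```python
def filter_known(lines, known_ids):
    """Yield each stripped line that is in known_ids, without mutating it."""
    for line in lines:
        candidate = line.strip()
        if candidate in known_ids:
            yield candidate


# Hypothetical sample data standing in for the id file and stdin.
known_ids = frozenset(["read_001", "read_002", "read_003"])
stream = ["read_002\n", "read_999\n", "read_001\n", "read_002\n"]

matches = list(filter_known(stream, known_ids))
print(matches)  # ['read_002', 'read_001', 'read_002']
```

Note that because nothing is removed from the frozenset, an id that shows up more than once in the input is reported every time. If you need each id only once, keep a separate set of ids already seen.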
If you're still having speed problems, you need to speed your program up in other ways, such as by using PyPy instead of CPython.