I am using Python to parse a large file. What I want to do is:

    if condition is True:
        append to list A
    else:
        append to list B

I want to use generator expressions for this, to save memory. Here is the actual code:
    from Bio import SeqIO

    def iter_length(the_gen):
        # Count the items in a generator without building a list.
        return sum(1 for _ in the_gen)

    def is_low_qual(read):
        # Low quality: more than num_allowed bases score below qual_threshold.
        lowqual_bp = (bq for bq in phred_quals(read) if bq < qual_threshold)
        return iter_length(lowqual_bp) > num_allowed

    # Each generator makes its own pass over the file.
    lowqual = (read for read in SeqIO.parse(r_file, "fastq") if is_low_qual(read))
    highqual = (read for read in SeqIO.parse(r_file, "fastq") if not is_low_qual(read))

    SeqIO.write(highqual, flt_out_handle, "fastq")
    SeqIO.write(lowqual, junk_out_handle, "fastq")
You can use itertools.tee in conjunction with itertools.ifilter and itertools.ifilterfalse (on Python 3, the built-in filter and itertools.filterfalse):
    import itertools

    def is_condition_true(x):
        ...

    gen1, gen2 = itertools.tee(sequences)
    low = itertools.ifilter(is_condition_true, gen1)
    high = itertools.ifilterfalse(is_condition_true, gen2)
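For reference, here is a minimal runnable sketch of the same pattern on Python 3, where the built-in filter is already lazy and ifilterfalse is named itertools.filterfalse. The partition helper, the toy predicate, and the sample values are hypothetical and only there to make the example self-contained:

```python
import itertools

def partition(predicate, iterable):
    """Split iterable into (matching, non_matching) lazy iterators."""
    gen1, gen2 = itertools.tee(iterable)
    return filter(predicate, gen1), itertools.filterfalse(predicate, gen2)

def is_condition_true(x):
    # Hypothetical stand-in predicate: values below 20 count as "low".
    return x < 20

# A one-shot generator as input, to show that tee copes with it.
sequences = (q for q in [35, 12, 40, 8, 27])
low, high = partition(is_condition_true, sequences)

low_items = list(low)    # consumes gen1; tee buffers the items for gen2
high_items = list(high)  # drains the buffered items
```

Note that the predicate runs twice per item with this approach, once in each branch.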
Using tee ensures that the function works correctly even if sequences is itself a generator. Note, though, that tee could itself use a fair bit of memory (up to a list of size len(sequences)) if low and high are consumed at different rates (e.g. if low is exhausted before high is used).
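If that buffering is a concern, one alternative (a different technique from tee) is a plain single-pass loop that sends each item to its destination as soon as it is read, so nothing accumulates. A minimal sketch, in which split_stream and the two sink callables are hypothetical names:

```python
def split_stream(items, predicate, on_true, on_false):
    """Route each item to one of two sinks in a single pass.

    Returns (n_true, n_false), the number of items sent to each sink.
    """
    n_true = n_false = 0
    for item in items:
        if predicate(item):
            on_true(item)
            n_true += 1
        else:
            on_false(item)
            n_false += 1
    return n_true, n_false

# Toy usage: lists stand in for output files here; in practice each sink
# could be a function that writes one record to an open handle.
low_sink, high_sink = [], []
counts = split_stream(iter([35, 12, 40, 8, 27]), lambda q: q < 20,
                      low_sink.append, high_sink.append)
```

For the FASTQ case in the question, each sink could write the record straight to its output handle, assuming a Biopython version whose SeqIO.write accepts a single record. The predicate is then evaluated once per read and the file is parsed only once.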