I'm trying to read a FASTA file, find a specific motif (string), and print each sequence along with the number of times the motif occurs in it. A FASTA file is just a series of sequences (strings), each starting with a header line; the marker for a header, i.e. the start of a new sequence, is ">". The sequence of letters begins on a new line immediately after the header. I'm not done with the code, but so far I have this, and it gives me this error:
AttributeError: 'str' object has no attribute 'next'
I'm not sure what's wrong here.
import re
header=""
counts=0
newline=""
f1=open('fpprotein_fasta(2).txt','r')
f2=open('motifs.xls','w')
for line in f1:
    if line.startswith('>'):
        header=line
        #print header
        nextline=line.next()
        for i in nextline:
            motif="ML[A-Z][A-Z][IV]R"
            if re.findall(motif,nextline):
                counts+=1
        #print (header+'\t'+counts+'\t'+motif+'\n')
        fout.write(header+'\t'+counts+'\t'+motif+'\n')
f1.close()
f2.close()
The error is likely coming from the line:
nextline=line.next()
Here, line is the string you have already read; there is no next() method on it.
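To illustrate the point, here is a minimal sketch, assuming Python 3 (where the built-in next() replaces the old <handle>.next() method). The io.StringIO object and its two-record content are hypothetical stand-ins for your open file; the idea is only that the handle is what advances, not the string it returns.

```python
import io

# Hypothetical stand-in for an open FASTA file.
handle = io.StringIO(">seq1\nMLAAIR\n>seq2\nGGGG\n")

line = next(handle)                 # the first line, returned as a plain str
has_next = hasattr(line, "next")    # a str has no next() method
following = next(handle)            # advancing the handle yields the next line
```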
Part of the problem is that you're trying to mix two different ways of reading the file: iterating over the lines with for line in f1 and calling <handle>.next() to advance manually. Pick one, since both consume lines from the same file handle.
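One way to stick to a single style is to keep only the for loop and remember the current header while collecting the sequence lines that follow it. The sketch below is not a definitive implementation: it assumes Python 3, the function name count_motifs and the in-memory sample data are made up for illustration, and only the motif pattern comes from your question. It also joins sequences that wrap over several lines, which is common in FASTA files.

```python
import io
import re

MOTIF = re.compile(r"ML[A-Z][A-Z][IV]R")  # pattern taken from the question


def count_motifs(handle, pattern=MOTIF):
    """Yield (header, sequence, count) for each FASTA record in handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                seq = "".join(chunks)
                yield header, seq, len(pattern.findall(seq))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    # Emit the last record once the file is exhausted.
    if header is not None:
        seq = "".join(chunks)
        yield header, seq, len(pattern.findall(seq))


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(">p1\nAAMLKKIRAA\nMLAAVRGG\n>p2\nGGGG\n")
results = list(count_motifs(sample))
```

With a real file you would pass the open handle instead of sample and write each tuple out, converting the count with str() first. Note that findall counts non-overlapping matches only.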
Also, if you are working with FASTA files, I recommend using Biopython: it makes working with collections of sequences much easier. Chapter 14 on motifs will be of particular interest to you. This will likely require that you learn more about Python in order to achieve what you want, but if you're going to be doing a lot more bioinformatics than your example here shows, it's definitely worth the investment of time.