how to read a fasta file in python?

user3098683 picture user3098683 · Dec 14, 2013 · Viewed 26.6k times · Source

I'm trying to read a FASTA file and then find specific motif(string) and print out the sequence and number of times it occurs. A FASTA file is just series of sequences(strings) that starts with a header line and the signature for header or start of a new sequence is ">". in a new line immediately after the header is the sequence of letters.I'm not done with code but so far I have this and it gives me this error:

AttributeError: 'str' object has no attribute 'next'

I'm not sure what's wrong here.

import re

header=""
counts=0
newline=""

f1=open('fpprotein_fasta(2).txt','r')
f2=open('motifs.xls','w')
for line in f1:
    if line.startswith('>'):
        header=line
        #print header
        nextline=line.next()
        for i in nextline:
            motif="ML[A-Z][A-Z][IV]R"
            if re.findall(motif,nextline):
                counts+=1
                #print (header+'\t'+counts+'\t'+motif+'\n')
        fout.write(header+'\t'+counts+'\t'+motif+'\n')

f1.close()
f2.close()

Answer

iainmcgin picture iainmcgin · Dec 14, 2013

The error is likely coming from the line:

nextline=line.next()

line is the string you have already read, there is no next() method on it.

Part of the problem is that you're trying to mix two different ways of reading the file - you are iterating over the lines using for line in f1 and <handle>.next().

Also, if you are working with FASTA files I recommend using Biopython: it makes working with collections of sequences much easier. In particular, Chapter 14 on motifs will be of particular interest to you. This will likely require that you learn more about Python in order to achieve what you want, but if you're going to be doing a lot more bioinformatics than what your example here shows then it's definitely worth the investment of time.