Using Biopython (Python) to extract sequence from FASTA file

Question 1

Using Biopython (Python) to extract sequence from FASTA file

python python-2.7 biopython fasta

user1784467 · Oct 30, 2012 · Viewed 11.4k times · Source

Answer

Answer

Biopython is just perfect for these kinds of tasks. The Seq-Object stores a sequence and info about it. Reading the fasta file format is straight forward. You can access the sequence like a simple list and, hence, access certain positions straight forward as well:

from Bio import SeqIO

with open("outfile.txt","w") as f:
        for seq_record in SeqIO.parse("infile.fasta", "fasta"):
                f.write(str(seq_record.id) + "\n")
                f.write(str(seq_record.seq[:10]) + "\n")  #first 10 base positions
                f.write(str(seq_record.seq[-10:]) + "\n") #last 10 base positions

Question 2

Ok so I need to extract part of a sequence from a FASTA file, using python (biopython, http://biopython.org/DIST/docs/tutorial/Tutorial.html)

I need to get the first 10 bases from each sequence and put them in one file, preserving the sequence info from the FASTA format. Worst comes to worst, I could just use the bases if there's no way to keep the sequence info. So here's an example:

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG

>gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGGAATAAACGATCGAGTG
AATCCGGAGGACCGGTGTACTCAGCTCACCGGGGGCATTGCTCCCGTGGTGACCCTGATTTGTTGTTGGG

I need some way to get the first 10 bases (and then I was planning on doing it again for the last 10 bases). That tutorial site is pretty thorough but I'm new to this and since it doesn't go into this, I'm not even sure if it's possible. Thanks for any help you can give.

Using Biopython (Python) to extract sequence from FASTA file

Answer

Related questions