I have to download only complete genome sequences from NCBI, in GenBank (full) format. I am interested in 'complete genome' records, not 'whole genome' ones.
My script:
from Bio import Entrez
Entrez.email = "your.name@example.com"  # placeholder; use your own address
gatunek='Escherichia[ORGN]'
handle = Entrez.esearch(db='nucleotide',
term=gatunek, property='complete genome' )#title='complete genome[title]')
result = Entrez.read(handle)
As a result I get only small fragments of genomes, with sizes of about 484 bp:
LOCUS NZ_KE350773 484 bp DNA linear CON 23-AUG-2013
DEFINITION Escherichia coli E1777 genomic scaffold scaffold9_G, whole genome
shotgun sequence.
I know how to do it manually via the NCBI website, but it is very time-consuming. The query that I use there is:
escherichia[orgn] AND complete genome[title]
As a result I get multiple genomes with sizes of around 5,154,862 bp, and this is what I need to do via Entrez.esearch.
You've done the hard part and worked out the query,
escherichia[orgn] AND complete genome[title]
So use that as the search query via Biopython as well!
from Bio import Entrez
Entrez.email = "your.name@example.com"  # placeholder; use your own address
search_term = "escherichia[orgn] AND complete genome[title]"
handle = Entrez.esearch(db='nucleotide', term=search_term)
result = Entrez.read(handle)
handle.close()
print(result['Count'])  # total number of records matching the query
Currently that gives me 140 results, starting with 545778205, which is the same as the website: http://www.ncbi.nlm.nih.gov/nuccore/?term=escherichia%5Borgn%5D+AND+complete+genome%5Btitle%5D
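Note that esearch only returns the first 20 IDs by default, so to actually download the records you will probably want to raise retmax and then pass the IDs to efetch. Here is a rough sketch of how that could look. The function names (build_query, batches, download_genbank), the batch size and the output file name are my own choices for illustration, not anything from your script, and I am assuming rettype="gbwithparts" is what you mean by GenBank (full).

```python
def build_query(organism, title_phrase="complete genome"):
    """Build the same query string that works on the NCBI website."""
    return f"{organism}[orgn] AND {title_phrase}[title]"


def batches(ids, size):
    """Yield successive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def download_genbank(organism, out_path, email, retmax=200, batch_size=20):
    """Search nucleotide and write all hits to one GenBank file.

    Hypothetical helper: retmax and batch_size are arbitrary defaults.
    """
    # Imported here so the two helpers above work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="nucleotide", term=build_query(organism),
                            retmax=retmax)
    result = Entrez.read(handle)
    handle.close()

    with open(out_path, "w") as out:
        for chunk in batches(result["IdList"], batch_size):
            # Assumption: "gbwithparts" corresponds to GenBank (full).
            fetch = Entrez.efetch(db="nucleotide", id=",".join(chunk),
                                  rettype="gbwithparts", retmode="text")
            out.write(fetch.read())
            fetch.close()

    # Total hits reported by NCBI; may exceed retmax.
    return int(result["Count"])
```

You would call it with something like download_genbank("escherichia", "escherichia_complete.gbk", "your.name@example.com"). If the returned count is larger than retmax, not everything was fetched, so increase retmax or page through the results with retstart.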