Obtaining data from PubMed using Python

Ruchik Yajnik · Jul 1, 2013 · Viewed 18.3k times

I have a list of PubMed entries along with their PubMed IDs. I would like to write a Python script that accepts a PubMed ID as input and fetches the corresponding abstract from the PubMed website.

So far I have come across NCBI E-utilities and the urllib library in Python, but I don't know how to go about writing a template.

Any pointers will be appreciated.

Thank you,

Answer

Karol · Nov 22, 2013

Using Biopython's module called Entrez, you can get the abstract along with all other metadata quite easily. This will print the abstract:

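from Bio import Entrez
from Bio.Entrez import efetch

Entrez.email = 'you@example.com'  # placeholder: NCBI asks callers to supply a contact address

def print_abstract(pmid):
    # Request the plain-text abstract for a single PubMed ID.
    handle = efetch(db='pubmed', id=pmid, retmode='text', rettype='abstract')
    print(handle.read())
    handle.close()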
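
Since the question starts from a list of PubMed IDs, it may help to know that efetch also accepts several IDs joined by commas, so one request can cover a whole batch. Below is a minimal sketch of that idea. The helper names `chunked` and `print_abstracts` and the batch size of 100 are assumptions chosen for illustration; they are not part of Biopython.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def print_abstracts(pmids, batch_size=100):
    """Print plain-text abstracts for a list of PubMed IDs, one request per batch."""
    # Imported here so that chunked() stays usable without Biopython installed.
    from Bio.Entrez import efetch

    for batch in chunked(list(pmids), batch_size):
        # E-utilities takes multiple IDs as one comma-separated string.
        id_string = ','.join(str(pmid) for pmid in batch)
        handle = efetch(db='pubmed', id=id_string, retmode='text', rettype='abstract')
        print(handle.read())
        handle.close()
```

Calling `print_abstracts(my_pmids)`, where `my_pmids` is your own list of IDs, prints the abstracts of each batch in turn.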

And here is a function that will fetch XML and return just the abstract:

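from Bio.Entrez import efetch, read

def fetch_abstract(pmid):
    handle = efetch(db='pubmed', id=pmid, retmode='xml')
    records = read(handle)
    handle.close()
    try:
        # Recent Biopython releases return a dict keyed by 'PubmedArticle';
        # older releases returned a plain list of records.
        articles = records['PubmedArticle'] if isinstance(records, dict) else records
        article = articles[0]['MedlineCitation']['Article']
        # Structured abstracts have several AbstractText parts; this returns the first.
        return article['Abstract']['AbstractText'][0]
    except (KeyError, IndexError):
        # No record for this ID, or the record has no abstract.
        return None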
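
The question also mentions calling E-utilities directly, so here is a rough sketch of the same lookup using only the standard library. It builds the efetch URL, downloads the XML, and joins every AbstractText section, whereas the function above returns only the first one. The function names (`build_efetch_url`, `extract_abstract`, `fetch_abstract_stdlib`) are made up for this example, and the parsing assumes the usual PubmedArticleSet layout of the returned XML.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'


def build_efetch_url(pmid):
    """Return the efetch URL that requests one PubMed record as XML."""
    query = urllib.parse.urlencode({'db': 'pubmed', 'id': str(pmid), 'retmode': 'xml'})
    return EFETCH_URL + '?' + query


def extract_abstract(xml_text):
    """Join all AbstractText sections of the first article, or return None."""
    root = ET.fromstring(xml_text)
    article = root.find('.//PubmedArticle')
    if article is None:
        return None
    parts = []
    for node in article.findall('./MedlineCitation/Article/Abstract/AbstractText'):
        # itertext() also collects text nested in inline markup such as <i> or <sub>.
        text = ''.join(node.itertext()).strip()
        label = node.get('Label')  # present only on structured abstracts
        if text:
            parts.append(label + ': ' + text if label else text)
    return '\n'.join(parts) if parts else None


def fetch_abstract_stdlib(pmid):
    """Download one record from E-utilities and return its abstract text."""
    with urllib.request.urlopen(build_efetch_url(pmid)) as response:
        return extract_abstract(response.read())
```

Keeping the URL building and the XML parsing in separate functions means both can be checked against a saved response without touching the network.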

P.S. I needed to do this for a real task, so I organized the code into a class; see this gist.