How can I get a Wikipedia article's text using Python 3 with Beautiful Soup?

user10798111 · Dec 16, 2018 · Viewed 7.4k times

I have this script made in Python 3:

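from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Mathematics"
# simple_get() is a helper defined elsewhere in the script (not shown);
# it returns the page's HTML, or None on failure.
response = simple_get(url)
result = {}
result["url"] = url
if response is not None:
    html = BeautifulSoup(response, 'html.parser')
    title = html.select("#firstHeading")[0].text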

As you can see, I can get the title of the article, but I cannot figure out how to get the text from "Mathematics (from Greek μά..." up to the table of contents.

Answer

alecxe · Dec 16, 2018

There is a much, much easier way to get information from Wikipedia: the Wikipedia API.

There is a Python wrapper, wikipediaapi, that lets you do it in just a few lines with no HTML parsing:

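import wikipediaapi

# Note: recent releases of the package require a user agent in addition to
# the language; the string below is only a placeholder.
wiki_wiki = wikipediaapi.Wikipedia(user_agent='my-script/0.1 (me@example.com)', language='en')

page = wiki_wiki.page('Mathematics')
print(page.summary)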

Prints:

Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning") includes the study of such topics as quantity, structure, space, and change...(omitted intentionally)

And, in general, try to avoid screen-scraping if there's a direct API available.
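
To illustrate what "a direct API" can look like without any third-party package, here is a minimal sketch that asks the MediaWiki API for the plain-text intro of an article using only the standard library. It assumes the `prop=extracts` query (with `exintro` and `explaintext`) is available on the wiki, which is the case for English Wikipedia as far as I know. The function names `build_intro_url`, `extract_intro` and `fetch_intro` are made up for this example, and the User-Agent string is a placeholder you should replace with your own.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title):
    """Build a query URL asking for the plain-text intro of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",   # text extracts of the page
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the target article
        "titles": title,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_intro(payload):
    """Return the first page's extract from a decoded response, or None."""
    pages = payload.get("query", {}).get("pages", {})
    for page in pages.values():
        # A missing page has no "extract" key, so this yields None.
        return page.get("extract")
    return None


def fetch_intro(title, user_agent="example-script/0.1 (placeholder contact)"):
    """Download and return the intro text of the given article (needs network)."""
    request = urllib.request.Request(
        build_intro_url(title),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        return extract_intro(json.load(response))
```

Calling `fetch_intro("Mathematics")` should then return the text from the start of the article up to the table of contents, with no HTML parsing involved. The wrapper above is still the more convenient option; this is only meant to show that the request itself is small.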