I have this script made in Python 3:
response = simple_get("https://en.wikipedia.org/wiki/Mathematics")
result = {}
result["url"] = url
if response is not None:
html = BeautifulSoup(response, 'html.parser')
title = html.select("#firstHeading")[0].text
As you can see I can get the title from the article, but I cannot figure out how to get the text from "Mathematics (from Greek μά..." to the contents table...
There is a much, much more easy way to get information from wikipedia - Wikipedia API.
There is this Python wrapper, which allows you to do it in a few lines only with zero HTML-parsing:
import wikipediaapi
wiki_wiki = wikipediaapi.Wikipedia('en')
page = wiki_wiki.page('Mathematics')
print(page.summary)
Prints:
Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning") includes the study of such topics as quantity, structure, space, and change...(omitted intentionally)
And, in general, try to avoid screen-scraping if there's a direct API available.