Extract the main article text from a Wikipedia page using Python

Paolo · Apr 28, 2014 · Viewed 7.2k times

I've spent hours searching for a way to extract the main text of a Wikipedia article, without all the links and references. I've tried wikitools, mwlib, BeautifulSoup and more, but none of them got me what I wanted.

Is there an easy and fast way to get the plain text (the actual article) into a Python variable?

SOLUTION: Omid Raha solved it :)

Answer

Omid Raha · Apr 28, 2014

You can use the wikipedia package, which is a Python wrapper for the Wikipedia API.

Here is a quick start.

First install it:

pip install wikipedia

Example:

import wikipedia

# Look up the article by title.
p = wikipedia.page("Python programming language")
print(p.url)
print(p.title)
content = p.content  # Plain-text content of the page.

Output:

http://en.wikipedia.org/wiki/Python_(programming_language)
Python (programming language)
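
One caveat: as far as I can tell, the text in p.content still contains section heading lines such as "== History ==". If you want only the running prose, a small cleanup pass can drop them. The following is a minimal sketch, not part of the wikipedia package: strip_headings and HEADING_RE are hypothetical names chosen for illustration, and the sample string stands in for p.content so the snippet runs without network access.

```python
import re

# Matches a whole line such as "== History ==" or "=== Early life ===".
# Assumption: headings in the extracted text use this "== Title ==" form.
HEADING_RE = re.compile(r"^\s*={2,}\s*(.+?)\s*={2,}\s*$")


def strip_headings(text):
    """Return text with heading lines removed and blank runs collapsed."""
    kept = [line for line in text.splitlines() if not HEADING_RE.match(line)]
    cleaned = "\n".join(kept)
    # Collapse the runs of blank lines left behind by removed headings.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


# Stand-in for p.content, so the example does not need to fetch a page.
sample = (
    "Python is a language.\n\n\n"
    "== History ==\n"
    "It was created by Guido van Rossum.\n\n\n"
    "=== Versions ===\n"
    "Python 3 followed."
)
print(strip_headings(sample))
```

With a real page you would call strip_headings(p.content) instead. Lines that merely contain "==" in the middle of a sentence are left alone, because the pattern only matches lines that consist entirely of a heading.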