I've been searching for hours for a way to extract the main text of a Wikipedia article, without all the links and references. I've tried wikitools, mwlib, BeautifulSoup and more, but none of them got me there.
Is there an easy and fast way to get the plain text (the actual article) and put it in a Python variable?
SOLUTION: Omid Raha solved it :)
You can use the wikipedia package, which is a Python wrapper for the Wikipedia API.
Here is a quick start.
First install it:
pip install wikipedia
Example:
import wikipedia
p = wikipedia.page("Python programming language")
print(p.url)
print(p.title)
content = p.content  # Plain-text content of the page.
Output:
http://en.wikipedia.org/wiki/Python_(programming_language)
Python (programming language)
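If you also want to drop the section headings and the trailing reference-style sections from p.content, something like the following might work. This is only a rough sketch: it assumes the content marks headings as lines like "== History ==", and clean_content, fetch_article_text and the SKIP_SECTIONS list are hypothetical names made up for this example, not part of the wikipedia package.

```python
import re

# Matches heading lines such as "== History ==" or "=== Early life ===".
# Assumption: p.content marks section headings this way.
HEADING_RE = re.compile(r"^\s*(=+)\s*(.*?)\s*=+\s*$")

# Hypothetical list of sections to drop; adjust to taste.
SKIP_SECTIONS = {"references", "external links", "see also",
                 "further reading", "notes"}


def clean_content(content: str) -> str:
    """Remove heading lines and the bodies of unwanted sections."""
    kept = []
    skip_level = None  # heading depth of the section being skipped, if any
    for line in content.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            name = match.group(2).lower()
            if skip_level is not None and level > skip_level:
                continue  # subsection of a section we are already skipping
            skip_level = level if name in SKIP_SECTIONS else None
            continue  # never keep the heading line itself
        if skip_level is None and line.strip():
            kept.append(line.strip())
    return "\n".join(kept)


def fetch_article_text(title: str) -> str:
    """Fetch a page with the wikipedia package and return the cleaned text."""
    import wikipedia  # imported here so clean_content works without it

    try:
        page = wikipedia.page(title)
    except wikipedia.exceptions.DisambiguationError as err:
        # The title was ambiguous; report a few candidates to the caller.
        raise ValueError(f"Ambiguous title, try one of: {err.options[:5]}")
    return clean_content(page.content)
```

Then text = fetch_article_text("Python programming language") should leave you with just the article paragraphs, one per line. The lookup needs network access, while clean_content on its own works on any string.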