I found a Python script (here: Wikipedia Extractor) that can generate plain text from an (English) Wikipedia database dump. When I run this command (as stated on the script's page):
$ python enwiki-latest-pages-articles.xml WikiExtractor.py -b 500K -o extracted
I get this error:
  File "enwiki-latest-pages-articles.xml", line 1
    <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.8/ http://www.mediawiki.org/xml/export-0.8.xsd" version="0.8" xml:lang="en">
    ^
SyntaxError: invalid syntax
I'm running the script with Python 2.7.6 under Cygwin on Windows 7.
I hope someone who has already used this script, or who has experience with Python, can help me solve this error.
Thanks in advance!
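The first argument to python should be the script name. In your command the XML dump comes first, so Python tries to run the dump itself as Python source, which is why the SyntaxError points at line 1 of the xml file. You probably need to swap the xml and py file names: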
$ python WikiExtractor.py enwiki-latest-pages-articles.xml -b 500K -o extracted
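To see why the order matters, here is a rough sketch of how the arguments get divided up. The helper name split_python_command is made up for illustration, and it assumes no interpreter options (such as -m or -c) come before the script name; it is not how the interpreter is actually implemented.

```python
def split_python_command(args):
    """Roughly mimic how `python ARG1 ARG2 ...` is interpreted.

    Assumption: no interpreter options precede the script name.
    The first argument is executed as the script; everything after it
    is handed to that script (it ends up in sys.argv[1:]).
    """
    script = args[0]
    script_args = args[1:]
    return script, script_args


# The command from the question: the XML dump is in the script position.
wrong = ["enwiki-latest-pages-articles.xml", "WikiExtractor.py",
         "-b", "500K", "-o", "extracted"]

# The corrected command: the script comes first, the dump is its argument.
right = ["WikiExtractor.py", "enwiki-latest-pages-articles.xml",
         "-b", "500K", "-o", "extracted"]

for label, args in (("wrong", wrong), ("right", right)):
    script, script_args = split_python_command(args)
    print("%s: runs %s with arguments %s" % (label, script, script_args))
```

With the original order, the file that gets executed is the dump, and WikiExtractor.py is only passed along as an argument to it.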