Fetch a Wikipedia article with Python

dkp picture dkp · Sep 23, 2008 · Viewed 26.4k times · Source

I try to fetch a Wikipedia article with Python's urllib:

f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")           
s = f.read()
f.close()

However instead of the html page I get the following response: Error - Wikimedia Foundation:

Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to ()
Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09:09:08 GMT 

Wikipedia seems to block request which are not from a standard browser.

Anybody know how to work around this?

Answer

Florian Bösch picture Florian Bösch · Sep 23, 2008

You need to use the urllib2 that superseedes urllib in the python std library in order to change the user agent.

Straight from the examples

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes')
page = infile.read()