I try to fetch a Wikipedia article with Python's urllib:
f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes")
s = f.read()
f.close()
However instead of the html page I get the following response: Error - Wikimedia Foundation:
Request: GET http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes, from 192.35.17.11 via knsq1.knams.wikimedia.org (squid/2.6.STABLE21) to ()
Error: ERR_ACCESS_DENIED, errno [No Error] at Tue, 23 Sep 2008 09:09:08 GMT
Wikipedia seems to block request which are not from a standard browser.
Anybody know how to work around this?
You need to use the urllib2 that superseedes urllib in the python std library in order to change the user agent.
Straight from the examples
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
infile = opener.open('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes')
page = infile.read()