I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:
import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()
and everything is fine. However, I want to move to Python 3.4, and there I get HTTP Error 403 (Forbidden). My code:
import urllib.request
html = urllib.request.urlopen(url) # same URL as before
Traceback (most recent call last):
  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 574, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
It works for other URLs that don't use https. For example,
url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'
is OK.
It seems the site does not like the default user agent sent by Python 3.x's urllib. Specifying a User-Agent header will solve your problem:
import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
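If you are fetching several URLs, you can instead set the header once on an opener and install it globally, so plain urlopen calls pick it up. A minimal sketch, using the same Mozilla/5.0 string as above:

```python
import urllib.request

# Build an opener whose headers are sent with every request it makes
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# Make it the default opener used by urllib.request.urlopen
urllib.request.install_opener(opener)

# From here on, a plain urlopen sends the Mozilla/5.0 user agent:
# html = urllib.request.urlopen(url).read()
```

Note that in Python 3, read() returns bytes, so you may want to decode the result (e.g. html.decode('utf-8')) before parsing it as text.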
NOTE: the Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib.request, it does not raise an exception.
You can confirm that with the following code:
print(urllib.urlopen(url).getcode()) # => 403
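In Python 3 the status code is available on the HTTPError exception itself, so a small helper (hypothetical name get_status) can reproduce the Python 2 urllib behaviour of returning the code instead of raising:

```python
import urllib.request
import urllib.error

def get_status(url, user_agent='Mozilla/5.0'):
    """Return the HTTP status code for url, even for 4xx/5xx responses."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    try:
        return urllib.request.urlopen(req).getcode()
    except urllib.error.HTTPError as e:
        # urlopen raises on non-2xx responses; the code is on the exception
        return e.code
```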