Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

backus · Feb 10, 2010 · Viewed 45.7k times

I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML.

With Python 2.6, I normally fetch a page using:

import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2), even though the server returns an Internal Server Error for them?

Note that with Python 3, the corresponding exception is urllib.error.HTTPError.

Answer

Joe Holloway · Feb 10, 2010

HTTPError is itself a file-like object: it wraps the server's response, so you can catch it and read the body just as you would from a successful response.

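import urllib2

url = "http://google.com"  # as in the question; substitute the failing URL
try:
    resp = urllib2.urlopen(url)
    contents = resp.read()
except urllib2.HTTPError as error:
    # The exception doubles as the response, so the error page is readable.
    contents = error.read()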
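
For Python 3, where the exception is urllib.error.HTTPError, the same catch-and-read approach might look like the sketch below. The helper name fetch_body and its opener parameter are hypothetical, introduced here only so the function can be exercised without a network; they are not part of urllib.

```python
import urllib.error
import urllib.request


def fetch_body(url, opener=urllib.request.urlopen):
    """Return (status, body_bytes) for url, even on a 4xx/5xx response.

    fetch_body and opener are illustrative names, not a urllib API.
    """
    try:
        with opener(url) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as error:
        # HTTPError also acts as the response: .code holds the status
        # and .read() returns the body sent along with the error.
        return error.code, error.read()
```

Calling fetch_body(url) on the failing URL should then give back 500 together with the bytes of the error page instead of raising. Note that the body is bytes in Python 3, so decode it if you need text.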