Python urllib2.HTTPError: HTTP Error 503: Service Unavailable on valid website

user2548635 picture user2548635 · Sep 19, 2014 · Viewed 41.7k times · Source

I have been using Amazon's Product Advertising API to generate urls that contains prices for a given book. One url that I have generated is the following:

http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327

When I click on the link or paste the link on the address bar, the web page loads fine. However, when I execute the following code I get an error:

url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
html_contents = urllib2.urlopen(url)

The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I even get this error since the web page successfully loads.

Also, another weird behavior that I have noticed is that the following code sometimes does and sometimes does not give the stated error:

html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")

I am totally lost on how this behavior occurs. Is there any fix or work around to this? My goal is to read the html contents of the url.

EDIT

I don't know why stack overflow is changing my code to change the amazon link I listed above in my code to rads.stackoverflow. Anyway, ignore the rads.stackoverflow link and use my link above between the quotes.

Answer

Spade picture Spade · Sep 19, 2014

Amazon is rejecting the default User-Agent for urllib2 . One workaround is to use the requests module

import requests
page = requests.get("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
html_contents = page.text

If you insist on using urllib2, this is how a header can be faked to do it:

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open('http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327')
html_contents = response.read()

Don't worry about stackoverflow editing the URL. They explain that they are doing this here.