I have been using Amazon's Product Advertising API to generate urls that contains prices for a given book. One url that I have generated is the following:
When I click on the link or paste the link on the address bar, the web page loads fine. However, when I execute the following code I get an error:
url = "http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327"
html_contents = urllib2.urlopen(url)
The error is urllib2.HTTPError: HTTP Error 503: Service Unavailable. First of all, I don't understand why I even get this error since the web page successfully loads.
Also, another weird behavior that I have noticed is that the following code sometimes does and sometimes does not give the stated error:
html_contents = urllib2.urlopen("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
I am totally lost on how this behavior occurs. Is there any fix or work around to this? My goal is to read the html contents of the url.
EDIT
I don't know why stack overflow is changing my code to change the amazon link I listed above in my code to rads.stackoverflow. Anyway, ignore the rads.stackoverflow link and use my link above between the quotes.
Amazon is rejecting the default User-Agent for urllib2 . One workaround is to use the requests module
import requests
page = requests.get("http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327")
html_contents = page.text
If you insist on using urllib2, this is how a header can be faked to do it:
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open('http://www.amazon.com/gp/offer-listing/0415376327%3FSubscriptionId%3DAKIAJZY2VTI5JQ66K7QQ%26tag%3Damaztest04-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0415376327')
html_contents = response.read()
Don't worry about stackoverflow editing the URL. They explain that they are doing this here.