Screen scraping: getting around "HTTP Error 403: request disallowed by robots.txt"

Diego · May 17, 2010 · Viewed 41.5k times

Is there a way to get around the following?

httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner (barnesandnoble.com)? I'm building a site that would bring them more sales, so I'm not sure why they would deny access at a certain depth.

I'm using mechanize and BeautifulSoup on Python 2.6.

I'm hoping for a workaround.

Answer

Yuda Prawira · Oct 3, 2010

You need to tell mechanize to ignore robots.txt:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # stop mechanize from fetching and obeying robots.txt
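
For context, here is a minimal end-to-end sketch with the BeautifulSoup parsing step the question mentions. The URL and User-Agent string are just placeholder examples, and the import assumes the Python 2 BeautifulSoup 3 package:

import mechanize
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, per the question's Python 2.6 setup

br = mechanize.Browser()
br.set_handle_robots(False)  # don't fetch or honor robots.txt
# Some sites also reject mechanize's default User-Agent; sending a
# browser-like string is a common companion workaround (example value).
br.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0')]

response = br.open('http://www.barnesandnoble.com/')  # example URL
soup = BeautifulSoup(response.read())
print soup.title.string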