Python check if website exists

James Hallen picture James Hallen · May 27, 2013 · Viewed 124.3k times · Source

I wanted to check if a certain website exists, this is what I'm doing:

user_agent = 'Mozilla/20.0.1 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent':user_agent }
link = "http://www.abc.com"
req = urllib2.Request(link, headers = headers)
page = urllib2.urlopen(req).read() - ERROR 402 generated here!

If the page doesn't exist (error 402, or whatever other errors), what can I do in the page = ... line to make sure that the page I'm reading does exit?

Answer

Adem Öztaş picture Adem Öztaş · May 27, 2013

You can use HEAD request instead of GET. It will only download the header, but not the content. Then you can check the response status from the headers.

For python 2.7.x, you can use httplib:

import httplib
c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '')
if c.getresponse().status == 200:
   print('web site exists')

or urllib2:

import urllib2
try:
    urllib2.urlopen('http://www.example.com/some_page')
except urllib2.HTTPError, e:
    print(e.code)
except urllib2.URLError, e:
    print(e.args)

or for 2.7 and 3.x, you can install requests

import requests
request = requests.get('http://www.example.com')
if request.status_code == 200:
    print('Web site exists')
else:
    print('Web site does not exist')