We have a tool that checks whether a given URL is live. If it is, another part of our software can screen-scrape the content from it.
This is my code for checking whether a URL is live:
public static bool IsLiveUrl(string url)
{
    HttpWebRequest webRequest = WebRequest.Create(url) as HttpWebRequest;
    webRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";
    webRequest.CookieContainer = new CookieContainer();
    WebResponse webResponse;
    try
    {
        webResponse = webRequest.GetResponse();
    }
    catch (WebException)
    {
        // Covers protocol errors such as 403 as well as connection failures.
        return false;
    }
    catch (Exception)
    {
        return false;
    }
    // Release the connection; only the fact that a response arrived matters here.
    webResponse.Close();
    return true;
}
This code works perfectly, but for one particular site hosted on Apache I am getting a WebException with the following message: "The remote server returned an error: (403) Forbidden". On further inspection I found the following details in the WebException object:
Status="ProtocolError" StatusDescription="Bad Behaviour"
This is the request header:
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5
    Host: scenicspares.co.uk
    Connection: Keep-Alive
This is the response header:
    Keep-Alive: timeout=4, max=512
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html
    Date: Thu, 13 Jan 2011 10:29:36 GMT
    Server: Apache
I extracted these headers using a watch in VS2008. The framework in use is .NET 3.5.
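The same details can also be read in code rather than through a debugger watch. The following is a minimal sketch, not the code used in the tool: the class name `WebExceptionInspector` and the method `Describe` are hypothetical names introduced here for illustration. It relies only on `WebException.Status`, `WebException.Response`, and the `StatusCode` and `StatusDescription` properties of `HttpWebResponse`.

```csharp
using System;
using System.Net;

// Hypothetical helper (not part of the original tool): turns a WebException
// into a one-line summary so the reason for a failure can be logged.
public static class WebExceptionInspector
{
    public static string Describe(WebException e)
    {
        // For protocol errors (4xx/5xx) the server's response is attached to the exception.
        HttpWebResponse response = e.Response as HttpWebResponse;
        if (e.Status == WebExceptionStatus.ProtocolError && response != null)
        {
            return string.Format("HTTP {0} {1}, description: {2}",
                (int)response.StatusCode, response.StatusCode, response.StatusDescription);
        }

        // No HTTP response is available (timeout, DNS failure, and so on).
        return string.Format("{0}: {1}", e.Status, e.Message);
    }
}
```

Calling `WebExceptionInspector.Describe(e)` inside the `catch (WebException e)` block and logging the result would make a rejection like the 403 above visible without attaching a debugger.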
It turned out that all I needed to do was set the Accept header before sending the request:
    webRequest.Accept = "*/*";
    webResponse = webRequest.GetResponse();
That fixed it.
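Putting the fix together with the original check, a complete version might look like the sketch below. The original request carried no Accept header, and the "Bad Behaviour" status description suggests that some filter on the server rejects such requests, although that is inferred from the fix rather than confirmed. The class name `LiveUrlChecker` and the helper `CreateProbeRequest` are assumptions introduced for illustration. The extra `catch` clauses and the `using` block are additions beyond the original code.

```csharp
using System;
using System.Net;

public static class LiveUrlChecker
{
    // Hypothetical helper: builds the request without sending it.
    // Returns null when the URL does not use an HTTP(S) scheme.
    public static HttpWebRequest CreateProbeRequest(string url)
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        if (request == null)
        {
            return null;
        }
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";
        request.Accept = "*/*"; // the header whose absence led to the 403
        request.CookieContainer = new CookieContainer();
        return request;
    }

    public static bool IsLiveUrl(string url)
    {
        try
        {
            HttpWebRequest request = CreateProbeRequest(url);
            if (request == null)
            {
                return false;
            }
            // Dispose the response so the underlying connection is released.
            using (WebResponse response = request.GetResponse())
            {
                return true;
            }
        }
        catch (WebException)
        {
            return false; // protocol error, timeout, DNS failure, ...
        }
        catch (UriFormatException)
        {
            return false; // the string is not a valid URI
        }
        catch (NotSupportedException)
        {
            return false; // no handler is registered for the URI scheme
        }
    }
}
```

Keeping request construction in its own method means the headers can be checked without touching the network, for example by confirming that `CreateProbeRequest("http://example.com/").Accept` is `"*/*"`.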