I am implementing a simple HTTP client that just connects to a web server and gets its default homepage. Here it is, and it works nicely:
    using System;
    using System.Net.Sockets;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                TcpClient tc = new TcpClient();
                tc.Connect("www.google.com", 80);
                using (NetworkStream ns = tc.GetStream())
                {
                    System.IO.StreamWriter sw = new System.IO.StreamWriter(ns);
                    System.IO.StreamReader sr = new System.IO.StreamReader(ns);
                    string req = "";
                    req += "GET / HTTP/1.0\r\n";
                    req += "Host: www.google.com\r\n";
                    req += "\r\n";
                    sw.Write(req);
                    sw.Flush();
                    Console.WriteLine("[reading...]");
                    Console.WriteLine(sr.ReadToEnd());
                }
                tc.Close();
                Console.WriteLine("[done!]");
                Console.ReadKey();
            }
        }
    }
When I delete the following line from the code above, the program blocks on sr.ReadToEnd:
    req += "Host: www.google.com\r\n";
I even replaced sr.ReadToEnd with sr.Read, but it still cannot read anything. I used Wireshark to see what was happening:
As the capture shows, after my GET request Google doesn't respond, and the request is retransmitted again and again. It seems we HAVE TO specify the Host header in the HTTP request. The weird part is that WE DON'T: I used telnet to send the same request and got a response from Google. I also captured the request sent by telnet, and it was exactly the same as mine.
I tried many other websites (e.g. Yahoo, Microsoft), but the result is the same.
So, does the delay in telnet cause the web server to act differently (because in telnet we actually type characters one at a time instead of sending them together in one packet)?
Another weird problem is that when I change HTTP/1.0 to HTTP/1.1, the program always blocks on the sr.ReadToEnd line. I guess that's because the web server doesn't close the connection.
One solution is to use Read (or ReadLine) and ns.DataAvailable to read the response, but then I cannot be sure that I have read all of it. How can I read the response to an HTTP/1.1 request and be sure there are no more bytes left?
Note: As W3 says,
the Host request-header field MUST accompany all HTTP/1.1 requests
(and I did so for my HTTP/1.1 requests). But I haven't seen such a requirement for HTTP/1.0. Also, sending a request without the Host header using telnet works without any problem.
Update:
The Push flag is set to 1 in the TCP segment. I have also tried netsh winsock reset to reset my TCP/IP stack. There are no firewalls or anti-virus programs on the testing computer. The packets are actually sent, because Wireshark installed on another computer can capture them.
I have also tried some other requests. For instance,
    string req = "";
    req += "GET / HTTP/1.0\r\n";
    req += "s df slkjfd sdf/ s/fd \\sdf/\\\\dsfdsf \r\n";
    req += "qwretyuiopasdfghjkl\r\n";
    req += "Host: www.google.com\r\n";
    req += "\r\n";
In all kinds of requests, if I omit the Host: part, the web server doesn't respond, and if I include it, even an invalid request (like the one above) gets a response (a 400 Bad Request).
nos says the Host: part is not required on his machine, which makes the situation even weirder.
This pertains to using TcpClient.
I know this post is old. I am providing this information just in case anyone else comes across this. Consider this answer a supplement to all of the above answers.
The HTTP Host header is required by some servers because they are set up to host more than one domain per IP address. As a general rule, always send the Host header. When it is missing, a good server will reply with "Not Found"; some servers won't reply at all.
When the call to read data from the stream blocks, it's usually because the server is waiting for more data to be sent. This is typically the case when the HTTP/1.1 spec is not followed closely. To demonstrate this, try omitting the final CR LF sequence and then read data from the stream: the call to read will wait until either the client times out or the server gives up waiting and terminates the connection.
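As a rough illustration of that rule, here is a minimal sketch of a request builder that always emits the Host header and the terminating blank line. The class and method names (HttpRequestSketch, BuildGetRequest) are hypothetical and not part of the question's code. The "Connection: close" header is my own addition: it asks an HTTP/1.1 server to close the connection after the response, which is one way to let sr.ReadToEnd return.

```csharp
using System;
using System.Text;

static class HttpRequestSketch
{
    // Hypothetical helper: builds a minimal HTTP/1.1 GET request.
    // The Host header is always included, and the request always ends
    // with the empty line (CR LF) that marks the end of the headers.
    public static string BuildGetRequest(string host, string path)
    {
        var sb = new StringBuilder();
        sb.Append("GET ").Append(path).Append(" HTTP/1.1\r\n");
        sb.Append("Host: ").Append(host).Append("\r\n");
        // Assumption: we do not want a persistent connection, so we ask
        // the server to close it once the response has been sent.
        sb.Append("Connection: close\r\n");
        sb.Append("\r\n");
        return sb.ToString();
    }
}
```

In the question's program, the result of BuildGetRequest("www.google.com", "/") could be passed to sw.Write in place of the hand-built req string.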
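The read can also block for a reason on the client side: an HTTP/1.1 server normally keeps the connection open after responding, so ReadToEnd never sees the end of the stream. When the response carries a Content-Length header, the client can instead read exactly that many body bytes. The following is only a sketch under that assumption; the names HttpResponseSketch, ReadHeaderLine and ReadBody are hypothetical, and chunked or close-delimited responses are deliberately not handled.

```csharp
using System;
using System.IO;
using System.Text;

static class HttpResponseSketch
{
    // Reads one header line byte by byte, up to the terminating LF.
    // Returns null if the stream ends before any byte is read.
    static string ReadHeaderLine(Stream s)
    {
        var sb = new StringBuilder();
        bool any = false;
        int b;
        while ((b = s.ReadByte()) != -1)
        {
            any = true;
            if (b == '\n') break;               // end of line
            if (b != '\r') sb.Append((char)b);  // drop the CR of CR LF
        }
        return any ? sb.ToString() : null;
    }

    // Skips the status line and headers, then reads exactly
    // Content-Length bytes of body from the stream.
    public static byte[] ReadBody(Stream s)
    {
        int contentLength = -1;
        string line;
        // The headers end at the first empty line.
        while ((line = ReadHeaderLine(s)) != null && line.Length > 0)
        {
            int colon = line.IndexOf(':');
            if (colon > 0 && line.Substring(0, colon).Trim()
                    .Equals("Content-Length", StringComparison.OrdinalIgnoreCase))
            {
                contentLength = int.Parse(line.Substring(colon + 1).Trim());
            }
        }
        if (contentLength < 0)
            throw new NotSupportedException("No Content-Length header; this sketch does not handle that case.");

        var body = new byte[contentLength];
        int read = 0;
        while (read < contentLength)
        {
            // Read may return fewer bytes than requested, so loop.
            int n = s.Read(body, read, contentLength - read);
            if (n == 0)
                throw new EndOfStreamException("Connection closed before the full body arrived.");
            read += n;
        }
        return body;
    }
}
```

Because ReadBody takes a plain Stream, it can be given the NetworkStream from tc.GetStream() directly. Reading bytes rather than going through a StreamReader avoids the reader buffering past the end of the headers.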
I hope this sheds a bit of light...