Linux Socket: How to detect disconnected network in a client program?

user2052197 · Feb 8, 2013 · Viewed 55.4k times

I am debugging a C-based Linux socket program. Like most of the examples available online, it uses the following structure:

sockfd = socket(AF_INET, SOCK_STREAM, 0);

connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

send_bytes = send(sockfd, sock_buff, (size_t)buff_bytes, MSG_DONTWAIT);

I can detect the disconnection when the remote server closes its server program. But if I unplug the Ethernet cable, the send function still returns positive values rather than -1.

How can I check the network connection in a client program, assuming that I cannot change the server side?

Answer

cnicutar · Feb 8, 2013

"But if I unplug the Ethernet cable, the send function still returns positive values rather than -1."

First of all, you should know that send doesn't actually send anything; it's just a memory-copying function/system call. It copies data from your process to the kernel. Some time later the kernel will fetch that data and send it to the other side after packaging it in segments and packets. Therefore send can only return an error if:

  • The socket is invalid (for example bogus file descriptor)
  • The connection is clearly invalid, for example it hasn't been established or has already been terminated in some way (FIN, RST, timeout - see below)
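  • There's no more room in the socket's send buffer to copy the data (with a non-blocking send such as MSG_DONTWAIT this shows up as EAGAIN/EWOULDBLOCK)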
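To make the three cases above a bit more concrete, here is a rough sketch of how the errno left by a failed send could be sorted into them. The enum and the function name classify_send_errno are made up for illustration, and the errno groupings are my own reading of the list, not an exhaustive table.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical categories mirroring the three cases listed above. */
enum send_failure {
    SEND_BAD_SOCKET,    /* descriptor is not a usable socket */
    SEND_CONN_INVALID,  /* never connected, or already torn down */
    SEND_NO_ROOM,       /* kernel send buffer is full right now */
    SEND_OTHER          /* anything else (EINTR, EMSGSIZE, ...) */
};

/* Map the errno left by a failed send() to one of the categories. */
enum send_failure classify_send_errno(int err)
{
    assert(err != 0); /* only meaningful after send() returned -1 */

    switch (err) {
    case EBADF:
    case ENOTSOCK:
        return SEND_BAD_SOCKET;
    case ENOTCONN:
    case EPIPE:
    case ECONNRESET:
    case ETIMEDOUT:
        return SEND_CONN_INVALID;
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return SEND_NO_ROOM;
    default:
        return SEND_OTHER;
    }
}
```

You would call it as classify_send_errno(errno) right after send returns -1. Note that none of these codes appears at the moment the cable is pulled, which is exactly the behaviour described in the question.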

In short, a successful send only means the kernel accepted the data; its return value tells you nothing about whether the data actually reached the other side.

Back to your question: when TCP sends data it expects a valid acknowledgement within a reasonable amount of time. If it doesn't get one, it resends. How often does it resend? Each TCP stack does things differently, but the norm is to use exponential backoff. That is, first wait 1 second, then 2, then 4 and so on. On some stacks this process can take minutes.

The upshot is that when the link is interrupted, TCP declares the connection dead only after a seriously long period of silence. On Linux the default is 15 retransmissions (the net.ipv4.tcp_retries2 sysctl), which adds up to well over 5 minutes and is often closer to 15.
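To get a feel for how the backoff adds up, here is a small arithmetic sketch. It simply sums a doubling timeout with a cap; the function name backoff_total_ms is made up, and the 200 ms starting timeout and 120 s cap used below are assumptions (I believe they match the usual Linux minimum and maximum RTO, but real connections derive the starting value from the measured round-trip time).

```c
#include <assert.h>

/* Sum the waits of an exponential backoff: the first timeout plus one
 * doubled (and capped) timeout after each retransmission. */
long backoff_total_ms(long initial_rto_ms, long max_rto_ms, int retries)
{
    assert(initial_rto_ms > 0 && max_rto_ms > 0 && retries >= 0);

    long total = 0;
    long rto = initial_rto_ms;
    if (rto > max_rto_ms)
        rto = max_rto_ms;

    for (int i = 0; i <= retries; i++) {  /* retries + 1 timer expiries */
        total += rto;
        rto *= 2;
        if (rto > max_rto_ms)
            rto = max_rto_ms;             /* backoff stops growing here */
    }
    return total;
}
```

Under those assumptions, backoff_total_ms(200, 120000, 15) comes to 924600 ms, a bit over 15 minutes. Starting from a 1 second first wait instead, backoff_total_ms(1000, 120000, 15) gives 1207000 ms, about 20 minutes.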

One way to solve this is to implement some acknowledgement mechanism in your application. You could, for example, send the server a request meaning "reply within 5 seconds or I'll declare this connection dead" and then recv with a timeout.
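A minimal sketch of the "recv with a timeout" half, using poll to bound the wait. The function name wait_for_reply is hypothetical, and what request you send beforehand depends entirely on what your server's protocol already answers to; this only shows the waiting side.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Wait up to timeout_ms for the peer's reply.
 * Returns 1 if data arrived, 0 if the peer should be considered dead
 * (timeout, orderly close, or socket error). */
int wait_for_reply(int sockfd, char *buf, size_t len, int timeout_ms)
{
    assert(buf && len > 0);

    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };
    int ready;

    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);  /* a signal restarts the full wait */

    if (ready <= 0)
        return 0;   /* timed out, or poll() itself failed */

    ssize_t n = recv(sockfd, buf, len, 0);
    return n > 0;   /* 0 means the peer closed, -1 means an error */
}
```

After sending your request you would call something like wait_for_reply(sockfd, reply, sizeof reply, 5000) and close the socket if it returns 0. Setting SO_RCVTIMEO on the socket and calling recv directly is another way to get the same effect.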