send and recv on same socket from different threads not working

yuf · Mar 13, 2013

I read that it should be safe to call send and recv on the same socket from different threads concurrently, but my program behaves strangely and I don't know what's wrong.

I have two concurrent threads communicating with a client over the same socket:

  1. one doing send to a socket
  2. one doing select and then recv from the same socket

While I'm still sending, the client has already received the data and closed the socket. At the same time, I'm doing a select and recv on that socket, and recv returns 0 (since the client closed it), so I close the socket. However, the send has not returned yet, and because I have called close on the socket, the send call fails with EBADF.

I know the client has received the data correctly since I output it after I close the socket and it is right. However, on my end, my send call is still returning an error (EBADF), so I want to fix it so it doesn't fail.

This doesn't always happen. It happens maybe 40% of the time. I don't use sleep anywhere. Am I supposed to have pauses between sends or recvs or anything?

Here's some code:

Sending:

while (sentSize > 0)
{
    // keep sending until every remaining byte has been accepted by send
    n = send(_sfd, bytesPtr, sentSize, 0);

    if (n < 0)
    {
        if (errno == EINTR) continue; // interrupted by a signal, retry
        cerr << "ERROR: send returned an error " << errno << endl; // this case is triggered
        return n;
    }

    sentSize -= n;
    bytesPtr += n;
}

Receiving:

while (true)
{
    memset(bufferPointer, 0, sizeLeft);
    n = recv(_sfd, bufferPointer, sizeLeft, 0);
    if (debug) cerr << "Receiving..." << sizeLeft << endl;
    if (n == 0)
    {
        cerr << "Connection closed" << endl; // this case is triggered
        return n;
    }
    else if (n < 0)
    {
        cerr << "ERROR reading from socket" << endl;
        return n;
    }
    bufferPointer += n;
    sizeLeft -= n;
    if (sizeLeft <= 0) break;
}

On the client, I use the same receive code, then I call close() on the socket. Then on my side, I get 0 from the receive call and also call close() on the socket. Then my send fails. It still hasn't finished?! But my client already got the data!

Answer

Cartroo · Mar 13, 2013

I must admit I'm surprised you see this problem as often as you do, but it's always a possibility when you're dealing with threads. When you call send() you'll end up going into the kernel to append the data to the socket buffer in there, and it's therefore quite likely that there'll be a context switch, maybe to another process in the system. Meanwhile the kernel has probably buffered and transmitted the packet quite quickly. I'm guessing you're testing on a local network, so the other end receives the data and closes the connection and sends the appropriate FIN back to your end very quickly. This could all happen while the sending machine is still running other threads or processes because the latency on a local ethernet network is so low.

Now the FIN arrives - your receive thread hasn't done a lot lately since it's been waiting for input. Many scheduling systems will therefore raise its priority quite a bit and there's a good chance it'll be run next (you don't specify which OS you're using but this is likely to happen on at least Linux, for example). This thread closes the socket due to its zero read. At some point shortly after this the sending thread will be re-awoken, but presumably the kernel notices that the socket is closed before it returns from the blocked send() and returns EBADF.

Now this is just speculation as to the exact cause - among other things it heavily depends on your platform. But you can see how this could happen.
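To make the failure mode concrete, here is a minimal single-threaded sketch of what the sending thread runs into once the descriptor has been closed underneath it. It assumes a POSIX system and uses a local socketpair() in place of your real connection; the function name errnoOfSendAfterClose is made up for the illustration. It only shows one way the error can arise (a send() issued after the close), not necessarily the exact path your program takes.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns the errno produced by send() on a descriptor
// that has already been closed, or 0 if send() unexpectedly succeeds.
int errnoOfSendAfterClose()
{
    int fds[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    assert(rc == 0 && "socketpair failed");
    (void)rc;

    close(fds[0]);  // what the receive thread does after its zero read

    const char msg[] = "hello";
    errno = 0;
    ssize_t n = send(fds[0], msg, sizeof msg, 0);  // what the send thread does next
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;  // EBADF: the descriptor number no longer refers to a socket
}
```

In a real multi-threaded program the outcome is worse than an error if another thread opens a new descriptor in between, since the number can be reused.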

The easiest solution is probably to use poll() in the sending thread as well, but wait for the socket to become write-ready instead of read-ready. Obviously you also need to wait until there is buffered data to send - how you do that depends on which thread buffers the data. The poll() call can let you detect that the connection has gone away, typically through POLLHUP or POLLERR in revents (exactly which flags a peer close produces varies by platform), and you can check for that before you try your send().
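A minimal sketch of that write-readiness check, assuming a POSIX poll(). The names WriteReady and waitWritable are invented for this example. Treat it as narrowing the window rather than closing it: on TCP a peer's close may only show up as a readable end-of-file rather than POLLHUP, and the other thread can still close the descriptor between this check and the send().

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical result type for the check below.
enum class WriteReady { Ready, Closed, Timeout, Error };

// Waits up to timeoutMs for fd to accept more data.
WriteReady waitWritable(int fd, int timeoutMs)
{
    assert(fd >= 0);

    pollfd p{};
    p.fd = fd;
    p.events = POLLOUT;

    int rc;
    do {
        rc = poll(&p, 1, timeoutMs);
    } while (rc < 0 && errno == EINTR);  // retry if a signal interrupted us

    if (rc < 0)  return WriteReady::Error;
    if (rc == 0) return WriteReady::Timeout;

    // POLLHUP/POLLERR: connection gone; POLLNVAL: descriptor already closed.
    if (p.revents & (POLLHUP | POLLERR | POLLNVAL)) return WriteReady::Closed;

    return (p.revents & POLLOUT) ? WriteReady::Ready : WriteReady::Error;
}
```

The sending loop would call waitWritable(_sfd, someTimeout) before each send() and stop quietly on Closed instead of reporting an error.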

As a general rule you shouldn't close a socket until you're certain that the send buffer has been fully flushed - you can only be sure of this once the send() call has returned and indicates that all the remaining data has gone out. I've handled this in the past by checking the send buffer when I get a zero read, and if it's not empty I set a "closing" flag. In your case the sending thread would then use this as a hint to do the close once everything is flushed. This matters because if the remote end does a half-close with shutdown() then you'll get a zero read even though it may still be reading what you send. You might not care about half-closes, however, in which case your strategy above is OK.
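Roughly, the "closing" flag idea could look like the sketch below. Everything here (the Connection struct, onZeroRead, sendAll) is hypothetical scaffolding rather than code from your program, and it uses a mutex to keep the handoff between the two threads simple. It also assumes SIGPIPE is ignored or otherwise handled if the peer has fully closed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <mutex>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical shared state for one connection, used by both threads.
struct Connection {
    int fd = -1;
    std::mutex m;
    bool sendPending = false;  // sender still has bytes to write
    bool closing = false;      // zero read arrived while a send was pending
    bool closed = false;       // close() has already been called
};

// Caller must hold c.m. Closes the descriptor at most once.
static void closeLocked(Connection& c)
{
    if (!c.closed) {
        close(c.fd);
        c.closed = true;
    }
}

// Receive thread: call this when recv() returns 0.
void onZeroRead(Connection& c)
{
    std::lock_guard<std::mutex> lock(c.m);
    if (c.sendPending)
        c.closing = true;  // leave the close to the sending thread
    else
        closeLocked(c);
}

// Send thread: write everything, then honour a deferred close.
bool sendAll(Connection& c, const char* data, std::size_t len)
{
    assert(data != nullptr);
    {
        std::lock_guard<std::mutex> lock(c.m);
        if (c.closed) return false;
        c.sendPending = true;
    }

    bool ok = true;
    while (len > 0) {
        ssize_t n = send(c.fd, data, len, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry
            ok = false;
            break;
        }
        data += n;
        len -= static_cast<std::size_t>(n);
    }

    std::lock_guard<std::mutex> lock(c.m);
    c.sendPending = false;
    if (c.closing) closeLocked(c);  // receiver asked us to finish the close
    return ok;
}
```

The point of the design is that only one place ever calls close(), and never while a send() is in flight on that descriptor.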

Finally, I personally would avoid the hassle of sending and receiving threads and just have a single thread which does both - that's more or less the point of select() and poll(), to allow a single thread of execution to deal with one or more filehandles without worrying about performing an operation which blocks and starves the other connections.
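For comparison, a single-threaded version might look something like this sketch. The function name serviceSocket and its signature are invented for the example; it assumes a POSIX poll(), a socket whose writes are small enough not to block for long (real code would normally set O_NONBLOCK), and that SIGPIPE is ignored or handled elsewhere. Because one thread owns the descriptor, nothing can close it in the middle of a send().

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Sends all of `out` and appends everything received to `in`, returning true
// once the peer has closed its side and all of `out` has been sent.
bool serviceSocket(int fd, const std::string& out, std::string& in)
{
    assert(fd >= 0);
    std::size_t sent = 0;
    bool peerClosed = false;

    while (!peerClosed || sent < out.size()) {
        pollfd p{};
        p.fd = fd;
        if (!peerClosed)       p.events |= POLLIN;   // still expecting input
        if (sent < out.size()) p.events |= POLLOUT;  // still have output queued

        int rc = poll(&p, 1, -1);
        if (rc < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (p.revents & POLLNVAL) return false;
        // After EOF we only wait to write; a wakeup without POLLOUT means
        // the connection is gone and the rest can never be sent.
        if (peerClosed && !(p.revents & POLLOUT)) return false;

        if (!peerClosed && (p.revents & (POLLIN | POLLHUP | POLLERR))) {
            char buf[4096];
            ssize_t n = recv(fd, buf, sizeof buf, 0);
            if (n > 0)       in.append(buf, static_cast<std::size_t>(n));
            else if (n == 0) peerClosed = true;  // zero read: keep flushing output
            else if (errno != EINTR) return false;
        }

        if (sent < out.size() && (p.revents & POLLOUT)) {
            ssize_t n = send(fd, out.data() + sent, out.size() - sent, 0);
            if (n > 0) sent += static_cast<std::size_t>(n);
            else if (n < 0 && errno != EINTR) return false;
        }
    }
    return true;  // the caller can now close(fd) safely
}
```

Note that a zero read only stops the reading half here; the loop keeps flushing output, which is what makes it tolerate a half-close from the other end.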