How to correctly read data when using epoll_wait

charfeddine.ahmed · Apr 4, 2011

I am trying to port existing Windows C++ code that uses IOCP to Linux. I have decided to use epoll_wait to achieve high concurrency, but I already face a theoretical issue with how received data gets processed.

Imagine two threads calling epoll_wait, and two consecutive messages arriving such that Linux unblocks the first thread and, soon after, the second.

Example:

Thread 1 blocks on epoll_wait
Thread 2 blocks on epoll_wait
Client sends data chunk 1
Thread 1 unblocks from epoll_wait, performs recv and tries to process the data
Client sends data chunk 2
Thread 2 unblocks from epoll_wait, performs recv and tries to process the data

Is this scenario conceivable? That is, can it occur?

Is there a way to prevent it, so as to avoid implementing synchronization in the recv/processing code?

Answer

bdonlan · Apr 4, 2011

If you have multiple threads reading from the same set of epoll handles, I would recommend you put your epoll handles in one-shot level-triggered mode with EPOLLONESHOT. This ensures that, after one thread observes the triggered handle, no other thread will observe it until you re-arm the handle with epoll_ctl (EPOLL_CTL_MOD).
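
As a rough illustration of the one-shot pattern, here is a minimal sketch. The helper names add_oneshot and rearm_oneshot are made up for this example, and it assumes you only care about read events (EPOLLIN) and store the descriptor in ev.data.fd; real code would more likely store a pointer to per-connection state.

```c
#include <sys/epoll.h>

/* Hypothetical helper: register fd for read events in one-shot mode.
 * Once one thread has been handed an event for fd, the kernel stops
 * reporting fd until rearm_oneshot() is called.
 * Returns 0 on success, -1 on error (errno set by epoll_ctl). */
int add_oneshot(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: re-enable a one-shot fd. Call it only after the
 * thread that received the event has finished recv() and processing,
 * so that no second thread can be woken for the same socket early. */
int rearm_oneshot(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLONESHOT;   /* flags must be given again */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

A worker thread would then loop on epoll_wait, recv and process the data for the returned descriptor, and call rearm_oneshot as its last step. Because the mode is level-triggered, any data that arrived in the meantime is reported again right after re-arming.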

If you need to handle read and write paths independently, you may want to completely split up the read and write thread pools; have one epoll handle for read events, and one for write events, and assign threads to one or the other exclusively. Further, have a separate lock for read and for write paths. You must be careful about interactions between the read and write threads as far as modifying any per-socket state, of course.

If you do go with that split approach, you need to put some thought into how to handle socket closures. Most likely you will want an additional shared-data lock, and 'acknowledge closure' flags, set under the shared data lock, for both read and write paths. Read and write threads can then race to acknowledge, and the last one to acknowledge gets to clean up the shared data structures. That is, something like this:

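// Called once by the read path (writer == 0) and once by the write path
// (writer != 0) when that path notices the socket has closed.
// myepollhandle is the epoll handle of the calling thread's own pool;
// LOCK/UNLOCK stand for whatever mutex primitive you use.
void OnSocketClosed(shareddatastructure *pShared, int writer)
{
  // Stop this pool's epoll handle from reporting the socket.
  epoll_ctl(myepollhandle, EPOLL_CTL_DEL, pShared->fd, NULL);

  LOCK(pShared->common_lock);
  if (writer)
    pShared->close_ack_w = true;
  else
    pShared->close_ack_r = true;

  // Only the last path to acknowledge sees both flags set.
  bool acked = pShared->close_ack_w && pShared->close_ack_r;
  UNLOCK(pShared->common_lock);

  // Free outside the lock; the other path no longer touches pShared.
  if (acked)
    free(pShared);
}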