ubuntu: sem_timedwait not waking (C)

giles123 picture giles123 · May 28, 2010 · Viewed 8.8k times · Source

I have 3 processes which need to be synchronized. Process one does something then wakes process two and sleeps, which does something then wakes process three and sleeps, which does something and wakes process one and sleeps. The whole loop is timed to run around 25hz (caused by an external sync into process one before it triggers process two in my "real" application). I use sem_post to trigger (wake) each process, and sem_timedwait() to wait for the trigger.

This all works successfully for several hours. However at some random time (usually after somewhere between two and four hours), one of the processes starts timing out in sem_timedwait(), even though I am sure the semaphore is being triggered with sem_post(). To prove this I even use sem_getvalue() immediately after the timeout, and the value is 1, so the timedwait should have been triggered.

Please see following code:

#include <stdio.h>
#include <time.h>
#include <string.h>
#include <errno.h>
#include <semaphore.h>

sem_t trigger_sem1, trigger_sem2, trigger_sem3;

// The main thread process.  Called three times with a different num arg - 1, 2 or 3.
void *thread(void *arg)
{
  int num = (int) arg;
  sem_t *wait, *trigger;
  int val, retval;
  struct timespec ts;
  struct timeval tv;

  switch (num)
    {
      case 1:
        wait = &trigger_sem1;
        trigger = &trigger_sem2;
        break;
      case 2:
        wait = &trigger_sem2;
        trigger = &trigger_sem3;
        break;
      case 3:
        wait = &trigger_sem3;
        trigger = &trigger_sem1;
        break;
    }

  while (1)
    {
      // The first thread delays by 40ms to time the whole loop.  
      // This is an external sync in the real app.
      if (num == 1)   
        usleep(40000);

      // print sem value before we wait.  If this is 1, sem_timedwait() will
      // return immediately, otherwise it will block until sem_post() is called on this sem. 
      sem_getvalue(wait, &val);
      printf("sem%d wait sync sem%d. val before %d\n", num, num, val);

          // get current time and add half a second for timeout.
      gettimeofday(&tv, NULL);
      ts.tv_sec = tv.tv_sec;
      ts.tv_nsec = (tv.tv_usec + 500000);    // add half a second
      if (ts.tv_nsec > 1000000)
        {
          ts.tv_sec++;
          ts.tv_nsec -= 1000000;
        }
      ts.tv_nsec *= 1000;    /* convert to nanosecs */

      retval = sem_timedwait(wait, &ts);
      if (retval == -1)
        {
          // timed out.  Print value of sem now.  This should be 0, otherwise sem_timedwait
          // would have woken before timeout (unless the sem_post happened between the 
          // timeout and this call to sem_getvalue).
          sem_getvalue(wait, &val);
          printf("!!!!!!    sem%d sem_timedwait failed: %s, val now %d\n", 
            num, strerror(errno), val);
        }
      else
        printf("sem%d wakeup.\n", num);

        // get value of semaphore to trigger.  If it's 1, don't post as it has already been 
        // triggered and sem_timedwait on this sem *should* not block.
      sem_getvalue(trigger, &val);
      if (val <= 0)
        {
          printf("sem%d send sync sem%d. val before %d\n", num, (num == 3 ? 1 : num+1), val);
          sem_post(trigger);
        }
      else
        printf("!! sem%d not sending sync, val %d\n", num, val);
    }
}



int main(int argc, char *argv[])
{
  pthread_t t1, t2, t3;

   // create semaphores.  val of sem1 is 1 to trigger straight away and start the whole ball rolling.
  if (sem_init(&trigger_sem1, 0, 1) == -1)
    perror("Error creating trigger_listman semaphore");
  if (sem_init(&trigger_sem2, 0, 0) == -1)
    perror("Error creating trigger_comms semaphore");
  if (sem_init(&trigger_sem3, 0, 0) == -1)
    perror("Error creating trigger_vws semaphore");

  pthread_create(&t1, NULL, thread, (void *) 1);
  pthread_create(&t2, NULL, thread, (void *) 2);
  pthread_create(&t3, NULL, thread, (void *) 3);

  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  pthread_join(t3, NULL);
}

The following output is printed when the program is running correctly (at the start and for a random but long time after). The value of sem1 is always 1 before thread1 waits as it sleeps for 40ms, by which time sem3 has triggered it, so it wakes straight away. The other two threads wait until the semaphore is received from the previous thread.

[...]
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
sem2 wakeup.
sem2 send sync sem3. val before 0
sem2 wait sync sem2. val before 0
sem3 wakeup.
sem3 send sync sem1. val before 0
sem3 wait sync sem3. val before 0
sem1 wait sync sem1. val before 1
sem1 wakeup.
sem1 send sync sem2. val before 0
[...]

However, after a few hours, one of the threads begins to timeout. I can see from the output that the semaphore is being triggered, and when I print the value after the timeout, it is 1. So sem_timedwait should have woken up well before the timeout. I would never expect the value of the semaphore to be 1 after the timeout, save for the very rare occasion (almost certainly never but it's possible) when the trigger happens after the timeout but before I call sem_getvalue.

Also, once it begins to fail, every sem_timedwait() on that semaphore also fails in the same way. See the following output, which I've line-numbered:

01  sem3 wait sync sem3. val before 0
02  sem1 wakeup.
03  sem1 send sync sem2. val before 0
04  sem2 wakeup.
05  sem2 send sync sem3. val before 0
06  sem2 wait sync sem2. val before 0
07  sem1 wait sync sem1. val before 0
08  !!!!!!    sem3 sem_timedwait failed: Connection timed out, val now 1
09  sem3 send sync sem1. val before 0
10  sem3 wait sync sem3. val before 1
11  sem3 wakeup.
12  !! sem3 not sending sync, val 1
13  sem3 wait sync sem3. val before 0
14  sem1 wakeup.
[...]

On line 1, thread 3 (which I have confusingly called sem3 in the printf) waits for sem3 to be triggered. On line 5, thread2 calls sem_post for sem3. However, line 8 shows sem3 timing out, but the value of the semaphore is 1. thread3 then triggers sem1 and waits again (10). However, because the value is already 1, it wakes straight away. It doesn't send sem1 again as this has all happened before control is given to thread1, however it then waits again (val is now 0) and sem1 wakes up. This now repeats for ever, sem3 always timing out and showing that the value is 1.

So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?

I have tried a similar test using three separate programs, communicating over shared memory, rather than three threads in the same program. This more closely resembles my real world application. The results and the output were the same. The problem does appear to be in the semaphore (particularly the sem_timedwait call) rather than anything to do with pthreads.

I have also tried shorter and longer delays, as well as removing the delay completely, with similar results to those described above. With no delay at all it can sometimes start to produce the error after minutes rather than hours. This does of course mean that the problem can be reproduced a lot quicker.

This is using Ubuntu 9.4 with kernel 2.6.28. The same procedure has been working properly on Redhat and Fedora, but I'm now trying to port to Ubuntu. I have also tried using Ubuntu 9.10, which made no difference.

Thanks for any advice, Giles

Answer

Jens Gustedt picture Jens Gustedt · Jun 24, 2010

(Sorry to give a second answer but this one would be too messy to clean up just with editing)

The answer is, I think, already in the original post for the question.

So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?

So the scenario is:

  1. thread 2 takes too long
  2. sem3 times out in sem_timedwait
  3. thread 3 is descheduled or whatever it takes it to reach the sem_getvalue
  4. thread 2 wakes up and does its sem_post on sem3
  5. thread 3 issues its sem_getvalue and sees a 1
  6. thread 3 branches into the wrong branch and doesn't do its sem_post on sem1

This race condition is hard to trigger, basically you have to hit the tiny time window where one thread has had a problem in waiting for the semaphore and then reads the semaphore with the sem_getvalue. The occurrence of that condition is much dependent of the environment (type of system, number of cores, load, IO interrupts) so this explains why it only occurs after hours, if not at all.

Having the control flow depend of a sem_getvalue is generally a bad idea. The only atomic non-blocking access to a sem_t is through sem_post and sem_trywait.

So this example code from the question has that race condition. This doesn't mean that the original problem code that gillez had, does indeed have the same race condition. Perhaps the example is just too simplistic, and still shows the same phenomenon for him.

My guess is, in his original problem there was an unprotected sem_wait. That is a sem_wait that is only checked for its return value and not for errno in the event that it fails. EINTRs do occur on sem_wait quite naturally if the process has some IO. You have just do a do - while with check and reset of errno if you encounter a EINTR.