I have a problem in a production service which contains a "watchdog" timer used to check whether the main processing job has become frozen (this is related to a COM interop problem which unfortunately can't be reproduced in test).
Here's how it currently works:
ManualResetEvent
, processes a single item (this shouldn't take long), then sets the event. It then continues to process any remaining items.WaitOne(TimeSpan.FromMinutes(5))
on this event. If the result is false, the service is restarted.The cause appears to be that when multiple items await processing, the time between the Set()
after the first item is processed, and the Reset()
before the second item is processed is too brief, and WaitOne()
doesn't appear to recognise that the event has been set.
My understanding of WaitOne()
is that the blocked thread is guaranteed to receive a signal when Set()
is called, but I assume I'm missing something important.
Note that if I allow a context switch by calling Thread.Sleep(0)
after calling Set()
, WaitOne()
never fails.
Included below is a sample which produces the same behaviour as my production code. WaitOne()
sometimes waits for 5 seconds and fails, even though Set()
is being called every 800 milliseconds.
private static ManualResetEvent _handle;
private static void Main(string[] args)
{
_handle = new ManualResetEvent(true);
((Action) PeriodicWait).BeginInvoke(null, null);
((Action) PeriodicSignal).BeginInvoke(null, null);
Console.ReadLine();
}
private static void PeriodicWait()
{
Stopwatch stopwatch = new Stopwatch();
while (true)
{
stopwatch.Restart();
bool result = _handle.WaitOne(5000, false);
stopwatch.Stop();
Console.WriteLine("After WaitOne: {0}. Waited for {1}ms", result ? "success" : "failure",
stopwatch.ElapsedMilliseconds);
SpinWait.SpinUntil(() => false, 1000);
}
}
private static void PeriodicSignal()
{
while (true)
{
_handle.Reset();
Console.WriteLine("After Reset");
SpinWait.SpinUntil(() => false, 800);
_handle.Set();
// Uncommenting either of the lines below prevents the problem
//Console.WriteLine("After Set");
//Thread.Sleep(0);
}
}
The Question
While I understand that calling Set()
closely followed by Reset()
doesn't guarantee that all blocked threads will resume, is it also not guaranteed that any waiting threads will be released?
No, this is fundamentally broken code. There are only reasonable odds that the WaitOne() will complete when you keep the MRE set for such a short amount of time. Windows favors releasing a thread that's blocked on an event. But this will drastically fail when the thread isn't waiting. Or the scheduler picks another thread instead, one that runs with a higher priority and also got unblocked. Could be a kernel thread for example. MRE doesn't keep a "memory" of having been signaled and not yet waited on.
Neither Sleep(0) or Sleep(1) are good enough to guarantee that the wait is going to complete, there's no reasonable upper bound on how often the waiting thread could be bypassed by the scheduler. Although you probably ought to shut down the program when it takes longer than 10 seconds ;)
You'll need to do this differently. A simple way is to rely on the worker to eventually set the event. So reset it before you start waiting:
private static void PeriodicWait() {
Stopwatch stopwatch = new Stopwatch();
while (true) {
stopwatch.Restart();
_handle.Reset();
bool result = _handle.WaitOne(5000);
stopwatch.Stop();
Console.WriteLine("After WaitOne: {0}. Waited for {1}ms", result ? "success" : "failure",
stopwatch.ElapsedMilliseconds);
}
}
private static void PeriodicSignal() {
while (true) {
_handle.Set();
Thread.Sleep(800); // Simulate work
}
}