Is Critical Section always faster?

aJ. · May 12, 2009 · Viewed 18.3k times

I was debugging a multi-threaded application and came across the internal structure of CRITICAL_SECTION. I found the LockSemaphore data member particularly interesting.

It looks like LockSemaphore is an auto-reset event (not a semaphore, as the name suggests), and the operating system creates this event silently the first time a thread waits on a critical section that is locked by another thread.

Now I am wondering: is a critical section always faster? An event is a kernel object, and each critical section object is associated with an event object, so how can a critical section be faster than other kernel objects such as a mutex? Also, how does the internal event object actually affect the performance of a critical section?

Here is the structure of the CRITICAL_SECTION:

struct RTL_CRITICAL_SECTION
{
    PRTL_CRITICAL_SECTION_DEBUG DebugInfo;
    LONG LockCount;
    LONG RecursionCount;
    HANDLE OwningThread;
    HANDLE LockSemaphore;
    ULONG_PTR SpinCount;
};

Answer

ChrisW · May 12, 2009

When they say that a critical section is "fast", they mean "it's cheap to acquire one when it isn't already locked by another thread".

[Note that if it is already locked by another thread, then it doesn't matter nearly so much how fast it is.]

The reason why it's fast is that, before going into the kernel, it uses the equivalent of InterlockedIncrement on one of those LONG fields (probably LockCount), and if that succeeds, it considers the lock acquired without ever having entered the kernel.

The InterlockedIncrement API is, I believe, implemented in user mode as a LOCK INC opcode ... in other words, you can acquire an uncontested critical section without doing any ring transition into the kernel at all.
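To make the fast path concrete, here is a minimal sketch of the idea in portable C11, using atomic_fetch_add in place of the Win32 InterlockedIncrement. All names here (toy_critical_section, toy_try_enter, etc.) are illustrative inventions, not the real ntdll implementation; it only models the uncontended path, with LockCount starting at -1 as in older Windows versions:

```c
/* Sketch of the critical-section fast path, under the assumption that
 * LockCount starts at -1 when the lock is free.  C11 atomics stand in
 * for the Win32 Interlocked* family; both compile to a single
 * LOCK-prefixed instruction on x86, with no ring transition. */
#include <stdatomic.h>

typedef struct {
    atomic_long lock_count;   /* -1 = free; >= 0 = owned (simplified) */
} toy_critical_section;

void toy_init(toy_critical_section *cs) {
    atomic_init(&cs->lock_count, -1);
}

/* Returns 1 if the lock was acquired without any kernel call,
 * 0 if a real implementation would fall back to waiting on the event. */
int toy_try_enter(toy_critical_section *cs) {
    /* Equivalent of InterlockedIncrement(&LockCount): old value of -1
     * means nobody held the lock, so we now own it. */
    if (atomic_fetch_add(&cs->lock_count, 1) == -1)
        return 1;                              /* uncontended: acquired */
    atomic_fetch_sub(&cs->lock_count, 1);      /* contended: undo; would block */
    return 0;
}

void toy_leave(toy_critical_section *cs) {
    atomic_fetch_sub(&cs->lock_count, 1);      /* back to -1 = free */
}
```

The contended branch is where the lazily created event comes in: only when toy_try_enter would return 0 does the real EnterCriticalSection need a kernel object to sleep on, which is why the uncontended case never pays for one.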