I wrote a small program to compare the performance of Critical Section vs Mutex in Windows.
In the tests I ran, acquiring the critical section actually seems to be slower, if anything. Can anybody explain why both take almost the same amount of time, and what is happening internally?
This is the timer I used - http://cplus.about.com/od/howtodothingsi2/a/timing.htm
#include "stdafx.h"
#include<iostream>
#include<vector>
#include "h_timer.h"
#include<WinBase.h>
#include<Windows.h>
#include<stdio.h>
#define MAX_THREADS 2000
// Comment/uncomment this to switch between the critical section and the mutex
#define CRIT 1
using namespace std;
HANDLE Mutex;
CRITICAL_SECTION critSection;
DWORD WINAPI Contention( LPVOID );
int main( void )
{
    HANDLE Thread[MAX_THREADS];
    DWORD ThreadID;
    int i;

#ifdef CRIT
    // Create a critical section
    InitializeCriticalSection(&critSection);
#else
    // Create a mutex with no initial owner
    Mutex = CreateMutex(NULL, FALSE, NULL);
#endif

    // Create worker threads
    CStopWatch timer, tempTimer;
    timer.startTimer();
    for( i = 0; i < MAX_THREADS; i++ )
    {
        Thread[i] = CreateThread(NULL, 0,
            (LPTHREAD_START_ROUTINE)Contention, NULL, 0, &ThreadID);
    }

    // WaitForMultipleObjects can wait on at most MAXIMUM_WAIT_OBJECTS (64)
    // handles per call, so wait for the threads in batches
    for( i = 0; i < MAX_THREADS; i += MAXIMUM_WAIT_OBJECTS )
    {
        DWORD batch = min(MAXIMUM_WAIT_OBJECTS, MAX_THREADS - i);
        WaitForMultipleObjects(batch, &Thread[i], TRUE, INFINITE);
    }
    timer.stopTimer();

    cout << endl << "Elapsed Time: " << timer.getElapsedTime();
    cin.get();

    // Close thread and synchronization handles
    for( i = 0; i < MAX_THREADS; i++ )
        CloseHandle(Thread[i]);
#ifdef CRIT
    DeleteCriticalSection(&critSection);
#else
    CloseHandle(Mutex);
#endif
    return 0;
}
DWORD WINAPI Contention( LPVOID lpParam )
{
    // lpParam not used in this example
    UNREFERENCED_PARAMETER(lpParam);
#ifdef CRIT
    EnterCriticalSection(&critSection);
    //printf("ThreadId: %d\n", GetCurrentThreadId());
    //printf("Let's try Again. %d\n\n", GetCurrentThreadId());
    LeaveCriticalSection(&critSection);
#else
    DWORD dwCount = 0, dwWaitResult;

    // Request ownership of the mutex
    dwWaitResult = WaitForSingleObject(
        Mutex,       // handle to mutex
        INFINITE);   // no time-out interval
    dwCount++;
    ReleaseMutex(Mutex);
#endif
    return TRUE;
}
For 2000 threads on a quad-core HP Z210, both versions take roughly 1.5 seconds.
I think there are two factors:
Mainly - Your program is dominated by thread creation overhead. You are creating and destroying 2000 threads, and only accessing the mutex/CS once per thread. The time spent creating threads swamps the difference in lock/unlock times.
Also - You may not be testing the use case that these locks were optimized for. Try spawning two threads that each try to access the mutex/CS thousands of times.
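For example, a contention benchmark along those lines might look like the sketch below. It reuses the same Win32 calls as the program above, but the thread function name (LockLoop), the iteration count, and the use of std::chrono instead of the CStopWatch class are arbitrary choices made so the snippet is self-contained, not part of the original code.

// Minimal sketch: two long-lived threads, each acquiring and releasing the
// lock many times. Define CRIT for the critical section, comment it out for
// the mutex (same convention as the program above).
#include <Windows.h>
#include <iostream>
#include <chrono>

#define CRIT 1
#define NUM_THREADS 2
#define ITERATIONS 1000000

HANDLE Mutex;
CRITICAL_SECTION critSection;

DWORD WINAPI LockLoop(LPVOID)
{
    for (int i = 0; i < ITERATIONS; i++)
    {
#ifdef CRIT
        EnterCriticalSection(&critSection);
        LeaveCriticalSection(&critSection);
#else
        WaitForSingleObject(Mutex, INFINITE);
        ReleaseMutex(Mutex);
#endif
    }
    return 0;
}

int main()
{
#ifdef CRIT
    InitializeCriticalSection(&critSection);
#else
    Mutex = CreateMutex(NULL, FALSE, NULL);
#endif

    HANDLE threads[NUM_THREADS];
    auto start = std::chrono::steady_clock::now();

    // Create the two worker threads and wait for them to finish
    for (int i = 0; i < NUM_THREADS; i++)
        threads[i] = CreateThread(NULL, 0, LockLoop, NULL, 0, NULL);
    WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);

    auto stop = std::chrono::steady_clock::now();
    std::cout << "Elapsed: "
              << std::chrono::duration<double>(stop - start).count()
              << " s" << std::endl;

    for (int i = 0; i < NUM_THREADS; i++)
        CloseHandle(threads[i]);
#ifdef CRIT
    DeleteCriticalSection(&critSection);
#else
    CloseHandle(Mutex);
#endif
    return 0;
}

Because each thread now acquires and releases the lock a million times, the per-acquisition cost dominates the measurement instead of CreateThread, which should make the critical section's user-mode fast path (no kernel transition when uncontended) easier to see against the mutex's unconditional kernel call.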