VC++ 2010: Weird Critical Section error

dario_ramos picture dario_ramos · Aug 19, 2011 · Viewed 7.4k times · Source

My program is randomly crashing in a small scenario I can reproduce, but it happens in mlock.c (which is a VC++ runtime file) from ntdll.dll, and I can't see the stack trace. I do know that it happens in one of my thread functions, though.

This is the mlock.c code where the program crashes:

void __cdecl _unlock (
        int locknum
        )
{
        /*
         * leave the critical section.
         */
        LeaveCriticalSection( _locktable[locknum].lock );
}

The error is "invalid handle specified". If I look at locknum, it's a number larger than _locktable's size, so this makes some sense.

This seems to be related to Critical Section usage. I do use CRITICAL_SECTIONS in my thread, via a CCriticalSection wrapper class and its associated RAII guard, CGuard. Definitions for both here to avoid even more clutter.

This is the thread function that's crashing:

unsigned int __stdcall CPlayBack::timerThread( void * pParams ) {
#ifdef _DEBUG
    DRA::CommonCpp::SetThreadName( -1, "CPlayBack::timerThread" );
#endif
    CPlayBack * pThis = static_cast<CPlayBack*>( pParams );
    bool bContinue = true;
    while( bContinue ) {
        float m_fActualFrameRate = pThis->m_fFrameRate * pThis->m_fFrameRateMultiplier;
        if( m_fActualFrameRate != 0 && pThis->m_bIsPlaying ) {
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent, static_cast<DWORD>( 1000.0f / m_fActualFrameRate ) ) == WAIT_TIMEOUT );
            CImage img;
            if( pThis->m_bIsPlaying && pThis->nextFrame( img ) )
                pThis->sendImage( img );
        }
        else
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent, 10 ) == WAIT_TIMEOUT );
    }
    ::GetErrorLoggerInstance()->Log( LOG_TYPE_NOTE, "CPlayBack", "timerThread", "Exiting thread" );
    return 0;
}

Where does CCriticalSection come in? Every CImage object contains a CCriticalSection object which it uses through a CGuard RAII lock. Moreover, every CImage contains a CSharedMemory object which implements reference counting. To that end, it contains two CCriticalSection's as well, one for the data and one for the reference counter. A good example of these interactions is best seen in the destructors:

CImage::~CImage() {
    CGuard guard(m_csData);
    if( m_pSharedMemory != NULL ) {
        m_pSharedMemory->decrementUse();
        if( !m_pSharedMemory->isBeingUsed() ){
            delete m_pSharedMemory;
            m_pSharedMemory = NULL;
        }
    }
    m_cProperties.ClearMin();
    m_cProperties.ClearMax();
    m_cProperties.ClearMode();
}

CSharedMemory::~CSharedMemory() {
    CGuard guardUse( m_cs );
    if( m_pData && m_bCanDelete ){
        delete []m_pData;
    }
    m_use = 0;
    m_pData = NULL;
}

Anyone bumped into this kind of error? Any suggestion?

Edit: I got to see some call stack: the call comes from ~CSharedMemory. So there must be some race condition there

Edit: More CSharedMemory code here

Answer

John Dibling picture John Dibling · Aug 19, 2011

The "invalid handle specified" return code paints a pretty clear picture that your critical section object has been deallocated; assuming of course that it was allocated properly to begin with.

Your RAII class seems like a likely culprit. If you take a step back and think about it, your RAII class violates the Sepration Of Concerns principle, because it has two jobs:

  1. It provides allocate/destroy semantics for the CRITICAL_SECTION
  2. It provides acquire/release semantics for the CRITICAL_SECTION

Most implementations of a CS wrapper I have seen violate the SoC principle in the same way, but it can be problematic. Especially when you have to start passing around instances of the class in order to get to the acquire/release functionality. Consider a simple, contrived example in psudocode:

void WorkerThreadProc(CCriticalSection cs)
{
  cs.Enter();
  // MAGIC HAPPENS
  cs.Leave();
}

int main()
{
  CCriticalSection my_cs;
  std::vector<NeatStuff> stuff_used_by_multiple_threads;

  // Create 3 threads, passing the entry point "WorkerThreadProc"
  for( int i = 0; i < 3; ++i )
    CreateThread(... &WorkerThreadProc, my_cs);

  // Join the 3 threads...
  wait(); 
}

The problem here is CCriticalSection is passed by value, so the destructor is called 4 times. Each time the destructor is called, the CRITICAL_SECTION is deallocated. The first time works fine, but now it's gone.

You could kludge around this problem by passing references or pointers to the critical section class, but then you muddy the semantic waters with ownership issues. What if the thread that "owns" the crit sec dies before the other threads? You could use a shared_ptr, but now nobody really "owns" the critical section, and you have given up a little control in on area in order to gain a little in another area.

The true "fix" for this problem is to seperate concerns. Have one class for allocation & deallocation:

class CCriticalSection : public CRITICAL_SECTION
{
public:
  CCriticalSection(){ InitializeCriticalSection(this); }
  ~CCriticalSection() { DestroyCriticalSection(this); }
};

...and another to handle locking & unlocking...

class CSLock
{
public:
  CSLock(CRITICAL_SECTION& cs) : cs_(cs) { EnterCriticalSection(&cs_); }
  ~CSLock() { LeaveCriticalSection(&cs_); }
private: 
  CRITICAL_SECTION& cs_;
};

Now you can pass around raw pointers or references to a single CCriticalSection object, possibly const, and have the worker threads instantiate their own CSLocks on it. The CSLock is owned by the thread that created it, which is as it should be, but ownership of the CCriticalSection is clearly retained by some controlling thread; also a good thing.