Detecting/Diagnosing Thread Starvation

Phil Sandler picture Phil Sandler · Jul 11, 2017 · Viewed 10k times · Source

I am doing some performance/scalability testing of an IIS application that occasionally seems to slow down to a crawl in production. I'm able to reproduce the slowness consistently using NUnit.

CPU and Memory do not spike during the testing, or when the slowness occurs in production. My strong suspicion is that the application is suffering from thread starvation, since it does not appear to be CPU, Memory, I/O, or database access that is causing the bottleneck. I do see signs of what appear to be thread starvation; for example, NLog's async log file writes tend to have long periods of silence followed by bursts of activity with older time stamps (i.e. a lower-priority thread is waiting for threads to free up in order to write).

What steps can I take to definitively determine that the application is indeed thread starved, and (assuming that is the case) pinpoint the exact areas of the system that are causing the problem?

Edit

I neglected to mention that almost all the code is synchronous (it's a legacy system).

Answer

Phil Sandler picture Phil Sandler · Jul 12, 2017

Based on Sinatr's comment, I did some reading on ThreadPool.SetMinThreads and TaskCreationOptions.LongRunning, including answers to When to use TaskCreationOptions.LongRunning?

Setting MinThreads to a higher default value made a huge difference in my case. I created a simple background process to see if Available Threads in the ThreadPool was changing significantly during the course of a test run and exceeding the MinThreads value (it was).

Here's some code I used to diagnose. This is not intended for production use, and the reporting of thread usage shown here would only be interesting as they ramped up initially. Also note that the Timer needs a thread when it elapses, so also needs to wait for an available thread.

Static vars:

    private static Timer _timer;
    private static int _lastActiveThreads;
    private static int _lastAvailableThreads;
    private static int _maxThreads;
    private static int _minThreads;

Run at startup:

    int completionPortThreads;

    ThreadPool.GetMaxThreads(out _maxThreads, out completionPortThreads);
    ThreadPool.GetMinThreads(out _minThreads, out completionPortThreads);

    _timer = new Timer
    {
        AutoReset = true,
        Interval = 500,
    };

    _timer.Elapsed += TimerElasped;
    _timer.Start();

Elapsed method:

    private static void TimerElasped(object sender, ElapsedEventArgs e)
    {
        int minWorkerThreads;
        int availWorkerThreads;
        int completionPortThreads;

        ThreadPool.GetMinThreads(out minWorkerThreads, out completionPortThreads);
        ThreadPool.GetAvailableThreads(out availWorkerThreads, out completionPortThreads);

        var activeThreads = _maxThreads - availWorkerThreads;

        if (availWorkerThreads != _lastAvailableThreads)
        {
            _lastAvailableThreads = availWorkerThreads;
            if (activeThreads > _lastActiveThreads)
            {
                _lastActiveThreads = activeThreads;
                Logger.Log($"+++++ Active Threads is now: {activeThreads}");

                if (activeThreads > _minThreads)
                {
                    var diff = activeThreads - _minThreads;
                    Logger.Log($"+++++ Active threads is now {activeThreads}, which is {diff} more than minThread value of {_minThreads}.  This may be causing delays.");
                }
            }
        }
    }