The following article by Thomas Marquardt describes how IIS handles ASP.Net requests, the max/min CLR worker threads/Managed IO threads that can be configured to run, the various requests queues involved and their default sizes.
Now as per the article, the following occurs in IIS 6.0:
I want to know what is the "threshold value" at which point ASP.NET considers that the number of requests currently executing it too high and then starts queuing the requests to the global ASP.NET request queue?
I think this threshold will depend upon the configuration of max number of worker threads, but there might be some formula based on which ASP.NET will determine that the number of concurrently executing requests is too high and starts queuing the requests to the ASP.NET global request queue. What might this formula? or is this setting configurable?
1) On IIS 6 and in IIS 7 classic mode, each application (AppDomain) has a queue that it uses to maintain the availability of worker threads. The number of requests in this queue increases if the number of available worker threads falls below the limit specified by httpRuntime minFreeThreads. When the limit specified by httpRuntime appRequestQueueLimit is exceeded, the request is rejected with a 503 status code and the client receives an HttpException with the message "Server too busy." There is also an ASP.NET performance counter, "Requests In Application Queue", that indicates how many requests are in the queue. Yes, the CLR thread pool is the one exposed by the .NET ThreadPool class.
2) The requestQueueLimit is poorly named. It actually limits the maximum number of requests that can be serviced by ASP.NET concurrently. This includes both requests that are queued and requests that are executing. If the "Requests Current" performance counter exceeds requestQueueLimit, new incoming requests will be rejected with a 503 status code.
So essentially requestQueueLimit limits the number of requests that are queued (I am assuming it will sum the number of requests queued in Application queues plus the global ASP.Net request queue plus the number of requests currently executing) and are executing. All though this does not answer the original question, it does provide information about when we might receive a 503 server busy error because of high number of concurrent requests/high latency requests.
(Check update 2)
Update 2
There was a mistake in my part in the understanding. I had mixed up the descriptions for IIS 6 and IIS 7.
Essentially when ASP.NET is hosted on IIS 7.5 and 7.0 in integrated mode, the application-level queues are no longer present, ASP.NET maintains a global request queue.
So IIS 7/7.5 will start to queue requests to the global request queue if the number of executing requests is deemed high. The question applies more to IIS 7/7.5 rather than 6.
As far IIS 6.0 is concerned, there is no global ASP.NET request queue, but the following is true:
1. ASP.NET picks up the requests from a IIS IO Thread and posts "HSE_STATUS_PENDING" to IIS IO Thread
2. The requests is handed over to a CLR Worker thread
3. If the requests are of high latency and all the threads are occupied (the thread count approaches httpRuntime.minFreeThreads), then the requests are posted to the Application level request queue (this queue is per AppDomain)
4. Also ASP.NET checks the number of requests queued and currently executing before accepting a new request. If this number is greater than value specified by processModel.requestQueueLimit then incoming requests will be rejected with 503 server busy error.
This article might help to understand the settings a little better.
minFreeThreads: This setting is used by the worker process to queue all the incoming requests if the number of available threads in the thread pool falls below the value for this setting. This setting effectively limits the number of requests that can run concurrently to maxWorkerThreads minFreeThreads. Set minFreeThreads to 88 * # of CPUs. This limits the number of concurrent requests to 12 (assuming maxWorkerThreads is 100).
Edit:
In this SO post, Thomas provides more detail and examples of request handling in the integrated pipeline. Be sure to read the comments on the answer for additional explanations.
A native callback (in webengine.dll) picks up request on CLR worker thread, we compare maxConcurrentRequestsPerCPU * CPUCount to total active requests. If we've exceeded limit, request is inserted in global queue (native code). Otherwise, it will be executed. If it was queued, it will be dequeued when one of the active requests completes.