SQL Server lock/hang issue

Matt Roberts picture Matt Roberts · Jun 10, 2010 · Viewed 17.1k times · Source

I'm using SQL Server 2008 on Windows Server 2008 R2, all sp'd up.

I'm getting occasional issues with SQL Server hanging with the CPU usage on 100% on our live server. It seems all the wait time on SQL Sever when this happens is given to SOS_SCHEDULER_YIELD.

Here is the Stored Proc that causes the hang. I've added the "WITH (NOLOCK)" in an attempt to fix what seems to be a locking issue.

ALTER PROCEDURE [dbo].[MostPopularRead]
AS
BEGIN
SET NOCOUNT ON;

SELECT 
    c.ForeignId , ct.ContentSource as ContentSource
    , sum(ch.HitCount * hw.Weight) as Popularity
    , (sum(ch.HitCount * hw.Weight) * 100) / @Total as Percent
    , @Total as TotalHits
from 
    ContentHit ch WITH (NOLOCK)
    join [Content] c WITH (NOLOCK) on ch.ContentId = c.ContentId
    join HitWeight hw WITH (NOLOCK) on ch.HitWeightId = hw.HitWeightId
    join ContentType ct WITH (NOLOCK) on c.ContentTypeId = ct.ContentTypeId
where 
    ch.CreatedDate between @Then and @Now
group by
    c.ForeignId , ct.ContentSource
order by
    sum(ch.HitCount * hw.HitWeightMultiplier) desc
END

The stored proc reads from the table "ContentHit", which is a table that tracks when content on the site is clicked (it gets hit quite frequently - anything from 4 to 20 hits a minute). So its pretty clear that this table is the source of the problem. There is a stored proc that is called to add hit tracks to the ContentHit table, its pretty trivial, it just builds up a string from the params passed in, which involves a few selects from some lookup tables, followed by the main insert:

BEGIN TRAN
insert into [ContentHit] 
    (ContentId, HitCount, HitWeightId, ContentHitComment)
values
    (@ContentId, isnull(@HitCount,1), isnull(@HitWeightId,1), @ContentHitComment)
COMMIT TRAN

The ContentHit table has a clustered index on its ID column, and I've added another index on CreatedDate since that is used in the select.

When I profile the issue, I see the Stored proc executes for exactly 30 seconds, then the SQL timeout exception occurs. If it makes a difference the web application using it is ASP.NET, and I'm using Subsonic (3) to execute these stored procs.

Can someone please advise how best I can solve this problem? I don't care about reading dirty data...

EDIT: The MostPopularRead stored proc is called very infrequently - its called on the home page of the site, but the results are cached for a day. The pattern of events that I am seeing is when I clear the cache, multiple requests come in for the home site, and they all hit the stored proc because it hasn't yet been cached. SQL Server then maxes out, and can only be resolved by restarting the sql server process. When I do this, usually the proc will execute OK (in about 200 ms) and put the data back in the cache.

EDIT 2: I've checked the execution plan, and the query looks quite sound. As I said earlier when it does run it only takes around 200ms to execute. I've added MAXDOP 1 to the select statement to force it to use only one CPU core, but I still see the issue. When I look at the wait times I see that XE_DISPATCHER_WAIT, ONDEMAND_TASK_QUEUE, BROKER_TRANSMITTER, KSOURCE_WAKEUP and BROKER_EVENTHANDLER are taking up a massive amount of wait time.

EDIT 3: I previously thought that this was related to Subsonic, our ORM, but having switched to ADO.NET, the erros is still live.

Answer

VladV picture VladV · Jun 10, 2010

The issue is likely concurrency, not locking. SOS_SCHEDULER_YIELD occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.

How often is [MostPopularRead] SP called and how long does it take to execute? The aggregation in your query might be rather CPU-intensive, especially if there are lots of data and/or ineffective indexes. So, you might end up with high CPU pressure - basically, a demand for CPU time is too high.

I'd consider the following:

  1. Check what other queries are executing while CPU is 100% busy? Look at sys.dm_os_waiting_tasks, sys.dm_os_tasks, sys.dm_exec_requests.

  2. Look at the query plan of [MostPopularRead], try to optimize the query. Quite often an ineffective query is the root cause of a performance problem, and query optimization is much more straightforward than other performance improvement techniques.

  3. If the query plan is parallel and the query is often called by multiple clients simultaneously, forcing a single-thread plan with MAXDOP=1 hint might help (abundant use of parallel plans is usually indicated by SOS_SCHEDULER_YIELD and CXPACKET waits).

Also, have a look at this paper: Performance tuning with wait statistics. It gives a pretty good summary of different wait types and their impact on performance.

P.S. It is easier to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before a query instead of adding (nolock) to each table.