We're seeing some pernicious, but rare, deadlock conditions in the Stack Overflow SQL Server 2005 database.
I attached the profiler, set up a trace profile using this excellent article on troubleshooting deadlocks, and captured a bunch of examples. The weird thing is that the deadlocking write is always the same:
UPDATE [dbo].[Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0
The other deadlocking statement varies, but it's usually some kind of trivial read of the Posts table. The read is always the statement that gets killed in the deadlock. Here's an example:
SELECT
[t0].[Id], [t0].[PostTypeId], [t0].[Score], [t0].[Views], [t0].[AnswerCount],
[t0].[AcceptedAnswerId], [t0].[IsLocked], [t0].[IsLockedEdit], [t0].[ParentId],
[t0].[CurrentRevisionId], [t0].[FirstRevisionId], [t0].[LockedReason],
[t0].[LastActivityDate], [t0].[LastActivityUserId]
FROM [dbo].[Posts] AS [t0]
WHERE [t0].[ParentId] = @p0
To be perfectly clear, we are not seeing write / write deadlocks, but read / write.
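For anyone wanting to confirm the read / write pattern without keeping Profiler attached, one option is to have SQL Server write deadlock details to its error log. This is only a sketch: it assumes sysadmin rights, and the flag is lost on restart unless it is also set as a startup parameter.

```sql
-- Write deadlock details (participants, locks held and requested, victim)
-- to the SQL Server error log. The -1 applies the flag to all sessions.
DBCC TRACEON (1222, -1);

-- Confirm the flag is active.
DBCC TRACESTATUS (1222);

-- Turn it off again once enough deadlocks have been captured.
-- DBCC TRACEOFF (1222, -1);
```

Each deadlock entry in the log should then show the victim process and the lock each participant was waiting on, which can be compared against the Profiler captures.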
We have a mixture of LINQ and parameterized SQL queries at the moment. We have added with (nolock) to all the SQL queries. This may have helped some. We also had a single (very) poorly-written badge query that I fixed yesterday, which was taking upwards of 20 seconds to run every time, and was running every minute on top of that. I was hoping this was the source of some of the locking problems!
Unfortunately, I got another deadlock error about 2 hours ago. Same exact symptoms, same exact culprit write.
The truly strange thing is that the locking write SQL statement you see above is part of a very specific code path. It's only executed when a new answer is added to a question -- it updates the parent question with the new answer count and last date/user. This is, obviously, not that common relative to the massive number of reads we are doing! As far as I can tell, we're not doing huge numbers of writes anywhere in the app.
I realize that NOLOCK is sort of a giant hammer, but most of the queries we run here don't need to be that accurate. Will you care if your user profile is a few seconds out of date?
Using NOLOCK with LINQ is a bit more difficult, as Scott Hanselman discusses here.
We are flirting with the idea of using
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
on the base database context so that all our LINQ queries have this set. Without that, we'd have to wrap every LINQ call we make (well, the simple reading ones, which is the vast majority of them) in a 3-4 line transaction code block, which is ugly.
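To make the difference concrete, here is a rough sketch of the two approaches at the SQL level, using the Posts table from above. The @ParentId variable and its value are made up for illustration. The first query hints a single table reference; the second changes the isolation level for everything that follows on the same connection.

```sql
-- Hypothetical parameter, for illustration only.
DECLARE @ParentId int;
SET @ParentId = 1;

-- Per-query hint: only this table reference reads uncommitted data.
SELECT [Id], [AnswerCount], [LastActivityDate]
FROM [dbo].[Posts] WITH (NOLOCK)
WHERE [ParentId] = @ParentId;

-- Session-level setting: every later statement on this connection
-- behaves as if it carried the hint, until the level is changed again.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT [Id], [AnswerCount], [LastActivityDate]
FROM [dbo].[Posts]
WHERE [ParentId] = @ParentId;

-- Restore the default level when done.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Because the setting belongs to the connection rather than to the query, it is worth verifying how it interacts with connection pooling before relying on it in the shared context.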
I guess I'm a little frustrated that trivial reads in SQL 2005 can deadlock on writes. I could see write/write deadlocks being a huge issue, but reads? We're not running a banking site here; we don't need perfect accuracy every time.
Ideas? Thoughts?
Are you instantiating a new LINQ to SQL DataContext object for every operation or are you perhaps sharing the same static context for all your calls?
Jeremy, we are sharing one static datacontext in the base Controller for the most part:
private DBContext _db;
/// <summary>
/// Gets the DataContext to be used by a Request's controllers.
/// </summary>
public DBContext DB
{
    get
    {
        if (_db == null)
        {
            _db = new DBContext() { SessionName = GetType().Name };
            //_db.ExecuteCommand("SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED");
        }
        return _db;
    }
}
Do you recommend we create a new context for every Controller, or per Page, or ... more often?
According to MSDN:
http://msdn.microsoft.com/en-us/library/ms191242.aspx
When either the READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION database options are ON, logical copies (versions) are maintained for all data modifications performed in the database. Every time a row is modified by a specific transaction, the instance of the Database Engine stores a version of the previously committed image of the row in tempdb. Each version is marked with the transaction sequence number of the transaction that made the change. The versions of modified rows are chained using a link list. The newest row value is always stored in the current database and chained to the versioned rows stored in tempdb.
For short-running transactions, a version of a modified row may get cached in the buffer pool without getting written into the disk files of the tempdb database. If the need for the versioned row is short-lived, it will simply get dropped from the buffer pool and may not necessarily incur I/O overhead.
There appears to be a slight performance penalty for the extra overhead, but it may be negligible. We should test to make sure.
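As a starting point for that testing, something like the following could be run before and during load to see how much of tempdb the version store is using. It is a sketch only, and assumes the login has VIEW SERVER STATE permission.

```sql
-- Approximate space reserved in tempdb for row versions.
-- Pages are 8 KB each, so page count * 8 gives kilobytes.
SELECT SUM(version_store_reserved_page_count) * 8 AS version_store_kb
FROM tempdb.sys.dm_db_file_space_usage;
```

If the number stays small under normal traffic, the extra overhead is probably negligible for us; if it grows steadily, long-running transactions may be holding old versions alive.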
Try setting this option and removing all NOLOCK hints from code queries unless they are really necessary. NOLOCK hints, or global methods in the database context handler that override the transaction isolation level, are Band-Aids for the problem. NOLOCK will mask fundamental issues with our data layer and can lead to selecting unreliable data, whereas automatic row versioning for selects and updates appears to be the solution.
ALTER DATABASE [StackOverflow.Beta] SET READ_COMMITTED_SNAPSHOT ON;
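A slightly fuller sketch of the change, assuming it is acceptable to kick out other sessions while the option is switched: check the current state, flip the option, then check again. The WITH ROLLBACK IMMEDIATE clause is my addition; the statement needs to be the only active connection in the database, and this clause rolls back other open transactions so it does not wait on them indefinitely.

```sql
-- Check the current row versioning settings for the database.
SELECT name, is_read_committed_snapshot_on, snapshot_isolation_state_desc
FROM sys.databases
WHERE name = N'StackOverflow.Beta';

-- Enable read committed snapshot. ROLLBACK IMMEDIATE rolls back open
-- transactions in other sessions, so run this in a quiet window.
ALTER DATABASE [StackOverflow.Beta]
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;

-- Verify: is_read_committed_snapshot_on should now be 1.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = N'StackOverflow.Beta';
```

Once the option is on, plain READ COMMITTED reads use row versions instead of shared locks, so the existing queries would not need to change.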