Horrible performance using SqlCommand Async methods with large data

hcd picture hcd · Feb 23, 2017 · Viewed 13k times · Source

I'm having major SQL performance problems when using async calls. I have created a small case to demonstrate the problem.

I have create a database on a SQL Server 2016 which resides in our LAN (so not a localDB).

In that database, I have a table WorkingCopy with 2 columns:

Id (nvarchar(255, PK))
Value (nvarchar(max))

DDL

CREATE TABLE [dbo].[Workingcopy]
(
    [Id] [nvarchar](255) NOT NULL, 
    [Value] [nvarchar](max) NULL, 

    CONSTRAINT [PK_Workingcopy] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

In that table, I have inserted a single record (id='PerfUnitTest', Value is a 1.5mb string (a zip of a larger JSON dataset)).

Now, if I execute the query in SSMS :

SELECT [Value] 
FROM [Workingcopy] 
WHERE id = 'perfunittest'

I immediately get the result, and I see in SQL Servre Profiler that the execution time was around 20 milliseconds. All normal.

When executing the query from .NET (4.6) code using a plain SqlConnection :

// at this point, the connection is already open
var command = new SqlCommand($"SELECT Value FROM WorkingCopy WHERE Id = @Id", _connection);
command.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = key;

string value = command.ExecuteScalar() as string;

The execution time for this is also around 20-30 milliseconds.

But when changing it to async code :

string value = await command.ExecuteScalarAsync() as string;

The execution time is suddenly 1800 ms ! Also in SQL Server Profiler, I see that the query execution duration is more than a second. Although the executed query reported by the profiler is exactly the same as the non-Async version.

But it gets worse. If I play around with the Packet Size in the connection string, I get the following results :

Packet size 32768 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 450 ms

Packet Size 4096 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 3667 ms

Packet size 512 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 30776 ms

30,000 ms!! That's over a 1000x slower than the non-async version. And SQL Server Profiler reports that the query execution took over 10 seconds. That doesn't even explain where the other 20 seconds are gone to!

Then I've switched back to the sync version and also played around with the Packet Size, and although it did impact a little the execution time, it was nowhere as dramatic as with the async version.

As a sidenote, if it put just a small string (< 100bytes) into the value, the async query execution is just as fast as the sync version (result in 1 or 2 ms).

I'm really baffled by this, especially since I'm using the built-in SqlConnection, not even an ORM. Also when searching around, I found nothing which could explain this behavior. Any ideas?

Answer

Luaan picture Luaan · Feb 23, 2017

On a system without significant load, an async call has a slightly bigger overhead. While the I/O operation itself is asynchronous regardless, blocking can be faster than thread-pool task switching.

How much overhead? Let's look at your timing numbers. 30ms for a blocking call, 450ms for an asynchronous call. 32 kiB packet size means you need you need about fifty individual I/O operations. That means we have roughly 8ms of overhead on each packet, which corresponds pretty well with your measurements over different packet sizes. That doesn't sound like overhead just from being asynchronous, even though the asynchronous versions need to do a lot more work than the synchronous. It sounds like the synchronous version is (simplified) 1 request -> 50 responses, while the asynchronous version ends up being 1 request -> 1 response -> 1 request -> 1 response -> ..., paying the cost over and over again.

Going deeper. ExecuteReader works just as well as ExecuteReaderAsync. The next operation is Read followed by a GetFieldValue - and an interesting thing happens there. If either of the two is async, the whole operation is slow. So there's certainly something very different happening once you start making things truly asynchronous - a Read will be fast, and then the async GetFieldValueAsync will be slow, or you can start with the slow ReadAsync, and then both GetFieldValue and GetFieldValueAsync are fast. The first asynchronous read from the stream is slow, and the slowness depends entirely on the size of the whole row. If I add more rows of the same size, reading each row takes the same amount of time as if I only have one row, so it's obvious that the data is still being streamed row by row - it just seems to prefer to read the whole row at once once you start any asynchronous read. If I read the first row asynchronously, and the second synchronously - the second row being read will be fast again.

So we can see that the problem is a big size of an individual row and/or column. It doesn't matter how much data you have in total - reading a million small rows asynchronously is just as fast as synchronously. But add just a single field that's too big to fit in a single packet, and you mysteriously incur a cost at asynchronously reading that data - as if each packet needed a separate request packet, and the server couldn't just send all the data at once. Using CommandBehavior.SequentialAccess does improve the performance as expected, but the massive gap between sync and async still exists.

The best performance I got was when doing the whole thing properly. That means using CommandBehavior.SequentialAccess, as well as streaming the data explicitly:

using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
  while (await reader.ReadAsync())
  {
    var data = await reader.GetTextReader(0).ReadToEndAsync();
  }
}

With this, the difference between sync and async becomes hard to measure, and changing the packet size no longer incurs the ridiculous overhead as before.

If you want good performance in edge cases, make sure to use the best tools available - in this case, stream large column data rather than relying on helpers like ExecuteScalar or GetFieldValue.