Is DocumentDB slower than SQL at pulling lots of records?

bladefist picture bladefist · Sep 1, 2014 · Viewed 7k times · Source

I was doing some benchmarking, so I had a SQL database with 2500 records. I inserted those records into DocumentDB.

I wrote two lines of code, one with entity framework to pull all 2500 into an array in C#. The next line to pull all 2500 into an array from DocuementDB.

Code used:

var test= await Task<Test>.Run(() =>
              client.CreateDocumentQuery<Test>(collection.DocumentsLink)
              .ToList());

The DocumentDB example took over 20 seconds. The SQL Server line was near instant. The objects are simple DTO with 5 properties, and I did the SQL Query over the internet.

Am I misusing DocumentDB? I thought it was made to pull all your records into memory then join with linq.

Answer

Aravind Krishna R. picture Aravind Krishna R. · Sep 3, 2014

@bladefist, you should be able to achieve much better performance with DocumentDB. For example, take a look at this code stub and output from an Azure VM and DocumentDB account both in West Europe.

Stopwatch watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
    watch.Start();
    int numDocumentsRead = 0;
    foreach (Document d in client.CreateDocumentQuery(collection.SelfLink, 
        new FeedOptions { MaxItemCount = 1000 }))
    {
        numDocumentsRead++;
    }

    Console.WriteLine("Run {0} - read {1} documents in {2} ms", i, numDocumentsRead, 
        watch.Elapsed.TotalMilliseconds);
    watch.Reset();
}

//Output
Run 0 - read 2500 documents in 426.1359 ms
Run 1 - read 2500 documents in 286.506 ms
Run 2 - read 2500 documents in 227.4451 ms
Run 3 - read 2500 documents in 270.4497 ms
Run 4 - read 2500 documents in 275.7205 ms
Run 5 - read 2500 documents in 281.571 ms
Run 6 - read 2500 documents in 268.9624 ms
Run 7 - read 2500 documents in 275.1513 ms
Run 8 - read 2500 documents in 301.0263 ms
Run 9 - read 2500 documents in 288.1455 ms

Some best practices to follow for performance:

  • Use direct connectivity and TCP protocol
  • Use a large page size (max: 1000) if you’re reading in large batches to minimize the number of round trips
  • To reduce latency, run your client in the same region as your DocumentDB account
  • The provisioned throughput (and storage) of the capacity units that you purchase is spread across the collections. So if you want to measure the throughput you should make sure that your app distributes the workload across all the collections. For instance, if you have purchase 1 CU, you can choose to distribute the all of the throughput to a single collection or across three collections.