I have a custom stream that is used to perform write operations directly against a cloud page blob.
public sealed class WindowsAzureCloudPageBlobStream : Stream
{
    // 4 MB is the upper limit for a single page blob write operation
    public const int MaxPageWriteCapacity = 4 * 1024 * 1024;

    // Every page blob operation works on a length that is a multiple of 512 bytes
    private const int PageBlobPageAdjustmentSize = 512;

    private CloudPageBlob _pageBlob;

    public override void Write(byte[] buffer, int offset, int count)
    {
        var additionalOffset = 0;
        var bytesToWriteTotal = count;

        List<Task> list = new List<Task>();
        while (bytesToWriteTotal > 0)
        {
            var bytesToWriteTotalAdjusted = RoundUpToPageBlobSize(bytesToWriteTotal);

            // Azure does not allow us to write as many bytes as we want;
            // the max allowed size per write is 4 MB
            var bytesToWriteNow = Math.Min((int)bytesToWriteTotalAdjusted, MaxPageWriteCapacity);
            var adjustmentBuffer = new byte[bytesToWriteNow];
            ...
            var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false);
            var task = _pageBlob.WritePagesAsync(memoryStream, Position, null);
            list.Add(task);
        }
        Task.WaitAll(list.ToArray());
    }

    private static long RoundUpToPageBlobSize(long size)
    {
        // Round up to the next multiple of 512 (works because the page size is a power of two)
        return (size + PageBlobPageAdjustmentSize - 1) & ~(PageBlobPageAdjustmentSize - 1);
    }
}
However, the performance of Write() is poor. For example:
Stopwatch s = new Stopwatch();
s.Start();
using (var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false))
{
    _pageBlob.WritePages(memoryStream, Position);
}
s.Stop();
Console.WriteLine(s.Elapsed);   // => 00:00:01.52, i.e. an average speed of ~2.4 MB/s
How can I improve my algorithm?
How can I use Parallel.ForEach to speed up the process?
Why do I get only 2.5 MB/s, and not the ~60 MB/s shown on the official site or at http://blogs.microsoft.co.il/applisec/2012/01/05/windows-azure-benchmarks-part-2-blob-write-throughput/?
Like you, I had a lot of performance issues with page blobs as well, even though mine were not this severe. It seems like you've done your homework, and I can see that you're doing everything by the book.
A few things to check:

- ServicePointManager.DefaultConnectionLimit
- Tasks / async / await (especially if you have a lot to do)
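For the connection limit specifically, a minimal sketch (the value 100 is purely illustrative; tune it for your workload):

using System.Net;

// The .NET default (2 concurrent connections per endpoint for a client app)
// throttles any attempt to run several blob writes in parallel, so raise it
// at startup, before the first request is made.
ServicePointManager.DefaultConnectionLimit = 100;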
Oh, and one more thing: the main reason your access times are slow is that you're doing everything synchronously. The benchmarks at Microsoft access the blobs from multiple threads, which gives more throughput.
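To make that concrete, here is a minimal sketch of the same kind of upload with several writes in flight at once (the helper name, the chunking and the 512-byte alignment assumptions are mine, not taken from the code above):

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;

public static class ParallelPageBlobWriter
{
    private const int MaxPageWriteCapacity = 4 * 1024 * 1024; // 4 MB per WritePages call

    // Writes 'data' (whose length must be a multiple of 512 bytes) to 'pageBlob'
    // starting at 'blobOffset' (also 512-byte aligned), keeping every 4 MB chunk
    // in flight at the same time instead of waiting for each one to finish.
    public static Task WriteAsync(CloudPageBlob pageBlob, byte[] data, long blobOffset)
    {
        var tasks = new List<Task>();
        for (var offset = 0; offset < data.Length; offset += MaxPageWriteCapacity)
        {
            var chunkLength = Math.Min(MaxPageWriteCapacity, data.Length - offset);
            var chunk = new MemoryStream(data, offset, chunkLength, writable: false);

            // Each call targets a different page range, so the requests are independent
            // and very likely end up on different storage servers.
            tasks.Add(pageBlob.WritePagesAsync(chunk, blobOffset + offset, null));
        }
        return Task.WhenAll(tasks);
    }
}

Parallel.ForEach would work too, but since WritePagesAsync is already asynchronous there's no need to burn a thread per request; collecting the tasks and awaiting them all gives the same parallelism. Just make sure DefaultConnectionLimit (see above) is high enough to actually let the requests run concurrently.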
Now, Azure also knows that performance is an issue, which is why they've attempted to mitigate the problem by backing storage with local caching. What basically happens here is that they write the data locally (e.g. to a file), then cut the work into pieces and use multiple threads to write everything to blob storage. The Azure Storage Data Movement library is one such library. However, when using it you should always keep in mind that it has different durability constraints (it's like enabling 'write caching' on your local PC) and might break the way you intended to set up your distributed system (if you read & write the same storage from multiple VMs).
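If you go that route, a minimal sketch of what using the Data Movement library looks like (type and package names here assume the Microsoft.Azure.Storage.DataMovement NuGet package; older releases live under Microsoft.WindowsAzure.Storage.DataMovement, and the values are illustrative):

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Storage.Blob;
using Microsoft.Azure.Storage.DataMovement;

public static class DataMovementUpload
{
    // Lets the library handle the chunking and the parallelism when uploading a
    // local file to a page blob (the file size must be a multiple of 512 bytes).
    public static Task UploadFileAsync(CloudPageBlob destBlob, string sourcePath)
    {
        // How many chunks the library keeps in flight at once (illustrative value).
        TransferManager.Configurations.ParallelOperations = 64;

        var context = new SingleTransferContext();
        return TransferManager.UploadAsync(sourcePath, destBlob, null, context, CancellationToken.None);
    }
}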
Why...
You've asked for the 'why'. In order to understand why blob storage is slow, you need to understand how it works. First I'd like to point out that there is this presentation from Microsoft Azure that explains how Azure storage actually works.
The first thing to realize is that Azure storage is backed by a distributed set of (spinning) disks. Because of the durability and consistency constraints, the system also ensures that there's a 'majority vote' that the data has been written to stable storage. For performance, several levels of the system have caches, which are mostly read caches (again, due to the durability constraints).
Now, the Azure team doesn't publish everything. Fortunately for me, 5 years ago my previous company created a similar system on a smaller scale. We had performance problems similar to Azure's, and the system was quite similar to the presentation I've linked above. As such, I think I can explain and speculate a bit about where the bottlenecks are. For clarity, I'll mark sections as speculation where I think that's appropriate.
If you write a page to blob storage, you actually set up a series of TCP/IP connections, store the page at multiple locations, and when a majority vote is received you give an 'ok' back to the client. Now, there are actually a few bottlenecks in this system:

1. Setting up the network (TCP/IP) connections.
2. Seeking the (spinning) disks to the target location.
3. Waiting for a majority of the replicas to acknowledge the write.
4. Scheduling the seeks and writes of many different clients onto the same disks.
Number (1), (2) and (3) here are quite well known. Number (4) here is actually the result of (1) and (2). Note that you cannot just throw an infinite number of requests at spinning disks; well... actually you can, but then the system will come to a grinding halt. So, in order to solve that, disk seeks from different clients are usually scheduled in such a way that you only seek if you know that you can also write everything (to minimize the expensive seeks). However, there's an issue here: if you want to push throughput, you need to start seeking before you have all the data, and if you're not getting the data fast enough, other requests have to wait longer. Herein also lies a dilemma: you can either optimize for this (which can sometimes hurt per-client throughput and stall everyone else, especially with mixed workloads) or buffer everything and then seek & write everything at once (this is easier, but adds some latency for everyone). Because of the vast number of clients that Azure serves, I suspect they chose the latter approach, which adds more latency to a complete write cycle.
Regardless, most of the time will probably be spent on (1) and (2). The actual data bursts and data writes are then quite fast. To give you a rough sense of scale: setting up a connection and completing a round trip inside a data center costs on the order of a millisecond, a single seek of a spinning disk costs on the order of 10 ms, and those costs are paid per request, no matter how little data you write.
So, that leaves us with one question: why is writing in multiple threads so much faster?
The reason for that is actually very simple: if we write stuff in multiple threads, there's a high chance that we store the actual data on different servers. This means that we can shift our bottleneck from "seek + network setup latency" to "throughput". And as long as our client VM can handle it, it's very likely that the infrastructure can handle it as well.
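To put rough numbers on that: the single synchronous write measured above achieves roughly 2.4 MB/s, and, following the reasoning above, most of that ~1.5 s is latency (connection setup, seeks, replication) rather than data transfer. If, say, ten such writes are kept in flight at once against different servers (ten being an illustrative number), throughput scales to roughly 10 × 2.4 ≈ 24 MB/s, provided the client VM and ServicePointManager.DefaultConnectionLimit can keep up. That is exactly what the multi-threaded benchmarks are doing.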