CloudBlockBlob.DownloadToStream vs DownloadRangeToStream

Jaked222 · Jan 23, 2017

I'm trying to use the Azure SDK in ASP.NET to download images from Blob storage.

I read in another post that DownloadToStream breaks blobs up into smaller pieces and downloads them in parallel to improve performance. I thought that was what DownloadRangeToStream is for.

I have not been able to find any documentation or code confirming this claim about DownloadToStream, and I am skeptical because it takes about as long as downloading straight from the blob URL (0.5 to 3 seconds per download). Here is the code for both of my download methods; they perform about the same.

Using CloudBlockBlob.DownloadToStream:

private Bitmap DownloadFromBlob(string set) {

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("StorageConnectionString"));

    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("templates");

    CloudBlockBlob blockBlob = container.GetBlockBlobReference(set + ".png");

    using (var memoryStream = new MemoryStream()) {
        blockBlob.DownloadToStream(memoryStream);

        // DownloadToStream leaves Position at the end of the data; rewind before decoding.
        memoryStream.Position = 0;

        // Image.FromStream needs its stream to stay open for the image's lifetime,
        // so copy it into a new Bitmap before memoryStream is disposed.
        using (Image image = Image.FromStream(memoryStream)) {
            return new Bitmap(image);
        }
    }
}
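One detail worth checking in DownloadFromBlob: after DownloadToStream writes into the MemoryStream, the stream's Position is left at the end of the data, so a reader that starts from the current position sees nothing. The sketch below uses only System.IO (no Azure or System.Drawing types) to show the effect; `StreamRewindDemo` and `WriteThenRead` are hypothetical names, and the `Write` call merely stands in for DownloadToStream.

```csharp
using System.IO;

public static class StreamRewindDemo
{
    // Writes the payload into a fresh MemoryStream (standing in for DownloadToStream),
    // then reads it back, optionally rewinding first. Returns the number of bytes read.
    public static int WriteThenRead(byte[] payload, bool rewind)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length); // Position is now payload.Length
            if (rewind)
            {
                ms.Position = 0; // seek back to the first byte
            }
            var buffer = new byte[payload.Length];
            return ms.Read(buffer, 0, buffer.Length); // 0 when already at the end
        }
    }
}
```

Reading without the rewind returns 0 bytes; with it, the full payload comes back.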

Using WebClient and Image.FromStream:

private Bitmap DownloadImageFromUrl(string url) {
    try {
        using (WebClient client = new WebClient()) {
            // DownloadData throws on failure rather than returning null.
            byte[] data = client.DownloadData(url);
            using (MemoryStream mem = new MemoryStream(data))
            using (Image image = Image.FromStream(mem)) {
                // Copy so the returned Bitmap does not depend on the disposed stream.
                return new Bitmap(image);
            }
        }
    } catch (WebException) {
        return null;
    }
}
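Single downloads that vary between 0.5 and 3 seconds are hard to compare by eye. A minimal sketch of a timing harness, assuming you can wrap each download call in an `Action`, does one untimed warm-up call (connection setup and JIT otherwise land in the first measurement) and then averages several timed runs with `Stopwatch`. `DownloadTimer` and `AverageMilliseconds` are hypothetical names.

```csharp
using System;
using System.Diagnostics;

public static class DownloadTimer
{
    // Runs `action` once as an unmeasured warm-up, then times `runs` further
    // calls and returns the average duration in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up, not measured

        var sw = new Stopwatch();
        double totalMs = 0;
        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds;
        }
        return totalMs / runs;
    }
}
```

For example, `DownloadTimer.AverageMilliseconds(() => DownloadFromBlob("someSet"), 10)` could be compared against the same call for the other method, where `"someSet"` is a placeholder blob name.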

I am trying to reduce the download time of images that range from 0.5 to 12 MB. I tried implementing my own parallel download with DownloadRangeToStream for these images; the code is below. Do I need to do this, or does DownloadToStream already do it for me? This method gives about the same runtime as the DownloadFromBlob method above.

Using DownloadRangeToStream:

private Image getImageFromStream(string set)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("StorageConnectionString"));

    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("templates");

    CloudBlockBlob blockBlob = container.GetBlockBlobReference(set + ".png");

    using (MemoryStream ms = new MemoryStream())
    {
        ParallelDownloadBlob(ms, blockBlob);

        // The parallel writes leave Position after the last chunk written; rewind before decoding.
        ms.Position = 0;

        // Copy the image so it stays valid after ms is disposed.
        using (Image image = Image.FromStream(ms))
        {
            return new Bitmap(image);
        }
    }
}
private static void ParallelDownloadBlob(Stream outPutStream, CloudBlockBlob blob)
{
    blob.FetchAttributes();
    int bufferLength = 1 * 1024 * 1024;//1 MB chunk
    long blobRemainingLength = blob.Properties.Length;
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }
    Parallel.ForEach(queues,
        new ParallelOptions()
        {
            //Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });
}
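The range-splitting loop in ParallelDownloadBlob can be pulled out into a pure function so the chunk plan can be checked without a storage account. The sketch below mirrors that loop (`RangePlanner` and `PlanRanges` are hypothetical names). It also makes the arithmetic visible: with 1 MB chunks, a 0.5 MB image yields a single range, so for the smaller images the parallel version issues one ranged request plus the FetchAttributes call and is unlikely to beat DownloadToStream; a 12 MB image yields 12 ranges, of which at most 10 run at once with `MaxDegreeOfParallelism = 10`.

```csharp
using System;
using System.Collections.Generic;

public static class RangePlanner
{
    // Splits a blob of `totalLength` bytes into (offset, length) pairs of at most
    // `chunkSize` bytes, in ascending offset order.
    public static List<KeyValuePair<long, long>> PlanRanges(long totalLength, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var ranges = new List<KeyValuePair<long, long>>();
        long offset = 0;
        while (offset < totalLength)
        {
            // The last chunk may be shorter than chunkSize.
            long length = Math.Min(chunkSize, totalLength - offset);
            ranges.Add(new KeyValuePair<long, long>(offset, length));
            offset += length;
        }
        return ranges;
    }
}
```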

Answer

Bruce Chen · Jan 24, 2017

Per my understanding, both CloudBlockBlob.DownloadToStream and the WebClient approach (Image.FromStream) send only a single request to download the whole blob. You can confirm this by capturing the traffic with Fiddler.

With DownloadRangeToStream, you can split the blob into smaller ranges yourself and download them in parallel to improve performance. Here is my code snippet for reference:

private static void ParallelDownloadBlob(Stream outPutStream, CloudBlockBlob blob)
{
    // Populate blob.Properties, including Length.
    blob.FetchAttributes();
    int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
    long blobRemainingLength = blob.Properties.Length;

    // Split the blob into (offset, length) ranges of at most bufferLength bytes.
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }

    // Each range is fetched with its own DownloadRangeToStream request.
    Parallel.ForEach(queues,
        new ParallelOptions()
        {
            // Maximum number of ranges downloaded concurrently
            MaxDegreeOfParallelism = 10
        }, (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStream(ms, queue.Key, queue.Value);

                // Setting Position and writing must not interleave across threads.
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });

    // outPutStream.Position now points after whichever chunk was written last;
    // callers should reset it before reading.
}
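As a variation on the snippet above (not what it does), when the blob length is known up front each chunk can be copied into its own slice of a pre-allocated byte array. The slices never overlap, which removes the lock and the Position juggling on a shared stream. So that it runs without a storage account, the sketch replaces the Azure call with a hypothetical `fetchRange` delegate; in real code that delegate would presumably call `blob.DownloadRangeToStream` into a MemoryStream and return `ToArray()`. `ParallelRangeAssembler` and `Assemble` are also hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelRangeAssembler
{
    // Fetches every (offset, length) range with `fetchRange` and copies each
    // result into its own slice of a pre-sized array. Distinct threads write
    // to disjoint indices, so no lock is needed.
    public static byte[] Assemble(long totalLength, int chunkSize,
                                  Func<long, long, byte[]> fetchRange,
                                  int maxDegreeOfParallelism)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new byte[totalLength];
        var ranges = new List<KeyValuePair<long, long>>();
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            ranges.Add(new KeyValuePair<long, long>(offset, Math.Min(chunkSize, totalLength - offset)));
        }

        Parallel.ForEach(ranges,
            new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism },
            range =>
            {
                byte[] chunk = fetchRange(range.Key, range.Value);
                if (chunk.Length != range.Value)
                {
                    throw new InvalidOperationException(
                        $"Range at {range.Key} returned {chunk.Length} bytes, expected {range.Value}.");
                }
                Array.Copy(chunk, 0, result, range.Key, chunk.Length);
            });

        return result;
    }
}
```

The returned array can be wrapped with `new MemoryStream(bytes)`, whose Position already starts at 0. Errors thrown inside the loop surface as an `AggregateException` from `Parallel.ForEach`.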

Additionally, there are some blog posts about uploading and downloading blobs in parallel that you could refer to (blog1 and blog2).