Batch Insert in Azure storage table

Sushant Baweja · Oct 11, 2018

I am new to Azure Table storage. I was trying to insert my entities in a batch, but I found that a batch operation cannot contain entities with different partition keys.

Is there some way I can do that? There are about 10,000 to 20,000 file details that I want to insert into a table.

Here is what I have tried so far:

public class Manifest : TableEntity
{
    private string name;
    private string extension;
    private string filePath;
    private string relativePath;
    private string mD5HashCode;
    private string lastModifiedDate;

    public void AssignRowKey()
    {
        this.RowKey = relativePath.ToString();
    }
    public void AssignPartitionKey()
    {
        this.PartitionKey = mD5HashCode;
    }
    public string Name { get { return name; } set { name = value; } }
    public string Extension { get { return extension; } set { extension = value; } }
    public string FilePath { get { return filePath; } set { filePath = value; } }
    public string RelativePath { get { return relativePath; } set { relativePath = value; } }
    public string MD5HashCode { get { return mD5HashCode; } set { mD5HashCode = value; } }
    public string LastModifiedDate { get { return lastModifiedDate; } set { lastModifiedDate = value; } }

}

My method, which is in a different class:

static async Task BatchInsert(CloudTable table, IEnumerable<FileDetails> files)
{
    int rowOffset = 0;

    var tasks = new List<Task>();

    while (rowOffset < files.Count())
    {
        // next batch
        var rows = files.Skip(rowOffset).Take(100).ToList();

        rowOffset += rows.Count;

        var task = Task.Factory.StartNew(() =>
        {
            var batch = new TableBatchOperation();

            foreach (var row in rows)
            {
                Manifest manifestEntity = new Manifest
                {
                    Name = row.Name,
                    Extension = row.Extension,
                    FilePath = row.FilePath,
                    RelativePath = row.RelativePath.Replace('\\', '+'),
                    MD5HashCode = row.Md5HashCode,
                    LastModifiedDate = row.LastModifiedDate.ToString()
                };
                manifestEntity.AssignPartitionKey();
                manifestEntity.AssignRowKey();
                batch.InsertOrReplace(manifestEntity);
            }

            // submit
            table.ExecuteBatch(batch);
        });

        tasks.Add(task);
    }

    await Task.WhenAll(tasks);
}

Answer

Joey Cai · Oct 11, 2018

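If you want to use a batch operation, every entity in the batch must have the same PartitionKey, and a single batch can hold at most 100 entities. Since your PartitionKey is the MD5 hash code, almost every entity ends up in its own partition, so unfortunately there's no other option but to save them individually in your case.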
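For tables where several entities do share a PartitionKey, the rule above can be followed by grouping the entities by partition key first and then cutting each group into chunks of at most 100. Here is a minimal sketch of that grouping step only. `PartitionBatcher` and `SplitIntoBatches` are hypothetical names, not part of the storage SDK, and the helper never talks to the table itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionBatcher
{
    // Hypothetical helper (not an SDK API): groups items by partition key,
    // then yields chunks of at most maxBatchSize items. Every chunk contains
    // items from exactly one partition.
    public static IEnumerable<List<T>> SplitIntoBatches<T>(
        IEnumerable<T> items,
        Func<T, string> partitionKeySelector,
        int maxBatchSize = 100)
    {
        foreach (var group in items.GroupBy(partitionKeySelector))
        {
            var chunk = new List<T>();
            foreach (var item in group)
            {
                chunk.Add(item);
                if (chunk.Count == maxBatchSize)
                {
                    // Chunk is full: hand it out and start a new one.
                    yield return chunk;
                    chunk = new List<T>();
                }
            }

            // Remainder of this partition, if any.
            if (chunk.Count > 0)
            {
                yield return chunk;
            }
        }
    }
}
```

Each returned list could then be turned into one `TableBatchOperation` (calling `InsertOrReplace` per entity) and submitted with `table.ExecuteBatch`, as in the question's code. With an MD5 hash as the PartitionKey, most chunks would contain a single entity, so this gains little unless the key design changes.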

The reason the partition key exists is so that Azure can distribute data across machines with no coordination between partitions. The system is designed so that entities in different partitions cannot take part in the same transaction or batch operation.
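Because partitions need no coordination with each other, individual inserts into different partitions can be issued concurrently rather than one after another. Here is a rough sketch of a throttled runner for that. `ThrottledRunner`, `ForEachAsync` and the `maxConcurrency` parameter are hypothetical names introduced for illustration; a suitable concurrency limit is an assumption you would need to tune.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Hypothetical helper (not an SDK API): invokes action once per item,
    // with at most maxConcurrency invocations in flight at any time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> action,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Wait for a free slot before starting the next operation.
                await gate.WaitAsync();
                tasks.Add(RunOneAsync(item, action, gate));
            }

            // Completes when every operation has finished; rethrows a failure if any occurred.
            await Task.WhenAll(tasks);
        }
    }

    private static async Task RunOneAsync<T>(T item, Func<T, Task> action, SemaphoreSlim gate)
    {
        try
        {
            await action(item);
        }
        finally
        {
            // Free the slot whether the operation succeeded or failed.
            gate.Release();
        }
    }
}
```

Assuming `entities` is a collection of already populated `Manifest` objects, a call could look like `await ThrottledRunner.ForEachAsync(entities, e => table.ExecuteAsync(TableOperation.InsertOrReplace(e)), 16);`. The limit of 16 is only a placeholder.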

You could upvote this issue to help get this feature implemented.