Azure Searching Metadata in blobs

scottsanpedro picture scottsanpedro · Sep 28, 2017 · Viewed 8.7k times · Source

I am try to find a way to bring back only items in blob storage with metadata that matches a particular piece of data. All fields will have a key called 'FlightNo'.

What I want really want is a way to find all files (listBlobs) that contain a match to the metadata, so one level up, then iterate through that set of data, and find further matches as each file has 5 items of metadata.

Here is my very unfriendly code to date.

 foreach (IListBlobItem item in container.ListBlobs(null, false))
        {
            if (item.GetType() == typeof(CloudBlockBlob))
            {

                CloudBlockBlob blob = (CloudBlockBlob)item;

                blob.FetchAttributes();

                foreach (var metaDataItem in blob.Metadata)
                {
                    dictionary.Add(metaDataItem.Key, metaDataItem.Value);
                }

                if (dictionary.Where(r=>r.Key == "FlightNo" && r.Value == FlightNo).Any())
                {
                    if (dictionary.Where(r => r.Key == "FlightDate" && r.Value == FlightDate).Any())
                    {
                        if (dictionary.Where(r => r.Key == "FromAirport" && r.Value == FromAirport).Any())
                        {
                            if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                            {
                                if (dictionary.Where(r => r.Key == "ToAirport" && r.Value == ToAirport).Any())
                                {
                                    retList.Add(new BlobStorage()
                                    {
                                        Filename = blob.Name,
                                        BlobType = blob.BlobType.ToString(),
                                        LastModified = (DateTimeOffset)blob.Properties.LastModified,
                                        ContentType = blob.Properties.ContentType,
                                        Length = blob.Properties.Length,
                                        uri = RemoveSecondary(blob.StorageUri.ToString()),
                                        FlightNo = dictionary.Where(r => r.Key == "FlightNo").Select(r => r.Value).SingleOrDefault(),
                                        Fixture = dictionary.Where(r => r.Key == "FixtureNo").Select(r => r.Value).SingleOrDefault(),
                                        FlightDate = dictionary.Where(r => r.Key == "FlightDate").Select(r => r.Value).SingleOrDefault(),
                                        FromAirport = dictionary.Where(r => r.Key == "FromAirport").Select(r => r.Value).SingleOrDefault(),
                                        ToAirport = dictionary.Where(r => r.Key == "ToAirport").Select(r => r.Value).SingleOrDefault()
                                    });

                                }
                            }
                        }
                    }
                }

                dictionary.Clear();
            }
        }

Thanks. Scott

Answer

Fishcake picture Fishcake · Jun 6, 2018

The accepted answer is highly inefficient, looping through and loading every single Blob and their associated Metadata to check for values wouldn't perform very well with any reasonable volume of data.

It is possible to search Blob meta data using Azure Search. A search index can be created that includes Blobs custom meta data.

The following comprehensive articles explain it all:

Indexing Documents in Azure Blob Storage with Azure Search
Searching Blob storage with Azure Search