Fastest way of querying for latest items in a Azure table?

user152949 picture user152949 · Sep 7, 2011 · Viewed 8.4k times · Source

I have a Azure table where customers post messages, there may be millions of messages in a single table. I want to find the fastest way of getting the messages posted within the last 10 minutes (which is how often I refresh the web page). Since only the partition key is indexed I have played with the idea of using the date & time the message was posted as a partition key, for example a string as a ISO8601 date format like "2009-06-15T13:45:30.0900000"

Example pseudo code:

var message = "Hello word!";
var messagePartitionKey = DateTime.Now.ToString("o");
var messageEntity = new MessageEntity(messagePartitionKey, message);
dataSource.Insert(messageEntity);

, and then query for the messages posted within the last 10 minutes like this (untested pseudo code again):

// Get the date and time 10 minutes ago
var tenMinutesAgo = DateTime.Now.Subtract(new TimeSpan(0, 10, 0)).ToString("o");

// Query for the latest messages
var latestMessages = (from t in
   context.Messages
   where t.PartitionKey.CompareTo(tenMinutesAgo) <= 0
   select t
   )

But will this be taken well by the index? Or will it cause a full table scan? Anyone have a better idea of doing this? I know there is a timestamp on each table item, but it is not indexed so it will be too slow for my purpose.

Answer

knightpfhor picture knightpfhor · Sep 8, 2011

I think you've got the right basic idea. The query you've designed should be about as efficient as you could hope for. But there are some improvements I could offer.

Rather than using DateTime.Now, use Date.UtcNow. From what I understand instances are set to use Utc time as their base anyway, but this just makes sure you're comparing apples with apples and you can reliable convert the time back into whatever timezone you want when displaying them.

Rather than storing the time as .ToString("o") turn the time into ticks and store that, you'll end up with less formatting problems (sometimes you'll get the timezone specification at the end, sometimes not). Also if you always want to see these messages sorted from most recent to oldest you can subtract the number of ticks from the max number of ticks e.g.

var messagePartitionKey = (DateTime.MaxValue.Ticks - _contactDate.Ticks).ToString("d19");

It would also be a good idea to specify a row key. While it is highly unlikely that two messages will be posted with exactly the same time, it's not impossible. If you don't have an obvious row key, then just set it to be a Guid.