DynamoDB query on boolean key

fruvos picture fruvos · Feb 1, 2015 · Viewed 10.1k times · Source

I'm new to DynamoDB (and to noSQL in general) and am struggling a little to get my head round some of the concepts. One thing in particular is giving me some problems, which is around querying a table based on a boolean key.

I realise that I can't created a primary or secondary index on a boolean key, but I can't see how I should ideally index and query a table with the following structure;

reportId: string (uuid)
reportText: string
isActive: boolean
category: string

I would like to be able to complete the following searches:

  1. Access a specific report directly (a primary hash index of reportId)
  2. List reports of a specific category (a primary hash index on category)

These are both straightforward, but I would like to perform two other queries;

  1. List all reports that are marked as isActive = true
  2. List all reports of a specific category that are marked as isActive = true

My first approach would be to create a primary hashkey index on isActive, with a rangekey on category, but I'm only able to choose String, Number of Boolean as the key type.

Storing isActive as a string (saved as 'true' rather than a boolean true) solves the problem, but its horrible using a string for a boolean property.

Am I missing something? Is there a simple way to query the table directly on a boolean value?

Any advice duly appreciated.

Thanks in advance.

Answer

readyornot picture readyornot · Feb 2, 2015

My project includes this particular scenario and I've followed the DynamoDB best practice of using sparse indexes on both Local and Global Secondary Indexes. Here is what I would do with your example:

Table: reportId (string, hash key) || reportText (string) || isActive (string, marked as "x") || category (string)

ActiveReportsIndex (Local Secondary Index): reportID (hash key) || isActive (range key)

ActiveReportsByCategoryIndex (Global Secondary Index): category (hash key) || isActive (range key) || reportId

The idea behind sparse indexes is that only reports marked as isActive: "x" will show up in your indexes, so they should require less storage and processing than your main table. Instead of making the isActive attribute a boolean type, which will always store a true or false value, use use a string like "x" or anything else you want when the report is active and DELETE the attribute completely when the report is not active. Makes sense?

UPDATE: If you want a specific kind of sort when you query (e.g. chronological), use a number (e.g. a unix timestamp) instead of an "x" string.