I am trying to create a table to store invoice line items in DynamoDB. Let's say the item is defined by CompanyCode
, InvoiceNumber
and LineItemId
, amount and other line item details.
A unique item is defined by the combination of the first 3 attributes. Any 2 of those attributes can be same for the different items. What should I select as the Hash Attribute and the Range Attribute?
For efficiency I would propose totally different design. With NoSQL databases (and DynamoDB is not different) we always need to consider the access patterns first. Also, if possible we should strive to fit all our data within same table and several indexes. From what we have from OP and his comments, these are the two access patterns:
We now wonder what is a good Primary Key? Translates to question what is a good Partition Key (PK) and what is a good Sort Key (SK) and which secondary indexes do we need to create and of what kind (local or global)? Some reminders:
KeyConditionExpression
which provides you with set of operators for sorting and everything in between (one of them being function begins_with (a, substr)
)FilterExpression
if you need to further refine the Query results (filter on the projected attributes)It is obvious that we are dealing with multiple entities that need to be modeled and fit into the same table. To satisfy condition of Partition Key being unique on the table, CompanyCode
comes as a natural Partition Key - so I would ensure that is unique. If not then you need to ask yourself how can you model the second access pattern?
Assuming we have established uniqueness on the CompanyCode
let's simplify and say that it comes in the form of an e-mail (or could be domain or just a code, but I will use email for demonstration).
I propose design as in the image below:
CompanyCode
and SK being InvoiceNumber
can store all attributes about that invoice for that company.Customer
which allows me to store all attributes about the company. InvoiceNumber
) and my GSI1SK is my tables PK (CompanyCode
).LineItemId
and SK being CompanyCode
(still unique)InvoiceNumber
and my GSI1SK is LineItemId
which is tables PK so its same as for Invoice entity items.Now the access patterns supported with this:
CompanyCode=X
and use KeyConditionExpression
with =
operator on the Sort Key InvoiceNumber
. If I want to get all the items tied to that invoice, I will project Items
attribute using ProjectionExpression
. BatchGetItem
API call (using my unique composite key LineItemId+CompanyCode
) on table to get all items belonging to that particular invoice of that particular customer. (this comes with some constraints of BatchGetItem API)CompanyCode=X
on PK and use KeyConditionExpression
on the SK with begins_with (a, substr)
function/operator to get only invoices for company X and not the metadata about that company. That will give me all invoices for given company/customer.InvoiceNumber
I can easily select all the line items that belong to that particular invoice. REMEMBER: The key values in a global secondary index do not need to be unique - so in my GSI1 I could have had easily invoice_1 -> (item_1, item_2) and then another invoice_1 -> (item_1,item_2) but the difference between two items in GSI would be in the SK (it would be associated with different CompanyCode
(but for demonstration purposes I used invoice_1 and invoice_2).