Currently, I see that when I use versioning in DynamoDB, it changes the version number but the new entry will replace the old entry; ie:
old
{ object:one, name:"hey", version:1}
new
{ object:one, name:"ho", version:2}
What I want is to have BOTH entries in the db; ie:
old
{ object:one, name:"hey", version:1 }
new
{ object:one, name:"hey", version:1}
{ object:one, name:"ho", version:2}
Any way to accomplish this?
I've been experimenting and calculating what's most efficient in terms of read/write units and cost, considering race conditions where updates occur while a version is being logged, and avoiding of duplication of data. I've narrowed down a couple of possible solutions. You'd have to consider your best variation.
The basic concepts revolve around consider version 0
as the latest version. Also, we will use a revisions
key that will list how many revisions exist before this item, but also will be used to determine the current version of the item (version = revisions + 1
). Being able to calculate how versions exists is a requirement and, in my opinion, revisions
fulfills that need as well as a value that can be presented to the user.
So the first row will be created with version: 0
and revisions: 0
. While this is technically the first version (v1), we don't apply a version number until it's archived. When this row changes, version
stays at 0
, which still denotes latest, and revisions
is incremented to 1
. A new row is created with all the previous values, with the exception that now that row denotes version: 1
.
To summarize:
On item creation:
revisions: 0
and version 0
On item update or overwrite:
revisions
version: 0
to the new version which can easily be calculated as version: revisions + 1
.Here's what converting to convertion would look like on a table with just primary key:
Primary Key: id
id color
9501 violet
9502 cyan
9503 magenta
Primary Key: id+version
id version revisions color
9501 0 6 violet
9501 1 0 red
9501 2 1 orange
9501 3 2 yellow
9501 4 3 green
9501 5 4 blue
9501 6 5 indigo
Here's converting a table that already uses a sort key:
Primary key: id+date
id date color
9501 2018-01 violet
9501 2018-02 cyan
9501 2018-03 black
Primary key: id+date_ver
id date_ver revisions color
9501 2018-01__v0 6 violet
9501 2018-01__v1 0 red
9501 2018-01__v2 1 orange
9501 2018-01__v3 2 yellow
9501 2018-01__v4 3 green
9501 2018-01__v5 4 blue
9501 2018-01__v6 5 indigo
Alternative #2:
id date_ver revisions color
9501 2018-01 6 violet
9501 2018-01__v1 0 red
9501 2018-01__v2 1 orange
9501 2018-01__v3 2 yellow
9501 2018-01__v4 3 green
9501 2018-01__v5 4 blue
9501 2018-01__v6 5 indigo
We actually have the option of either putting previous versions in the same table or separate them into their own table. Both options have their different advantages and disadvantages.
number
or appended to an existing sort key as a string
Advantages:
Disadvantages:
revision
key to both tablesversion
. The primary table would always have version: 0
. Use of this key on the primary table isn't mandatory.Advantages:
get
requests do not change.Disadvantages:
Regardless of how you decide to partition the data, now we have to decide how create the revision rows. Here are a couple of different methods:
Summary: Get the current version of the row. Perform both an update on the current row and insert the previous version with one transaction.
To avoid race conditions, we need to write both the update and insert in the same operation using TransactWriteItems
. Also, we need to make sure that the version we are updating is the right version by the time the request reaches database server. We achieve this by either one of two checks, or even both:
Update
command in TransactItems
, the ConditionExpression
must check that the revision
in the row to be updated matches the revision
in the object we performed a Get
on before.Put
command in TransactItems
, the ConditionExpression
checks to make sure the row does not already exist.Cost
Notes:
Summary: Get and store current row. When overwriting or updating a row, check against current revision and increment revisions
. Insert previously stored row with version number.
Perform an update
with
{
UpdateExpression: 'SET revisions = :newRevisionCount',
ExpressionAttributeValues: {
':newRevisionCount': previousRow.revisions + 1,
':expectedRevisionCount': previousRow.revisions,
},
ConditionExpression: 'revisions = :expectedRevisionCount',
}
We can use the same ConditionExpression
with put
when overwriting a previously existing row.
In the response, we are watching for ConditionalCheckFailedException
. If this is returned, that means the revision was already changed by another process and we need to repeat the process from the beginning or abort entirely. If there are no exceptions, then we can insert the previous stored row after update the value on your version attribute as appropriate (numeric or string).
Cost
Summary: Perform a "blind" update on the v0 row while incrementing revisions
and requesting the old attributes. Use the return value to create a new row with the version number.
Perform an update-item
with
{
UpdateExpression: 'ADD revisions :revisionIncrement',
ExpressionAttributeValues: {
':revisionIncrement': 1,
},
ReturnValues: 'ALL_OLD',
}
The ADD
action will automatically create revisions
if it doesn't exist and will consider it 0
. One nice benefit of ReturnValues is:
There is no additional cost associated with requesting a return value aside from the small network and processing overhead of receiving a larger response. No read capacity units are consumed.
In the update response, the Attributes
value will be the data from the old record. The version of this record is the value of Attributes.revisions + 1
. Update the value on your version attribute as appropriate (numeric or string).
Now you can insert this record into your target table.
Cost
Notes:
Attributes
length is limited to 65535.Summary: Perform "blind" updates and inserts on the primary while incrementing revisions
. Use a Lambda trigger watching for changes to revision
to insert revisions asynchronously.
Perform an update
with
{
UpdateExpression: 'ADD revisions :revisionIncrement',
ExpressionAttributeValues: {
':revisionIncrement': 1,
},
}
The ADD
action will automatically create revisions
if it doesn't exist and will consider it 0
.
For overwriting records with put
increment revisions
value based on a previous get
request.
Configure a DynamoDB Stream view type to return both new and old images. Setup a Lambda trigger against the database table. Here's a piece of sample code for NodeJS that would compare the old and new images and call a function to write the revisions in batch.
/**
* @param {AWSLambda.DynamoDBStreamEvent} event
* @return {void}
*/
export function handler(event) {
const oldRevisions = event.Records
.filter(record => record.dynamodb.OldImage
&& record.dynamodb.NewImage
&& record.dynamodb.OldImage.revision.N !== record.dynamodb.NewImage.revision.N)
.map(record => record.dynamodb.OldImage);
batchWriteRevisions(oldRevisions);
}
This is just sample, but production code would likely include more checks.
Cost
Notes:
For my use cases, I'm already using a DynamoDB Streams and I don't expect users to request versioned rows all that often. I also can let users wait a bit for the revisions to be ready since it's asynchronous. That makes using a second table and the automated lambda process the more ideal solution for me.
For the asynchronous options there are some points of failure. Still, it's something that you can either retry immediately on the on-demand requests, or schedule for later for the DynamoDB Stream solution.
If anybody has any other solutions or critiques, please comment. Thanks!