Using an RDBMS as event sourcing storage

Neil Barnwell · Aug 15, 2011 · Viewed 37k times

If I were using an RDBMS (e.g. SQL Server) to store event sourcing data, what might the schema look like?

I've seen a few variations talked about in an abstract sense, but nothing concrete.

For example, say one has a "Product" entity, and changes to that product could come in the form of: Price, Cost and Description. I'm confused about whether I'd:

  1. Have a "ProductEvent" table that has all the fields of the product, where each change means a new record in that table, plus "who, what, where, why, when and how" (WWWWWH) columns as appropriate. When the cost, price or description is changed, a whole new row is added to represent the Product.
  2. Store product Cost, Price and Description in separate tables joined to the Product table with a foreign key relationship. When changes to those properties occur, write new rows with WWWWWH as appropriate.
  3. Store WWWWWH, plus a serialised object representing the event, in a "ProductEvent" table, meaning the event itself must be loaded, de-serialised and re-played in my application code in order to re-build the application state for a given Product (see the rough sketch after this list).
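
For illustration, here's a rough sketch of what I imagine option 3 might look like in SQL Server (the table and column names are just my guesses):

    CREATE TABLE ProductEvents (
        Id         uniqueidentifier NOT NULL PRIMARY KEY,
        ProductId  uniqueidentifier NOT NULL,  -- the product the event belongs to
        Sequence   bigint           NOT NULL,  -- order of events per product
        OccurredAt datetime         NOT NULL,  -- the "when" of WWWWWH
        UserName   nvarchar(255)    NOT NULL,  -- the "who"
        EventType  nvarchar(255)    NOT NULL,  -- e.g. 'PriceChanged'
        Data       nvarchar(max)    NOT NULL   -- serialised event body (JSON/XML)
    );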

Option 2 above particularly worries me. Taken to the extreme, it amounts to almost one table per property, and loading the application state for a given product would mean loading all of that product's events from each of those tables. This table explosion smells wrong to me.

I'm sure "it depends", and while there's no single "correct answer", I'm trying to get a feel for what is acceptable, and what is totally not acceptable. I'm also aware that NoSQL can help here, where events could be stored against an aggregate root, meaning only a single request to the database to get the events to rebuild the object from, but we're not using a NoSQL db at the moment so I'm feeling around for alternatives.

Answer

Dennis Traub · Aug 15, 2011

The event store should not need to know about the specific fields or properties of events. Otherwise every modification of your model would result in having to migrate your database (just as in good old-fashioned state-based persistence). Therefore I wouldn't recommend options 1 and 2 at all.

Below is the schema used in Ncqrs. As you can see, the "Events" table stores the event data as a CLOB (i.e. JSON or XML). This corresponds to your option 3 (only that there is no "ProductEvents" table, because you need just one generic "Events" table; in Ncqrs the mapping to your Aggregate Roots happens through the "EventSources" table, where each EventSource corresponds to an actual Aggregate Root).

Table Events:
    Id [uniqueidentifier] NOT NULL,
    TimeStamp [datetime] NOT NULL,

    Name [varchar](max) NOT NULL,
    Version [varchar](max) NOT NULL,

    EventSourceId [uniqueidentifier] NOT NULL,
    Sequence [bigint], 

    Data [nvarchar](max) NOT NULL

Table EventSources:
    Id [uniqueidentifier] NOT NULL, 
    Type [nvarchar](255) NOT NULL, 
    Version [int] NOT NULL
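
Rebuilding an Aggregate Root then comes down to loading its events in order and replaying them in application code; appending a new event can bump the Version on EventSources in the same transaction to detect concurrent writers. Here is a hedged T-SQL sketch against the tables above (the parameter names and the concurrency check are my assumptions, not necessarily how Ncqrs does it):

    -- Load all events for one Aggregate Root, oldest first, for replay
    SELECT Name, Version, Data
    FROM   Events
    WHERE  EventSourceId = @EventSourceId
    ORDER  BY Sequence;

    -- Append a new event and bump the source's version in one transaction
    BEGIN TRANSACTION;

    UPDATE EventSources
    SET    Version = Version + 1
    WHERE  Id = @EventSourceId
      AND  Version = @ExpectedVersion;  -- optimistic concurrency check

    IF @@ROWCOUNT = 1
    BEGIN
        INSERT INTO Events (Id, TimeStamp, Name, Version, EventSourceId, Sequence, Data)
        VALUES (NEWID(), GETUTCDATE(), @EventName, @EventVersion,
                @EventSourceId, @ExpectedVersion + 1, @Data);
        COMMIT TRANSACTION;
    END
    ELSE
        ROLLBACK TRANSACTION;  -- a concurrent writer got there first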

The SQL persistence mechanism of Jonathan Oliver's Event Store implementation consists essentially of a single table called "Commits" with a BLOB field "Payload". This is pretty much the same as in Ncqrs, only that it serializes the event data in a binary format rather than as text (which, for instance, adds support for encryption).

Greg Young recommends a similar approach, as extensively documented on his website.

The schema of his prototypical "Events" table reads:

Table Events
    AggregateId [Guid],
    Data [Blob],
    SequenceNumber [Long],
    Version [Int]
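
Spelled out as concrete T-SQL, that could look something like the following (the composite primary key is my addition; it is one way to get the optimistic concurrency check Greg describes, because two writers racing on the same aggregate will compute the same next Version and the second INSERT will fail):

    CREATE TABLE Events (
        AggregateId    uniqueidentifier NOT NULL,
        Data           varbinary(max)   NOT NULL,
        SequenceNumber bigint           NOT NULL,
        Version        int              NOT NULL,
        CONSTRAINT PK_Events PRIMARY KEY (AggregateId, Version)
    );

    -- Append the next event for an aggregate; fails if another writer
    -- has already committed an event with the same Version
    INSERT INTO Events (AggregateId, Data, SequenceNumber, Version)
    VALUES (@AggregateId, @Data, @SequenceNumber, @ExpectedVersion + 1);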