Snapshot taking and restore strategies

Question 1

cqrs snapshot event-sourcing

Mikhas · Jun 24, 2016 · Viewed 7k times · Source

Answer

Rule #1: Don't.
Rule #2: Don't.

Snapshotting an event sourced model is a performance optimization. The first rule of performance optimization? Don't.

Specifically, snapshotting reduces the amount of time you lose in your repository trying to reload the history of your model from your event store.

If your repository can keep the model in memory, then you aren't going to be reloading it very often. So the win from snapshotting will be small. Therefore: don't.

If you can decompose your model into aggregates, which is to say that you can decompose the history of your model into a number of entities that have non-overlapping histories, then your one long model history becomes many short histories that each describe the changes to a single entity. Each entity history that you need to load will be pretty short, so the win from a snapshot will be small. Therefore: don't.

The kind of systems I'm working on today require high performance but not 24x7 availability. So in a situation where I shut down my system for maintenance and restart it, I'd have to load and reprocess my entire event store, as my fresh system doesn't know which aggregate ids to process the events for. I need a better starting point so that my system restarts are more efficient.

You are worried about missing a write SLA when the repository memory caches are cold, and you have long model histories with lots of events to reload. Bolting on snapshotting might be a lot more reasonable than trying to refactor your model history into smaller streams. OK....

The snapshot store is a read model -- at any point in time, you should be able to blow away the model and rebuild it from the persisted history in the event store.

From the perspective of the repository, the snapshot store is a cache; if no snapshot is available, or if the store itself doesn't respond within the SLA, you want to fall back to reprocessing the entire event history, starting from the initial seed state.
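A minimal sketch of that fallback, under heavy assumptions: the state is a running balance, events are plain deltas, and `Snapshot`, `SnapshotLookup`, and `Loader` are hypothetical names invented for illustration. A timeout against the SLA is not implemented here; the sketch simply assumes the lookup throws when the store is unavailable or too slow.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical snapshot: the state after the first `version` events. */
final class Snapshot {
    final long balance;
    final int version;
    Snapshot(long balance, int version) { this.balance = balance; this.version = version; }
}

/** Hypothetical lookup; assumed to return empty on a miss and to throw if the store is down or too slow. */
interface SnapshotLookup {
    Optional<Snapshot> find(String id);
}

final class Loader {
    static final long SEED = 0L; // initial seed state

    /** Loads the current balance, treating the snapshot store as an optional cache. */
    static long load(String id, List<Long> history, SnapshotLookup snapshots) {
        Optional<Snapshot> snapshot;
        try {
            snapshot = snapshots.find(id);
        } catch (RuntimeException unavailable) {
            snapshot = Optional.empty(); // store failed: fall back to a full replay
        }
        long balance = snapshot.map(s -> s.balance).orElse(SEED);
        int from = snapshot.map(s -> s.version).orElse(0);
        // Apply only the events the snapshot has not already seen (all of them on a miss).
        for (Long delta : history.subList(from, history.size())) {
            balance += delta;
        }
        return balance;
    }
}
```

Whether the snapshot is present, missing, or the store fails, the caller gets the same answer; only the amount of replay differs.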

The service provider interface is going to look something like

interface SnapshotClient {
    // Looks up the snapshot record stored for the given id.
    SnapshotRecord getSnapshot(Identifier id);
}

SnapshotRecord is going to provide the repository with the information it needs to consume the snapshot. That's going to include, at a minimum:

a memento that allows the repository to rehydrate the snapshotted state
a description of the last event processed by the snapshot projector when building the snapshot.

The model will then re-hydrate the snapshotted state from the memento, load the history from the event store, scan it backwards (i.e., starting from the most recent event) looking for the event documented in the SnapshotRecord, then apply the subsequent events in order.
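The steps above might look roughly like this. Everything here is an illustrative assumption rather than a prescribed design: the `Event` and `SnapshotRecord` shapes, the `Rehydrator` class, the 8-byte memento encoding, and the choice to ignore a snapshot whose last event cannot be found in the history.

```java
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical event: an id plus a balance delta. */
final class Event {
    final String eventId;
    final long delta;
    Event(String eventId, long delta) { this.eventId = eventId; this.delta = delta; }
}

/** Hypothetical shape for the SnapshotRecord described above. */
final class SnapshotRecord {
    final byte[] memento;     // opaque serialized state
    final String lastEventId; // last event the projector processed
    SnapshotRecord(byte[] memento, String lastEventId) {
        this.memento = memento;
        this.lastEventId = lastEventId;
    }
}

final class Rehydrator {
    /** Encodes a balance as an 8-byte memento (an assumption; any format both sides agree on works). */
    static byte[] toMemento(long balance) {
        return ByteBuffer.allocate(Long.BYTES).putLong(balance).array();
    }

    static long rehydrate(SnapshotRecord record, List<Event> history) {
        // Scan backwards from the most recent event for the one named in the record.
        int position = history.size() - 1;
        while (position >= 0 && !history.get(position).eventId.equals(record.lastEventId)) {
            position--;
        }
        // Not found: the snapshot does not match this history, so replay from the seed state (0).
        long balance = position >= 0 ? ByteBuffer.wrap(record.memento).getLong() : 0L;
        // Apply the events after the snapshot point, oldest first.
        for (Event event : history.subList(position + 1, history.size())) {
            balance += event.delta;
        }
        return balance;
    }
}
```

Scanning from the end is cheap when the snapshot is recent, because the matching event sits near the tail of the history.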

The SnapshotRepository itself could be a key-value store (at most one record for any given id), but a relational database with blob support will work fine too:

select * 
from snapshots s 
where id = ? 
order by s.total_events desc 
limit 1
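For the key-value flavor, here is a small in-memory sketch, assuming hypothetical `StoredSnapshot` and `InMemorySnapshotStore` types. It keeps at most one record per id by retaining whichever snapshot covers the most events, which mirrors the `order by s.total_events desc limit 1` in the query above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stored record: an opaque memento plus the number of events it covers. */
final class StoredSnapshot {
    final byte[] memento;
    final long totalEvents;
    StoredSnapshot(byte[] memento, long totalEvents) {
        this.memento = memento;
        this.totalEvents = totalEvents;
    }
}

final class InMemorySnapshotStore {
    private final Map<String, StoredSnapshot> records = new ConcurrentHashMap<>();

    /** Keeps at most one record per id: whichever covers the most events. */
    void put(String id, StoredSnapshot candidate) {
        records.merge(id, candidate,
            (current, incoming) -> incoming.totalEvents > current.totalEvents ? incoming : current);
    }

    /** Returns the single record for the id, or empty on a cache miss. */
    Optional<StoredSnapshot> get(String id) {
        return Optional.ofNullable(records.get(id));
    }
}
```

Because the store is only a cache, losing its contents on restart costs a longer replay and nothing else.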

The snapshot projector and the repository are tightly coupled -- they need to agree on what the state of the entity should be for all possible histories, they need to agree how to de/re-hydrate the memento, and they need to agree which id will be used to locate the snapshot.

The tight coupling also means that you don't need to worry particularly about the schema for the memento; a byte array will be fine.

They don't, however, need to agree with previous incarnations of themselves. Snapshot Projector 2.0 discards/ignores any snapshots left behind by Snapshot Projector 1.0 -- the snapshot store is just a cache after all.
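One way to get that behavior is to stamp each snapshot with the version of the projector that wrote it and treat anything else as a cache miss. The `VersionedSnapshot` and `SnapshotFilter` names and the version constant below are assumptions made for this sketch.

```java
import java.util.Optional;

/** Hypothetical snapshot stamped with the version of the projector that wrote it. */
final class VersionedSnapshot {
    final int projectorVersion;
    final byte[] memento;
    VersionedSnapshot(int projectorVersion, byte[] memento) {
        this.projectorVersion = projectorVersion;
        this.memento = memento;
    }
}

final class SnapshotFilter {
    static final int CURRENT_PROJECTOR_VERSION = 2; // assumed version number

    /** Treats snapshots written by any other projector version as a cache miss. */
    static Optional<VersionedSnapshot> usable(Optional<VersionedSnapshot> found) {
        return found.filter(s -> s.projectorVersion == CURRENT_PROJECTOR_VERSION);
    }
}
```

The repository then falls back to a full replay for any entity whose only snapshot is stale, and the new projector overwrites it in due course.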

I'm designing an application that will probably generate millions of events a day. What can we do if we need to rebuild a view 6 months later?

One of the more compelling answers here is to model time explicitly. Do you have one entity that lives for six months, or do you have 180+ entities that each live for one day? Accounting is a good domain to reference here: at the end of the fiscal year, the books are closed, and the next year's books are opened with the carryover.
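To make the accounting analogy concrete, here is a toy sketch in which each day is its own short-lived entity and closing the books hands the balance to the next day as a carryover. `DailyLedger` and its methods are hypothetical, not taken from any real system.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical one-day ledger: a short-lived entity with its own short history. */
final class DailyLedger {
    final LocalDate day;
    final long openingBalance; // carryover from the previous day
    private final List<Long> entries = new ArrayList<>();
    private boolean closed = false;

    DailyLedger(LocalDate day, long openingBalance) {
        this.day = day;
        this.openingBalance = openingBalance;
    }

    /** Records an entry; rejected once the books for this day are closed. */
    void record(long amount) {
        if (closed) {
            throw new IllegalStateException("books for " + day + " are closed");
        }
        entries.add(amount);
    }

    long balance() {
        long total = openingBalance;
        for (long amount : entries) {
            total += amount;
        }
        return total;
    }

    /** Closes this day's books and opens the next day's with the carryover. */
    DailyLedger closeAndOpenNext() {
        closed = true;
        return new DailyLedger(day.plusDays(1), balance());
    }
}
```

Rebuilding the current state then means loading one day's history plus its opening balance, not six months of events.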

Yves Reynhout frequently talks about modeling time and scheduling; Evolving a Model may be a good starting point.

Question 2

I've been reading about CQRS+EventSourcing patterns (which I wish to apply in the near future), and one point common to all the decks and presentations I found is to take snapshots of your model state in order to restore it, but none of them share patterns/strategies for doing that.

I wonder if you could share your thoughts and experience in this matter particularly in terms of:

When to snapshot
How to model a snapshot store
Application/cache cold start

TL;DR: How have you implemented Snapshotting in your CQRS+EventSourcing application? Pros and Cons?
