Data Consistency Across Microservices

Santanu Dey picture Santanu Dey · May 13, 2017 · Viewed 10.8k times · Source

While each microservice generally will have its own data - certain entities are required to be consistent across multiple services.

For such data consistency requirement in a highly distributed landscape such as microservices architecture, what are the choices for design? Of course, I do not want shared database architecture, where a single DB manages the state across all the services. That violates isolation and shared-nothing principles.

I do understand that, a microservice can publish an event when an entity is created, updated or deleted. All other microservices which are interested in this event can accordingly update the linked entities in their respective databases.

This is workable, however it leads to a lot of careful and coordinated programming effort across the services.

Can Akka or any other framework solve this use case? How?

EDIT1:
Adding the below diagram for clarity.
Basically, I am trying to understand, if there are available frameworks today that can solve this data consistency problem.

For the queue I can use any AMQP software such as RabbitMQ or Qpid etc. For the data consistency framework, I am not sure if presently Akka or any other software can help. Or is this scenario so uncommon, and such an anti-pattern that no framework should be ever needed?
enter image description here

Answer

Oswin Noetzelmann picture Oswin Noetzelmann · May 14, 2017

The Microservices architectural style tries to allow organizations to have small teams own services independent in development and at runtime. See this read. And the hardest part is to define the service boundaries in a useful way. When you discover that the way you split up your application results in requirements impacting multiple services frequently that would tell you to rethink the service boundaries. The same is true for when you feel a strong need to share entities between the services.

So the general advice would be to try very hard to avoid such scenarios. However there may be cases where you cannot avoid this. Since a good architecture is often about making the right trade-offs, here some ideas.

  1. Consider expressing the dependency using service interfaces (API) instead of a direct DB dependency. That would allow each service team to change their internal data schema as much as required and only worry about the interface design when it comes to dependencies. This is helpful because it is easier to add additional APIs and slowly deprecate older APIs instead of changing a DB design along with all dependent Microservices (potentially at the same time). In other words you are still able to deploy new Microservice versions independently, as long as the old APIs are still supported. This is the approach recommended by the Amazon CTO, who was pioneering a lot of the Microservices approach. Here is a recommended read of an interview in 2006 with him.

  2. Whenever you really really cannot avoid using the same DBs and you are splitting your service boundaries in a way that multiple teams/services require the same entities, you introduce two dependencies between the Microservice team and the team that is responsible for the data scheme: a) Data Format, b) Actual Data. This is not impossible to solve, but only with some overhead in organization. And if you introduce too many of such dependencies your organization will likely be crippled and slowed down in development.

a) Dependency on the data scheme. The entities data format cannot be modified without requiring changes in the Microservices. To decouple this you will have to version the entities data scheme strictly and in the database support all versions of the data that the Microservices are currently using. This would allow the Microservices teams to decide for themselves when to update their service to support the new version of the data scheme. This is not feasible with all use cases, but it works with many.

b) Dependency on the actual collected data. The data that has been collected and is of a known version for a Microservice is OK to use, but the issue occurs when you have some services producing a newer version of the data and another service depends on it - But was not yet upgraded to being able to read the latest version. This problem is hard to solve and in many cases suggests you did not chose the service boundaries correctly. Typically you have no choice but to roll out all services that depend on the data at the same time as upgrading the data in the database. A more wacky approach is to write different versions of the data concurrently (which works mostly when the data is not mutable).

To solve both a) and b) in some other cases the dependency can be reduced by hidden data duplication and eventual consistency. Meaning each service stores its own version of the data and only modifies it whenever the requirements for that service change. The services can do so by listening to a public data flow. In such scenarios you would be using an event based architecture where you define a set of public events that can be queued up and consumed by listeners from the different services that will process the event and store whatever data out of it that is relevant for it (potentially creating data duplication). Now some other events may indicate that internally stored data has to be updated and it is each services responsibility to do so with its own copy of the data. A technology to maintain such a public event queue is Kafka.