I'm trying to figure out which cache concurrency strategy I should use for my application (for entity updates, in particular). The application is a web service developed with Hibernate, is deployed on an Amazon EC2 cluster, and runs on Tomcat, so there is no application server involved.
I know that there are nonstrict-read-write, read-write, and transactional cache concurrency strategies for data that can be updated, and that there are mature, popular, production-ready second-level cache providers for Hibernate: Infinispan, Ehcache, and Hazelcast.
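For context, the strategy is chosen per entity with standard Hibernate annotations; the `Person` entity below is just an illustration:

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

// Illustrative cached entity; the concurrency strategy in question
// is selected per entity (or per collection) via @Cache.
@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // or NONSTRICT_READ_WRITE / TRANSACTIONAL
public class Person {

    @Id
    private Long id;

    private String firstName;

    // getters and setters omitted
}
```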
But I don't completely understand the difference between the transactional and read-write strategies from the Hibernate documentation. I thought the transactional cache was the only choice for a clustered application, but now (after reading some topics) I'm not so sure about that.
So my question is about the read-write cache. Is it cluster-safe? Does it guarantee synchronization between the database and the cache, as well as synchronization between all the connected servers? Or is it only suitable for single-server applications, meaning I should always prefer the transactional cache in a cluster?
For example, if a database transaction that updates an entity field (the first name, say) fails and is rolled back, will the read-write cache discard the changes, or will it propagate the bad data (the updated first name) to all the other nodes? Does it require a JTA transaction to get this right?
The topic *Concurrency strategy configuration for JBoss TreeCache as 2nd level Hibernate cache* says:
`READ_WRITE` is an interesting combination. In this mode Hibernate itself works as a lightweight XA coordinator, so it doesn't require a full-blown external XA transaction manager. A short description of how it works:
- In this mode Hibernate manages the transactions itself. All DB actions must be inside a transaction; autocommit mode won't work.
- During `flush()` (which may occur multiple times during a transaction's lifetime, but usually happens just before the commit), Hibernate goes through the session and looks for updated/inserted/deleted objects. These objects are first saved to the database and then locked and updated in the cache, so that concurrent transactions can neither update nor read them.
- If the transaction is rolled back (explicitly or because of an error), the locked objects are simply released and evicted from the cache, so other transactions can read and update them.
- If the transaction is committed successfully, the locked objects are simply released, and other threads can read and write them.
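If I read that correctly, the sequence looks roughly like this (my own sketch, not Hibernate internals; `cacheLock`, `cachePutAndRelease`, and `cacheEvictAndRelease` are hypothetical stand-ins for the soft-lock machinery):

```java
// Sketch of the READ_WRITE sequence described above. The cache* helpers are
// hypothetical placeholders for Hibernate's internal soft locks.
Transaction tx = session.beginTransaction();
try {
    Person p = session.get(Person.class, 1L);
    p.setFirstName("Alice");     // dirty entity tracked by the session

    session.flush();             // 1. changes are written to the database
    cacheLock(p);                // 2. the cache entry is soft-locked: concurrent
                                 //    transactions can neither read nor update
                                 //    it from the cache

    tx.commit();                 // 3a. success: release the lock and publish
    cachePutAndRelease(p);       //     the committed value
} catch (RuntimeException e) {
    tx.rollback();               // 3b. failure: release the lock and evict the
    cacheEvictAndRelease(p);     //     entry, so the bad first name is dropped
    throw e;
}
```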
Is there any documentation on how this works in a clustered environment?
It seems that the transactional cache handles this correctly, but it requires a JTA environment with a standalone transaction manager (such as JBossTM, Atomikos, or Bitronix), an XA datasource, and a lot of configuration changes and testing. I managed to deploy this, but I still have some issues with my frameworks. For instance, Google Guice IoC does not support JTA transactions, so I would have to replace it with Spring or move the service to an application server and use EJB.
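For reference, the configuration boils down to something like this (a sketch using Hibernate 5 property names, which differ in older versions; the Bitronix platform class is one of the `JtaPlatform` implementations bundled with Hibernate):

```java
import java.util.HashMap;
import java.util.Map;

// JTA-related Hibernate settings (Hibernate 5 names; older versions use
// hibernate.transaction.factory_class instead). An XA-capable DataSource
// registered with the transaction manager is needed on top of this.
Map<String, String> settings = new HashMap<>();
settings.put("hibernate.transaction.coordinator_class", "jta");
settings.put("hibernate.transaction.jta.platform",
        "org.hibernate.engine.transaction.jta.platform.internal.BitronixJtaPlatform");
settings.put("hibernate.cache.use_second_level_cache", "true");
settings.put("hibernate.cache.region.factory_class",
        "org.hibernate.cache.infinispan.InfinispanRegionFactory");
```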
So which way is better?
Thanks in advance!
Summary of differences
The best way to understand the differences between these strategies is to see how each behaves during insert, update, and delete operations.
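In short (a rough summary based on the Hibernate documentation, not an exhaustive one):

- `nonstrict-read-write`: after a commit the cached entry is simply invalidated, with no locking; there is a small window during which other threads or nodes can read stale data.
- `read-write`: entries are soft-locked while being written, so concurrent readers bypass the cache and hit the database until the committed value is published; rolled-back changes never reach the cache.
- `transactional`: the cache itself participates in the JTA transaction, so the database and cache changes commit or roll back atomically; this requires a JTA transaction manager and a transactional cache provider.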
You can check out my post here, which describes the differences in further detail. Feel free to comment.