What is a "distributed transaction"?

Zombie · Nov 18, 2010

The Wikipedia article for Distributed transaction isn't very helpful.

Can you give a high-level but more detailed description of what a distributed transaction is?

Also, can you give me an example of why an application or database should perform a transaction that updates data on two or more networked computers?

I understand the classic bank example; I care more about distributed transactions in Web-scale databases like Dynamo, Bigtable, HBase, or Cassandra.


Heinzi · Nov 18, 2010

Usually, transactions occur on one database server:

BEGIN;
SELECT something FROM myTable;
UPDATE myTable SET something = 'newValue';
COMMIT;

A distributed transaction involves multiple servers:

UPDATE bankAccounts SET amount = amount - 100 WHERE accountNr = 1;
UPDATE someRemoteDatabaseAtSomeOtherBank.bankAccounts SET amount = amount + 100 WHERE accountNr = 2;

The difficulty is that the servers must communicate to ensure that transactional properties such as atomicity hold on both servers. If the transaction succeeds, the values must be updated on both servers. If the transaction fails, it must be rolled back on both servers. It must never happen that the values are updated on one server but not on the other.
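One common way to get this all-or-nothing behavior is a two-phase commit: a coordinator first asks every server whether it can apply its part of the change, and only tells them to commit if all of them say yes. The answer above does not name a specific protocol, so treat the following as a minimal in-memory sketch under that assumption. The `Participant` class and `transfer` function are hypothetical names, each participant stands in for one bank's database server, and the two participants are assumed to be distinct. It ignores crashes, timeouts, and durable logging, which real implementations have to handle.

```python
class Participant:
    """Hypothetical stand-in for one bank's database server."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = dict(balances)  # committed state
        self.pending = None             # staged change awaiting a decision

    def prepare(self, account, delta):
        """Phase 1: stage the change and vote on whether it can be committed."""
        if account not in self.balances:
            return False
        if self.balances[account] + delta < 0:
            return False
        self.pending = (account, delta)
        return True

    def commit(self):
        """Phase 2 (success): make the staged change permanent."""
        if self.pending is not None:
            account, delta = self.pending
            self.balances[account] += delta
            self.pending = None

    def rollback(self):
        """Phase 2 (failure): discard the staged change."""
        self.pending = None


def transfer(source, src_account, target, dst_account, amount):
    """Coordinator: apply the transfer on both participants or on neither."""
    steps = [(source, src_account, -amount), (target, dst_account, amount)]
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare(account, delta) for p, account, delta in steps]
    # Phase 2: commit only if every vote was "yes", otherwise undo everywhere.
    if all(votes):
        for p, _, _ in steps:
            p.commit()
        return True
    for p, _, _ in steps:
        p.rollback()
    return False


bank_a = Participant("bank_a", {1: 500})
bank_b = Participant("bank_b", {2: 50})

ok = transfer(bank_a, 1, bank_b, 2, 100)       # both servers can apply it
failed = transfer(bank_a, 1, bank_b, 99, 100)  # account 99 does not exist
```

In the second call, bank_a votes yes but bank_b votes no, so the coordinator rolls back both and neither balance changes. That is the "never updated on one server but not the other" guarantee in miniature.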