How does Google Docs deal with editing collisions?

MatthewSot · Jun 27, 2015 · Viewed 7.6k times

I've been toying around with writing my own JavaScript editor, with functionality similar to Google Docs (allowing multiple people to work on it at the same time). One thing I don't understand:

Let's say you've got User A and User B connected directly to each other with a network delay of 10ms. I'm assuming the editor uses a diff system (as I understand Docs does) where edits are represented like "insert 'text' at index 3," and that diffs are timestamped and forced to apply chronologically by all clients.

Let's start off with a document containing the text: "xyz123"

User A types "abc" at the beginning of the document at timestamp 001ms, while User B types "hello" between "xyz" and "123" at timestamp 005ms.

Both users would expect the result to be "abcxyzhello123". However, taking the network delay into account:

  • User B would receive User A's edits of "insert 'abc' at index 0" at time 011ms. In order to keep the chronological order, User B would undo User B's insertion at index 3, insert User A's "abc" at index 0, then re-insert User B's insertion at index 3, which is now between "abc" and "xyz," thus giving "abchelloxyz123"
  • User A would receive User B's edits of "insert 'hello' at index 3" at time 015ms. It would recognize that User B's insertion was done after User A's, and simply insert "hello" at index 3 (now between "abc" and "xyz"), giving "abchelloxyz123"

Of course, "abchelloxyz123" is not the same as "abcxyzhello123"
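To make the failure concrete, here is a minimal sketch (plain JavaScript, naive index-based inserts; the names are mine) of what the two clients end up computing:

```javascript
// Naive index-based insertion, as both clients would apply it.
function insertAt(doc, index, text) {
  return doc.slice(0, index) + text + doc.slice(index);
}

const base = "xyz123";

// User A: applies its own edit, then B's edit at B's original index 3,
// which now points between "abc" and "xyz".
let docA = insertAt(base, 0, "abc");   // "abcxyz123"
docA = insertAt(docA, 3, "hello");     // "abchelloxyz123"

// User B: undoes its own edit, replays A's earlier edit, then re-applies
// its edit at index 3 -- but that index was recorded against the *old*
// document, so it lands in the same wrong spot.
let docB = insertAt(base, 0, "abc");
docB = insertAt(docB, 3, "hello");     // "abchelloxyz123"
```

Both replicas agree, but on "abchelloxyz123" rather than the intended "abcxyzhello123" -- the indexes were interpreted against the wrong document states.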

Other than literally assigning each and every character its own unique ID, I can't imagine how Google would manage to solve this problem effectively.

Some possibilities I've thought of:

  • Tracking insertion points instead of sending indexes with diffs would work, except you would have the exact same problem if User B moved his insertion point 1ms before editing.
  • You could have User B send some information with his diff, like "inserting after 'xyz'", so that User A could intelligently recognize what has happened - but what if User A inserts the text "xyz"?
  • User B could recognize that this has happened (when it receives User A's diff and sees that it's a conflict), then send out a diff undoing User B's edits and a new diff that inserts User B's "hello" "abc".length indices further to the right. The problems with this are (1) User A would see a "jump" in the text, and (2) if User A keeps editing then User B would have to continuously fix its diffs - even the "fixer" diffs would be off and need fixing themselves, exponentially increasing the complexity.
  • User B could send along with its diff a property saying that the last timestamped diff it received was -005ms or something; then A could recognize that B didn't know about its changes (since A's diff was at 001ms) and do conflict resolution then. The issues are (1) all users' timestamps will be slightly off, considering most computer clocks aren't accurate to the ms, and (2) if there's a third User C with a 25ms lag with User A but a 2ms lag with User B, and User C adds some text between "x" and "y" at -003ms, then User B would use User C's edit as a reference point, but User A wouldn't know about User C's edit (and thus User B's reference point) until 22ms. I believe this could be solved if you used a common server to timestamp all edits, but that seems rather involved.
  • You could give each character a unique ID, then work off of those IDs instead of indexes, but that seems like overkill...

I'm reading through http://www.waveprotocol.org/whitepapers/operational-transform, but would love to hear any and all approaches to fixing this problem.

Answer

Marcel Klehr · Apr 1, 2016

There are different possibilities for realizing concurrent changing of replicas, depending on the scenario's topology and with different trade-offs.

Using a central server

The most common scenario is a central server that all clients have to communicate with.

The server could keep track of how the document of each participant looks. Both A and B then send a diff with their changes to the server. The server would then apply the changes to the respective tracking documents, perform a three-way merge, and apply the changes to the master document. It would then send the diff between the master document and the tracking documents to the respective clients. This is called differential synchronization.
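A rough sketch of that server loop (the function names and the simplified `{index, text}` edit format are mine; a real implementation would use fuzzy text patching, e.g. diff-match-patch, rather than exact index application):

```javascript
// Differential synchronization, heavily simplified: the server keeps a
// per-client "shadow" of what that client last saw, merges incoming
// edits into the master, and reports what the client is still missing.
function applyInsert(doc, { index, text }) {
  return doc.slice(0, index) + text + doc.slice(index);
}

function makeServer(masterText) {
  const shadows = new Map(); // clientId -> last text acknowledged by that client

  return {
    connect(clientId) {
      shadows.set(clientId, masterText);
    },
    receive(clientId, edit) {
      // 1. Bring this client's shadow up to date with the client's edit.
      shadows.set(clientId, applyInsert(shadows.get(clientId), edit));
      // 2. Merge the edit into the master document (a real server would
      //    fuzzy-patch here so concurrent edits from others don't shift it).
      masterText = applyInsert(masterText, edit);
      // 3. The difference between master and shadow is what this client
      //    still has to be told about (sent back as a diff in practice).
      return { master: masterText, shadow: shadows.get(clientId) };
    },
  };
}
```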

A different approach is called operation(al) transformation, which is similar to rebasing in traditional version control systems. It doesn't require a central server, but having one makes things much easier if you have more than 2 participants (see the OT FAQ). The gist is that you transform the changes in one edit so that the edit assumes that the changes of another edit already happened. E.g. A would transform B's edit insert(3, hello) against its edit insert(0, abc) with the result insert(6, hello).
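For two concurrent inserts, that transform can be sketched like this (a minimal illustration only; a complete OT system also handles deletes and breaks ties between equal indices by site id):

```javascript
// Rewrite `edit` as if the concurrent insert `against` had already
// happened: shift its index right by the length of the earlier insert.
function transformInsert(edit, against) {
  if (against.index <= edit.index) {
    return { index: edit.index + against.text.length, text: edit.text };
  }
  return edit;
}

function applyInsert(doc, { index, text }) {
  return doc.slice(0, index) + text + doc.slice(index);
}

const a = { index: 0, text: "abc" };   // User A's edit
const b = { index: 3, text: "hello" }; // User B's edit, concurrent with a

// A receives b and transforms it against its own edit:
const bAtA = transformInsert(b, a);    // { index: 6, text: "hello" }
// B receives a and transforms it against its own edit:
const aAtB = transformInsert(a, b);    // unchanged: b's insert is to the right

// Both application orders converge on the intended document:
const docA = applyInsert(applyInsert("xyz123", a), bAtA); // "abcxyzhello123"
const docB = applyInsert(applyInsert("xyz123", b), aAtB); // "abcxyzhello123"
```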

The difference between rebasing and OT is that rebasing doesn't guarantee consistency if you apply edits in different orders (e.g. if B were to rebase A's edit against theirs the other way around, this can lead to diverging document states). The promise of OT on the other hand is to allow any order if you do the right transformations.

No central server

OT algorithms exist that can deal with peer-to-peer scenarios (with the trade-off of increased implementation complexity on the control layer and increased memory usage). Instead of a simple timestamp, a Version vector can be used to keep track of the state an edit is based on. Then (depending on the capability of your OT algorithm, specifically transform property 2), incoming edits can be transformed to fit in the order they are received, or the version vector can be used to impose a partial order on the edits -- in this case history needs to be "rewritten", by undoing and transforming edits, so that they adhere to the order imposed by the version vectors.
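A version vector is just a map from site (client) id to the number of edits seen from that site; comparing two of them tells you whether one edit causally precedes the other or the two are concurrent. A sketch (names mine):

```javascript
// Compare two version vectors: "after"/"before" means one edit causally
// follows the other; "concurrent" means neither site had seen the
// other's edit, so a transformation (or history rewrite) is needed.
function compareVectors(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const site of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const av = a[site] || 0;
    const bv = b[site] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

In the question's scenario, A's edit carries `{A: 1, B: 0}` and B's carries `{A: 0, B: 1}` -- the comparison yields "concurrent", which is exactly the case a plain timestamp fails to detect reliably.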

Lastly, there is a group of algorithms based on CRDTs, such as WOOT, Treedoc and Logoot, which try to solve the problem with specially designed data types that allow operations to commute, so the order in which they are applied doesn't matter (this is similar to your idea of an ID for each character). The trade-offs here are memory consumption and overhead in operation construction.
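The character-ID idea can be sketched with Logoot-style dense position identifiers (floating-point midpoints here for brevity; real Logoot uses lists of digits so positions can't be exhausted, and the helper names are mine):

```javascript
// Each character carries a position identifier; replicas keep characters
// sorted by (position, site id), so the same set of inserts produces the
// same text regardless of arrival order -- the operations commute.
function newPosition(left, right) {
  return (left + right) / 2; // a dense position between two neighbours
}

function insertChar(doc, pos, site, ch) {
  const copy = doc.concat({ pos, site, ch });
  copy.sort((x, y) => x.pos - y.pos || (x.site < y.site ? -1 : 1));
  return copy;
}

function render(doc) {
  return doc.map((e) => e.ch).join("");
}

// Seed "xyz123" with positions 1..6 from a single site "S".
let base = [];
"xyz123".split("").forEach((ch, i) => {
  base = insertChar(base, i + 1, "S", ch);
});

// Two concurrent inserts: A before "x", B between "z" and "1".
const posA = newPosition(0, 1); // 0.5
const posB = newPosition(3, 4); // 3.5
const order1 = insertChar(insertChar(base, posA, "A", "a"), posB, "B", "h");
const order2 = insertChar(insertChar(base, posB, "B", "h"), posA, "A", "a");
// render(order1) and render(order2) are identical, whatever the order.
```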