How does LMAX's disruptor pattern work?

Shahbaz picture Shahbaz · Jul 2, 2011 · Viewed 73.5k times · Source

I am trying to understand the disruptor pattern. I have watched the InfoQ video and tried to read their paper. I understand there is a ring buffer involved, that it is initialized as an extremely large array to take advantage of cache locality, eliminate allocation of new memory.

It sounds like there are one or more atomic integers which keep track of positions. Each 'event' seems to get a unique id and it's position in the ring is found by finding its modulus with respect to the size of the ring, etc., etc.

Unfortunately, I don't have an intuitive sense of how it works. I have done many trading applications and studied the actor model, looked at SEDA, etc.

In their presentation they mentioned that this pattern is basically how routers work; however I haven't found any good descriptions of how routers work either.

Are there some good pointers to a better explanation?

Answer

Michael Barker picture Michael Barker · Jul 3, 2011

The Google Code project does reference a technical paper on the implementation of the ring buffer, however it is a bit dry, academic and tough going for someone wanting to learn how it works. However there are some blog posts that have started to explain the internals in a more readable way. There is an explanation of ring buffer that is the core of the disruptor pattern, a description of the consumer barriers (the part related to reading from the disruptor) and some information on handling multiple producers available.

The simplest description of the Disruptor is: It is a way of sending messages between threads in the most efficient manner possible. It can be used as an alternative to a queue, but it also shares a number of features with SEDA and Actors.

Compared to Queues:

The Disruptor provides the ability to pass a message onto another threads, waking it up if required (similar to a BlockingQueue). However, there are 3 distinct differences.

  1. The user of the Disruptor defines how messages are stored by extending Entry class and providing a factory to do the preallocation. This allows for either memory reuse (copying) or the Entry could contain a reference to another object.
  2. Putting messages into the Disruptor is a 2-phase process, first a slot is claimed in the ring buffer, which provides the user with the Entry that can be filled with the appropriate data. Then the entry must be committed, this 2-phase approach is necessary to allow for the flexible use of memory mentioned above. It is the commit that makes the message visible to the consumer threads.
  3. It is the responsibility of the consumer to keep track of the messages that have been consumed from the ring buffer. Moving this responsibility away from the ring buffer itself helped reduce the amount of write contention as each thread maintains its own counter.

Compared to Actors

The Actor model is closer the Disruptor than most other programming models, especially if you use the BatchConsumer/BatchHandler classes that are provided. These classes hide all of the complexities of maintaining the consumed sequence numbers and provide a set of simple callbacks when important events occur. However, there are a couple of subtle differences.

  1. The Disruptor uses a 1 thread - 1 consumer model, where Actors use an N:M model i.e. you can have as many actors as you like and they will be distributed across a fixed numbers of threads (generally 1 per core).
  2. The BatchHandler interface provides an additional (and very important) callback onEndOfBatch(). This allows for slow consumers, e.g. those doing I/O to batch events together to improve throughput. It is possible to do batching in other Actor frameworks, however as nearly all other frameworks don't provide a callback at the end of the batch you need to use a timeout to determine the end of the batch, resulting in poor latency.

Compared to SEDA

LMAX built the Disruptor pattern to replace a SEDA based approach.

  1. The main improvement that it provided over SEDA was the ability to do work in parallel. To do this the Disruptor supports multi-casting the same messages (in the same order) to multiple consumers. This avoids the need for fork stages in the pipeline.
  2. We also allow consumers to wait on the results of other consumers without having to put another queuing stage between them. A consumer can simply watch the sequence number of a consumer that it is dependent on. This avoids the need for join stages in pipeline.

Compared to Memory Barriers

Another way to think about it is as a structured, ordered memory barrier. Where the producer barrier forms the write barrier and the consumer barrier is the read barrier.