How to configure RabbitMQ using Active/Passive High Availability architecture

rhodan · Dec 10, 2013 · Viewed 7.8k times

I'm trying to set up a cluster of RabbitMQ servers to get highly available queues using an active/passive server architecture. I'm following these guides:

  1. http://www.rabbitmq.com/clustering.html
  2. http://www.rabbitmq.com/ha.html
  3. http://karlgrz.com/rabbitmq-highly-available-queues-and-clustering-using-amazon-ec2/

My requirement for high availability is simple: I have two nodes (CentOS 6.4) with RabbitMQ (v3.2) and Erlang R15B03. Node1 must be the "active" node, responding to all requests, and Node2 must be the "passive" node that holds all the queues and messages replicated from Node1.

To do that, I have configured the following:

  • Node1 with RabbitMQ working fine in non-cluster mode
  • Node2 with RabbitMQ working fine in non-cluster mode

Next, I created a cluster between both nodes by joining Node2 to Node1 (guide 1). After that, I configured a policy to mirror the queues (guide 2), replicating all queues and messages across all nodes in the cluster. This works: I can connect to either node and publish or consume messages while both nodes are available.

The problem occurs with a queue "queueA" that was created on Node1 (so Node1 is the master for queueA). When Node1 is stopped, I can't connect to queueA on Node2 to produce or consume messages; Node2 throws an error saying that Node1 is not accessible. I think queueA is not replicated to Node2, so Node2 can't be promoted to master of queueA.

The error is:

{"The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text=\"NOT_FOUND - home node 'rabbit@node1' of durable queue 'queueA' in vhost 'app01' is down or inaccessible\", classId=50, methodId=10, cause="}
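To make the relevant parts of this error easier to see, here is a minimal sketch that pulls the home node, queue name, and virtual host out of the reply text. The regular expression and the function name parse_home_node_error are my own illustration, written only against the message shown above; other RabbitMQ versions may word the message differently.

```python
import re

# Reply text copied from the error above.
ERROR_TEXT = (
    "NOT_FOUND - home node 'rabbit@node1' of durable queue 'queueA' "
    "in vhost 'app01' is down or inaccessible"
)

# Hypothetical pattern, written only to fit the message shown above.
HOME_NODE_DOWN = re.compile(
    r"home node '(?P<node>[^']+)' of durable queue '(?P<queue>[^']+)' "
    r"in vhost '(?P<vhost>[^']+)'"
)


def parse_home_node_error(text):
    """Return node, queue and vhost from the error text, or None if it does not match."""
    match = HOME_NODE_DOWN.search(text)
    return match.groupdict() if match else None


details = parse_home_node_error(ERROR_TEXT)
print(details)
```

Note that the message names the virtual host the queue lives in (app01), which is worth comparing against the virtual host any policy was declared in.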

The sequence of steps used is:

Node1:

1. rabbitmq-server -detached
2. rabbitmqctl start_app

Node2:

3. Copy .erlang.cookie from Node1 to Node2
4. rabbitmq-server -detached

Join the cluster (Node2):

5. rabbitmqctl stop_app
6. rabbitmqctl join_cluster rabbit@node1
7. rabbitmqctl start_app

Configure Queue mirroring policy:

8. rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

Note: The pattern used for queue names is "" (all queues).
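As a quick sanity check on that pattern: policy patterns are regular expressions matched against queue names, and an empty regular expression matches any string. The sketch below uses Python's re module as a stand-in for RabbitMQ's own matching (an assumption that holds for this simple case); every queue name other than queueA is made up for illustration.

```python
import re


def pattern_matches(pattern, queue_name):
    """True if the regular expression matches anywhere in the queue name."""
    return re.search(pattern, queue_name) is not None


# Hypothetical queue names, apart from queueA.
queues = ["queueA", "orders", "amq.gen-123"]

# The empty pattern matches every name.
matched = [q for q in queues if pattern_matches("", q)]
print(matched)

# A narrower pattern such as "^ha\." would leave queueA unmirrored.
print(pattern_matches(r"^ha\.", "queueA"))
```

So the pattern itself is not what excludes queueA; it matches every queue name in whichever virtual host the policy applies to.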

When I run 'rabbitmqctl list_policies' and 'rabbitmqctl cluster_status', everything looks OK.

Why can't Node2 respond when Node1 is unavailable? Is there something wrong with this setup?
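One caveat about these checks: rabbitmqctl list_policies and list_queues both accept -p to select a virtual host, and without it they report on the default virtual host (/) only. The helper below is a small sketch that builds the per-vhost check commands; rabbitmqctl_argv is a hypothetical name, and the snippet only prints the commands rather than running them.

```python
import shlex


def rabbitmqctl_argv(command, *args, vhost=None):
    """Build an argument list for rabbitmqctl, adding -p <vhost> when given."""
    argv = ["rabbitmqctl", command]
    if vhost is not None:
        argv += ["-p", vhost]
    argv += list(args)
    return argv


checks = [
    # Policies defined in the app01 virtual host.
    rabbitmqctl_argv("list_policies", vhost="app01"),
    # Per-queue view: which policy applies and which mirrors exist.
    rabbitmqctl_argv("list_queues", "name", "policy", "slave_pids", vhost="app01"),
]

for argv in checks:
    print(" ".join(shlex.quote(a) for a in argv))
```

If the policy column is empty for queueA, or slave_pids lists no mirror, the queue is not being mirrored regardless of what the unscoped list_policies output shows.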

Answer

Daniel Werner · Jul 16, 2015

You haven't specified the virtual host (app01) in your set_policy call, thus the policy will only apply to the default virtual host (/). This command line should work:

rabbitmqctl set_policy -p app01 ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
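To illustrate why the virtual host matters, here is a deliberately simplified model of policy lookup. It is a sketch, not RabbitMQ's implementation: it ignores policy priority and apply-to, and the Policy class and effective_policy function are names invented for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    vhost: str
    name: str
    pattern: str
    definition: dict


def effective_policy(policies: List[Policy], vhost: str, queue_name: str) -> Optional[Policy]:
    """Return the first policy in the same vhost whose pattern matches the queue name."""
    for policy in policies:
        if policy.vhost == vhost and re.search(policy.pattern, queue_name):
            return policy
    return None


HA_ALL = {"ha-mode": "all", "ha-sync-mode": "automatic"}

# set_policy without -p: the policy lands in the default vhost "/".
policies = [Policy("/", "ha-all", "", HA_ALL)]
before = effective_policy(policies, "app01", "queueA")

# set_policy -p app01: the policy now exists in the queue's own vhost.
policies.append(Policy("app01", "ha-all", "", HA_ALL))
after = effective_policy(policies, "app01", "queueA")

print(before)
print(after.vhost if after else None)
```

In this model queueA in app01 has no policy until one is declared in app01 itself, which matches the behaviour described above: without mirroring, the queue exists only on its home node and becomes unavailable when that node stops.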