RabbitMQ inconsistent cluster

rabbitmq mnesia

Sasha Ru · Jan 9, 2014 · Viewed 13.6k times

Answer

I've found a way to resolve question #2. To restore cluster health without downtime on rmq02, remove all Mnesia data on the inconsistent node (rmq01) and start it again:

[root@rmq01 ~]# rm -rf /var/lib/rabbitmq/mnesia/

[root@rmq01 ~]# service rabbitmq-server start
Starting rabbitmq-server: SUCCESS
rabbitmq-server.
[root@rmq01 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@rmq01' ...
[{nodes,[{disc,['rabbit@rmq02']},{ram,['rabbit@rmq01']}]},
 {running_nodes,['rabbit@rmq02','rabbit@rmq01']},
 {partitions,[]}]
...done.
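The same recovery can be sketched as a small script. This is a variation rather than the exact commands above: it moves the Mnesia directory aside instead of deleting it, so the old state can still be inspected. The `run` helper, the `DRY_RUN` variable, the `reset_node_state` name and the `.bak` suffix are hypothetical and exist only for illustration. Either way, the node loses its local state and comes back as a fresh node, so this is only reasonable when the surviving node still holds the data you need.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move the Mnesia directory of a node that fails to boot out of the way,
# then start the service and show the resulting cluster status.
# $1: Mnesia base directory (defaults to the path used in the answer).
# Assumes "<dir>.bak" does not exist yet.
reset_node_state() {
    mnesia_dir="${1:-/var/lib/rabbitmq/mnesia}"
    backup_dir="${mnesia_dir}.bak"
    # The broker may already be down; a failure here is not fatal.
    run service rabbitmq-server stop
    run mv "$mnesia_dir" "$backup_dir" &&
    run service rabbitmq-server start &&
    run rabbitmqctl cluster_status
}
```

Setting `DRY_RUN=1` before calling `reset_node_state` prints the four commands without touching the system, which is a cheap way to check the paths first.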

I still do not understand how to avoid this scenario (question #1); perhaps some Mnesia customisation would help.

Question

A few questions about RabbitMQ v3.1.5 clustering. I have a cluster with 2 nodes, and rabbitmq.config looks like this on both nodes:

[
  {rabbit, [
    {cluster_nodes, {['rabbit@rmq01', 'rabbit@rmq02'], ram}},
    {tcp_listeners, [5674]}
  ]}
].

I have seen this issue before, and now I'm seeing it again. Sometimes, when the whole cluster shuts down and the second node (rmq02) starts before the first (rmq01), rmq02 'forgets' about rmq01:

[root@rmq2 rabbitmq]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@rmq2' ...
[{nodes,[{disc,['rabbit@rmq2']}]},
 {running_nodes,['rabbit@rmq2']},
 {partitions,[]}]
...done.

After this, the first node (rmq01) cannot start because rmq02 disagrees about clustering:

{"init terminating in do_boot",{rabbit,failure_during_boot,{error,{inconsistent_cluster,"Node 'rabbit@rmq1' thinks it's clustered with node 'rabbit@rmq2', but 'rabbit@rmq2' disagrees"}}}}

I tried to re-cluster rmq02 with rmq01, but it seems I have to run stop_app first:

[root@rmq2 rabbitmq]# rabbitmqctl join_cluster rabbit@rmq1
Clustering node 'rabbit@rmq2' with 'rabbit@rmq1' ...
Error: mnesia_unexpectedly_running
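The mnesia_unexpectedly_running error suggests that join_cluster was issued while the RabbitMQ application was still running on the node. A minimal sketch of the usual sequence follows, assuming the target node is up and reachable (which is not the case for rmq01 in this scenario). The `run` helper, the `DRY_RUN` variable and the `rejoin_cluster` name are hypothetical; only the three rabbitmqctl subcommands are real. Note that the node is out of service between stop_app and start_app, and that join_cluster resets the joining node according to the rabbitmqctl documentation.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Join the local node to the cluster that contains the node given as $1.
# stop_app stops the RabbitMQ application but keeps the Erlang node running,
# which is the state join_cluster expects. Each step runs only if the
# previous one succeeded.
rejoin_cluster() {
    target="$1"
    run rabbitmqctl stop_app &&
    run rabbitmqctl join_cluster "$target" &&
    run rabbitmqctl start_app
}
```

With `DRY_RUN=1` set, `rejoin_cluster rabbit@rmq1` only prints the commands it would run.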

Here I see that rmq02 has forgotten about rmq01:

[root@rmq2 ~]# cat /var/lib/rabbitmq/mnesia/rabbit\@rmq2/cluster_nodes.config 
{['rabbit@rmq2'],['rabbit@rmq2']}.

Meanwhile on rmq01 (correct configuration):

[root@rmq1 ~]# cat /var/lib/rabbitmq/mnesia/rabbit\@rmq1/cluster_nodes.config 
{['rabbit@rmq1','rabbit@rmq2'],['rabbit@rmq1']}.
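The comparison between the two files can be automated with a plain text match. The sketch below assumes the file format shown above for 3.1.5; cluster_nodes.config is an internal file, so the format may differ in other versions. The `lists_node` function name is hypothetical, and the check is a string search, not an Erlang term parser.

```shell
#!/bin/sh
# Succeeds if the given cluster_nodes.config mentions the node name.
# The name is matched together with its surrounding single quotes, so
# 'rabbit@rmq1' does not match 'rabbit@rmq10'.
# $1: path to cluster_nodes.config, $2: node name such as rabbit@rmq1
lists_node() {
    config_file="$1"
    node="$2"
    grep -qF "'${node}'" "$config_file"
}
```

For example, `lists_node /var/lib/rabbitmq/mnesia/rabbit@rmq2/cluster_nodes.config rabbit@rmq1 || echo "rmq1 is missing"` would report the situation described in this question.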

Questions:

1. Is it normal for rmq02 to forget about rmq01, or have I misconfigured something? Why does this happen?
2. If this is expected, is it possible to restore cluster health without downtime on rmq02 (that is, without stop_app)?
