Distributed RabbitMQ Nodes don't recognize each other

Ittai picture Ittai · Jun 14, 2012 · Viewed 13k times · Source

I'm working on a RabbitMQ distributed POC and I'm stuck at the basics of clustering the nodes.
I'm trying to follow the rabbit's tutorial on clustering so this is my reference.
After installing erlang (R14B04) and rabbit (2.8.2-1) I've copied the .erlang.cookie file contents from one node to the other two.
I wasn't sure about how to get erlang to notice this change to I had to restart the machines themselves (pretty brute force but I don't know erlang at all).
In addtion I opened in iptables 4369 and 5 additional ports for communications and placed under /usr/lib64/erlang/bin/sys.config the following config:

{kernel,[{inet_dist_listen_min, XX00},{inet_dist_listen_max,XX05}]}]

Then another restart (dumb I know) to verify erlang takes these into consideration but still when I run:

rabbitmqctl cluster rabbit@HostName1

I get:

Clustering node rabbit@HostName2 with [rabbit@HostName1] ...
Error: {no_running_cluster_nodes,[rabbit@HostName1],
                                 [rabbit@HostName1]}

There is a chance my fiddling with the erlang.cookie or with the ports did not succeed but I don't know how to check them. I tried typing erl in the cmd and then erl_epmd:names() or other commands to get more information but I'm probably way off in erlang land.

Would truly appreciate any help

Update:
I tried pinging two erlang nodes manually and got pang back.
I did the following:
Connected to two nodes, stopped rabbitmq (wasn't sure if needed but to be sure), started erlang like so (erl -sname dilbert and erl -sname dilbert2) when the erlang command line started i ran node(). on each of them and got dilbert@HostName1 and dilbert2@HostName2 respectively. I then tried to run net_adm:ping('dilbert'). and net_adm:ping('dilbert@HostName1'). with the single quote and without them from both nodes (changed names of course) and got on all 8 cases pang.
When I ran nodes(). on one of the machines I got back an empty array.
I've also tried to allow all traffic in the firewall (script) and then try to run the above commands (don't worry they're back on now) and still got back pang.
Update2:
For some reason I had cookies mismatch which I needed to resolve (thanks @kjw0188 for the suggestion [I ran erlang:get_cookie(). in the erlang command line]).
This did not help and I needed to stop iptables completely (not sure why but I'll figure it soon) and load the erlang node with -name dilbert@my-ip because my rackspace servers have no dns-name. This finally enabled me to get a pong and see the nodes see each other (nodes(). returns a non-empty array after the ping).
The problem I'm facing now is how to instruct RabbitMQ to use -name instead of -sname when starting erlang.

Answer

Ittai picture Ittai · Jun 18, 2012

So I had multiple issues with connecting my two RabbitMQ nodes-
I'll add that my nodes are hosted on rackspace, and so don't have a default exposable hostname, and require iptables since there is no DMZ or built in security group concept like amazon.

Problems:
1. Cookie- Not sure how or why but I had multiple instances of .erlang.cookie (in /root, in my home directory and in /var/lib/rabbitmq/) I kept only the one in rabbitmq and verified all nodes have the same cookie.
2. IPTables- In order for the nodes to communicate I needed to open the epmd port and the range of ports for the actual communication inet_dist_listen_min inet_dist_listen_max.

/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${epmd} -s ${otherNode} -j ACCEPT
/sbin/iptables -A INPUT -i eth1 -p tcp --dport ${inet_dist_listen_min}:${inet_dist_listen_max} -s ${otherNode} -j ACCEPT  

empd is the usuall 4369 port and for the other range use whatever range you want.
${otherNode} is the ip of my other node.
I also needed to configure erlang through rabbitmq to use these ports (see config file at end)
3. HostName- Seeing as I don't have a hostname I needed to edit the rabbit scripts to use -name and not -sname (the first tells erlang to take the whole name, the latter stands for short name and thus appends an @ symbol and the hostname).
This was accomplished by editing:
/usr/lib/rabbitmq/bin/rabbitmqctl
Added at the beginning the definition of the RABBITMQ_NODE_IP_ADDRESS property

DEFAULT_NODE_IP_ADDRESS=auto
DEFAULT_NODE_PORT=5672
[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && RABBITMQ_NODE_IP_ADDRESS=${NODE_IP_ADDRESS}
[ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${NODE_PORT}

[ "x" = "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" != "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_IP_ADDRESS=${DEFAULT_NODE_IP_ADDRESS}
[ "x" != "x$RABBITMQ_NODE_IP_ADDRESS" ] && [ "x" = "x$RABBITMQ_NODE_PORT" ] && RABBITMQ_NODE_PORT=${DEFAULT_NODE_PORT}

and in the actual erl command I changed
-sname ${RABBITMQ_NODENAME} \ to
-name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS}\.
This made rabbitmq listen only on the specified ip address (specified in the config file at the end) and load with that ip instead of the usuall hostname.

edited /usr/lib/rabbitmq/bin/rabbitmq-server
Changed the actual erl command from -sname ${RABBITMQ_NODENAME} \ to -name ${RABBITMQ_NODENAME}@${RABBITMQ_NODE_IP_ADDRESS}\

Added a rabbit conf (/etc/rabbitmq/rabbitmq-env.conf) file with-

#the ip address which rabbit should use, this is to limit rabbit to only use internal rackspace communication and not publicly accessible ports  
NODE_IP_ADDRESS=myIpAdress  
#had to change the nodename becaue otherwise rabbitmq used rabbit@Hostname and not only rabbit  
NODENAME=myCompany
#This instructed rabbit to instruct erlang which ports it should use for its communications with other nodes  
export SERVER_ERL_ARGS="$SERVER_ERL_ARGS -kernel inet_dist_listen_min somePort -kernel inet_dist_listen_max someOtherBiggerPort"

Some resources which helped me along the way:
RabbitMQ Clustering Guide
Clustering RabbitMQ servers for High Availability
rabbitmq-env.conf(5) manual page
Node communication by public IP address erlang mailing list (The middle post)
Configuring RabbitMQ Cluster on Cloud

Hope this will help anyone else.

EDIT:
Not sure how I was mistaken but it seemed my erlang-rabbit port instructions were not taken into consideration or were not enough. Ended up having to allow all communications between the two nodes...