How to remove a dead node from a Cassandra cluster?

samarth · Dec 21, 2011 · Viewed 15.1k times
  1. I have a Cassandra cluster of 12 nodes on EC2.
  2. Because of a failure, we lost one of the nodes completely. I mean that the machine does not exist anymore.
  3. So I created a new EC2 instance with a different IP and the same token as the dead node. I also had a backup of the data on that node, so it works fine.
  4. But the problem is that the dead node's IP still appears as an unreachable node in describe cluster.
  5. As that node (EC2 instance) does not exist anymore, I cannot use nodetool decommission or nodetool disablegossip.

How can I get rid of this unreachable node?

Answer

Alexis Wilke · Oct 31, 2014

I had the same problem and I resolved it with removenode, which does not require you to find and change the node token.

First, get the node UUID:

nodetool status

DN  192.168.56.201  ?          256     13.1%  4fa4d101-d8d2-4de6-9ad7-a487e165c4ac  r1
DN  192.168.56.202  ?          256     12.6%  e11d219a-0b65-461e-babc-6485343568f8  r1
UN  192.168.2.91    156.04 KB  256     12.4%  e1a33ed4-d613-47a6-8b3b-325650a2bbd4  RAC1
UN  192.168.2.92    156.22 KB  256     13.6%  3a4a086c-36a6-4d69-8b61-864ff37d03c9  RAC1
UN  192.168.2.93    149.6 KB   256     11.3%  20decc72-8d0a-4c3b-8804-cc8bc98fa9e8  RAC1

As you can see, .201 and .202 are dead and on a different network. Their addresses were changed to .91 and .92 without proper decommissioning and recommissioning. I was working on installing the network and made a few mistakes...
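
If there are several dead nodes, picking the UUIDs out by eye gets tedious. The following is a minimal sketch, not part of the original procedure, that prints the Host ID of every line marked DN. The function name down_host_ids is made up for illustration, and the column layout is assumed to match the status output shown above.

```shell
# Print the Host ID of every node reported as down ("DN") by nodetool status.
# Reads the status output on stdin, so it also works on saved output.
down_host_ids() {
    # Assumed columns: Status Address Load Tokens Owns HostID Rack.
    # Load is one field when it is "?" but two fields for "156.04 KB",
    # so take the second-to-last field instead of a fixed position.
    awk '$1 == "DN" { print $(NF - 1) }'
}

# Usage against a live cluster (requires nodetool on the PATH):
# nodetool status | down_host_ids
```

With the output above, this would print the two UUIDs belonging to .201 and .202, one per line.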

Second, remove the .201 with the following command:

nodetool removenode 4fa4d101-d8d2-4de6-9ad7-a487e165c4ac

(in older versions it was nodetool remove ...)

But just like nodetool removetoken ..., it blocks (see the comment by samarth on psanford's answer). However, it has a side effect: it puts that UUID in a list of nodes to be removed. So next we can force the removal with:

nodetool removenode force

(in older versions it was nodetool remove ...)

Now the node accepts the command and tells me that it is removing the invalid entry:

RemovalStatus: Removing token (-9136982325337481102). Waiting for replication confirmation from [/192.168.2.91,/192.168.2.92].

We also see that it communicates with the two other nodes that are up and thus it takes a little time, but it is still quite fast.

After that, nodetool status no longer shows the .201 node. I repeated the steps for .202, and now the status is clean.
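
To check this from a script rather than by reading the table, something like the sketch below could be used. The helper name node_listed is hypothetical, and it assumes the address is the second column, as in the status output shown earlier.

```shell
# Succeed (exit 0) if the given address appears in nodetool status output
# read from stdin, fail (exit 1) otherwise.
node_listed() {
    awk -v ip="$1" '$2 == ip { found = 1 } END { exit !found }'
}

# Usage against a live cluster (requires nodetool on the PATH):
# nodetool status | node_listed 192.168.56.201 && echo "still listed"
```

Matching on the whole second field avoids a false hit when one address is a prefix of another.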

After that, you may also want to run a cleanup, as mentioned in psanford's answer:

nodetool cleanup

The cleanup should be run on all nodes, one by one, to make sure the change is fully taken into account.
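
Running the cleanup node by node can be scripted. This is only a sketch under a few assumptions: the function name cleanup_all and the NODETOOL override variable are made up for illustration, and it assumes nodetool can reach each node remotely with its -h option (JMX access between the machines is required for that).

```shell
# Run cleanup on each listed node in turn, stopping at the first failure.
# NODETOOL is a hypothetical override, e.g. NODETOOL=echo for a dry run.
cleanup_all() {
    for host in "$@"; do
        echo "Cleaning up $host"
        "${NODETOOL:-nodetool}" -h "$host" cleanup || return 1
    done
}

# Usage with the three live nodes from the status output above:
# cleanup_all 192.168.2.91 192.168.2.92 192.168.2.93
```

The loop is deliberately sequential, so only one node at a time carries the extra cleanup load.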