I'm trying to figure out how to migrate data from one Cassandra cluster to another Cassandra cluster with a different ring size, say from a 5-node cluster to a 7-node cluster.
I started looking at sstable2json, since it creates a JSON file for the SSTable on that specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one for the column family data that resides on each node.
Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of, let's say, 7 nodes. I was hoping that Cassandra would then replicate/balance the data evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just described, I'd end up with a ring with all the data in my column family on one node.
So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?
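Better: use bin/sstableloader on the SSTables from the old ring to stream them to the new one. sstableloader reads the target cluster's ring layout and streams each piece of data to the nodes that own it, so the two rings do not need to be the same size.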
Normally sstableloader is used in a sequence like this:
Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
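As a rough sketch of what that looks like on one node of cluster A, the Python snippet below builds the sstableloader command line and, by default, only prints it. The data path, keyspace/column family names, and target host addresses are made-up placeholders, and the helper names are hypothetical. It assumes the directory you pass ends in keyspace/column-family (sstableloader takes the keyspace and column family names from the path) and that `-d` is given a comma-separated list of contact nodes in cluster B. Check `bin/sstableloader --help` for your Cassandra version before relying on it.

```python
import subprocess
from pathlib import Path


def build_loader_command(table_dir, target_hosts, loader="bin/sstableloader"):
    """Build the sstableloader argv for one column family directory.

    table_dir    -- directory holding the SSTables; assumed to end in
                    <keyspace>/<column_family>
    target_hosts -- one or more contact nodes in the NEW cluster
    """
    if not target_hosts:
        raise ValueError("at least one target host is required")
    # -d takes a comma-separated list of initial hosts in the target ring
    return [loader, "-d", ",".join(target_hosts), str(table_dir)]


def stream_table(table_dir, target_hosts, loader="bin/sstableloader", dry_run=True):
    """Print the command (dry run) or actually run it and fail on error."""
    cmd = build_loader_command(table_dir, target_hosts, loader)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder path and addresses; substitute your own.
    stream_table(
        Path("/var/lib/cassandra/data/my_keyspace/my_column_family"),
        ["10.0.0.1", "10.0.0.2"],
    )
```

You would repeat this on every node in cluster A (or against a copy of each node's SSTables), once per column family. Pass `dry_run=False` only after confirming the printed command looks right for your setup.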
More details on using sstableloader in this blog post.