What is the fastest way to load data into a Cassandra column family?

Pedro Cunha · Oct 28, 2015

I created a Cassandra column family and I need to load data into it from a CSV file. The CSV file is 15 GB.

I am using the CQL 'COPY FROM' command, but it takes a long time to load the data. What is the best/simplest way to load large amounts of data into Cassandra from CSV files?

Answer

BrianC · Oct 28, 2015

The cqlsh built-in COPY TO/FROM command for CSV files is pretty simple and is intended for small to moderate-sized data sets. You didn't mention which Cassandra version you're using, but a lot of performance improvements were made in 2.1.5 (CASSANDRA-8225).

An alternative tool that has had good results for larger data is cassandra-loader. You could try that with a subset of your file (like 1000 rows) to confirm it works, then try with your whole file to see the performance.
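
To make the "try a subset first" step concrete, here is a minimal Python sketch that streams the first N rows of a large CSV into a smaller sample file without reading the whole 15 GB into memory. The function name write_csv_sample, its parameters, and the file names in the usage note are hypothetical and chosen only for illustration; the sketch also assumes the file parses with Python's default CSV dialect (comma-separated, double-quote quoting).

```python
import csv
from itertools import islice


def write_csv_sample(src_path, dst_path, n_rows=1000, has_header=False):
    """Copy the first n_rows data rows of src_path to dst_path.

    If has_header is True, the first row is copied as well and is not
    counted toward n_rows. Returns the number of data rows written.
    Hypothetical helper for illustration, not part of any Cassandra tool.
    """
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        # Assumption: "\n" line endings are acceptable to the loader.
        writer = csv.writer(dst, lineterminator="\n")

        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)

        count = 0
        # islice stops after n_rows, so the rest of the file is never read.
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
    return count
```

For example, write_csv_sample("full.csv", "sample.csv", n_rows=1000) would produce a 1000-row file to load first. Going through the csv module rather than counting raw lines keeps quoted fields that contain newlines intact, at the cost of possibly normalizing the quoting in the output.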