How to prevent Cassandra commit logs filling up disk space

plambre · Jul 30, 2015

I'm running a two-node DataStax AMI cluster on AWS. Yesterday, Cassandra started refusing all connections. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled all the disk space on the allotted mount, and this seemed to be causing the refused connections (I deleted some of the commit logs, restarted, and was able to connect).

I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7.

If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?

Answer

Aaron · Jul 30, 2015

You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB on 64-bit systems. The setting is usually commented out in your .yaml file, so you'll have to un-comment it when you change it. It's a good idea to account for this space when sizing your disk(s).
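A minimal sketch for making that change from a shell. It assumes the setting appears in cassandra.yaml either as an active line or as a commented-out line such as "# commitlog_total_space_in_mb: 8192"; the file's location varies by install, so pass your own path. The function name set_commitlog_total_space is hypothetical, and you should back up the real file before editing it.

```shell
# Hypothetical helper: set (and un-comment) commitlog_total_space_in_mb in a
# cassandra.yaml file. Writes to a temp file in the same directory, then
# moves it over the original.
set_commitlog_total_space() {
  yaml_file="$1"   # path to cassandra.yaml (location is install-specific)
  size_mb="$2"     # new commit log cap in MB, e.g. 4096

  tmp_file="${yaml_file}.tmp"
  # Match the key at the start of a line, optionally preceded by "#" and
  # whitespace, and replace the whole line with an active setting.
  sed -E "s/^#?[[:space:]]*commitlog_total_space_in_mb:.*/commitlog_total_space_in_mb: ${size_mb}/" \
    "$yaml_file" > "$tmp_file" && mv "$tmp_file" "$yaml_file"
}
```

For example, set_commitlog_total_space /path/to/cassandra.yaml 4096, then restart the node so the new value takes effect.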

You can check how much space the commit log is currently using by running du on your commitlog directory:

$ du -d 1 -h ./commitlog
8.1G    ./commitlog

Keep in mind, though, that a smaller commit log space will cause more frequent memtable flushes (and therefore more disk I/O), so you'll want to keep an eye on that.
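To keep an eye on it without checking by hand, here is a minimal sketch that compares the commitlog directory's usage against the cap you configured. The function name check_commitlog_usage is hypothetical; you pass in the directory and the value you set for commitlog_total_space_in_mb.

```shell
# Hypothetical helper: report commit log usage against a configured cap.
# Returns a non-zero status when usage exceeds the cap (handy for cron or
# a monitoring check).
check_commitlog_usage() {
  dir="$1"      # commitlog directory, e.g. ./commitlog
  cap_mb="$2"   # value of commitlog_total_space_in_mb

  # -s: single summary line; -k: size in KiB (works with GNU and BSD du)
  used_kb=$(du -sk "$dir" | cut -f1)
  used_mb=$(( used_kb / 1024 ))
  echo "commitlog: ${used_mb} MB used of ${cap_mb} MB cap"

  [ "$used_mb" -le "$cap_mb" ]
}
```

For example, check_commitlog_usage ./commitlog 8192 prints the current usage and fails only if the directory has grown past 8192 MB.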

Edit (2019-03-18)

I just had a related thought about this four-year-old answer. It has received some attention recently, and I want to make sure the right information is out there.

It's important to note that the commit log can sometimes grow out of control. This happens when the write load on the node exceeds Cassandra's ability to keep up with flushing memtables, and thus with removing old commitlog files (a segment can only be discarded once the memtable data it covers has been flushed). If you find a node with dozens of commitlog files, and the number keeps growing, this might be your issue.
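One way to tell whether the count keeps growing is to sample it periodically. This is a minimal sketch assuming commit log segments are named CommitLog-*.log (the naming Cassandra 2.x and 3.x use); the function name count_commitlog_segments is hypothetical.

```shell
# Hypothetical helper: count commit log segment files in one directory.
count_commitlog_segments() {
  dir="$1"
  # -maxdepth 1: only the directory itself; -type f: regular files only.
  # tr strips the padding some wc implementations add.
  find "$dir" -maxdepth 1 -type f -name 'CommitLog-*.log' | wc -l | tr -d ' '
}

# Example (the path is the package default and may differ on the AMI):
# while true; do echo "$(date) $(count_commitlog_segments /var/lib/cassandra/commitlog)"; sleep 60; done
```

A count that climbs steadily under write load, rather than levelling off, suggests flushing is not keeping up.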

Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, its default value is derived from memtable_flush_writers, so you can still raise it by lowering the number of flush writers:

memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
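To see how quickly the threshold shrinks as writers are added, here is a small sketch that evaluates the formula above for a few values. The function name cleanup_threshold is hypothetical.

```shell
# Hypothetical helper: default memtable_cleanup_threshold for a given
# memtable_flush_writers value, using 1 / (writers + 1).
cleanup_threshold() {
  writers="$1"
  awk -v w="$writers" 'BEGIN { printf "%.3f\n", 1 / (w + 1) }'
}

for w in 1 2 4 8; do
  echo "memtable_flush_writers=$w -> memtable_cleanup_threshold=$(cleanup_threshold "$w")"
done
```

This prints 0.500, 0.333, 0.200 and 0.111 for 1, 2, 4 and 8 writers respectively.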

The documentation has been updated as of 3.x, but the comments in cassandra.yaml used to say this:

# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
# 
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8

...which (I feel) led to many folks setting this value WAY too high.

Assuming a value of 8, memtable_cleanup_threshold is 1 / (8 + 1) ≈ 0.111. When the occupied (non-flushing) memtable space exceeds this fraction of the total space permitted for memtables, Cassandra flushes the largest memtable. Too many flush (blocking) writers can keep this from happening expediently. With a single /data dir, I recommend setting memtable_flush_writers to 2, which raises the threshold to 1 / (2 + 1) ≈ 0.333.
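To put those ratios in absolute terms, this sketch estimates how much memtable data can accumulate before a flush is triggered. It assumes the threshold is applied to the memtable space you pass in (for example, your memtable_heap_space_in_mb value); the function name flush_trigger_mb and the 2048 MB figure are illustrative only.

```shell
# Hypothetical helper: approximate memtable space (MB) that must be occupied
# before the largest memtable is flushed, i.e. space / (writers + 1).
flush_trigger_mb() {
  writers="$1"             # memtable_flush_writers
  memtable_space_mb="$2"   # total space permitted for memtables, in MB
  awk -v w="$writers" -v m="$memtable_space_mb" \
    'BEGIN { printf "%.0f\n", m / (w + 1) }'
}

flush_trigger_mb 8 2048   # prints 228
flush_trigger_mb 2 2048   # prints 683
```

With the same memtable space, dropping from 8 writers to 2 triples the amount of data each flush is triggered at.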