Cassandra Frequent Read Write Timeouts

user2572801 · Aug 7, 2013

I changed my whole codebase from Thrift to CQL, using the DataStax Java driver 1.0.1 and Cassandra 1.2.6.

With Thrift I was getting frequent timeouts from the start and was not able to proceed. After adopting CQL, with tables designed for it, I had success and fewer timeouts.

With that I was able to insert large amounts of data that did not work with Thrift. But after a certain stage, with the data folder at around 3.5GB, I am getting frequent write timeout exceptions. Even when I rerun a use case that worked earlier, it now throws a timeout exception too. It is random: something that worked once does not work again, even after a fresh setup.

CASSANDRA SERVER LOG

This is a partial Cassandra server log in DEBUG mode from the time I got the error:

http://pastebin.com/rW0B4MD0

The client exception is:

Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
    at com.datastax.driver.core.Session.execute(Session.java:107)
    at com.datastax.driver.core.Session.execute(Session.java:76)

Infrastructure: a 16GB machine with an 8GB heap given to Cassandra, i7 processor. I am using a SINGLE-node Cassandra with the following timeouts tweaked in the yaml; everything else is default:

  • read_request_timeout_in_ms: 30000
  • range_request_timeout_in_ms: 30000
  • write_request_timeout_in_ms: 30000
  • truncate_request_timeout_in_ms: 60000
  • request_timeout_in_ms: 30000

USE CASE: I am running a use case that stores Combinations (my project's terminology) in Cassandra. I am currently testing storing 250,000 combinations with 100 parallel threads, each thread storing one combination. In the real case I need to support tens of millions, but that would need different hardware and a multi-node cluster.

Storing ONE combination takes around 2 sec and involves:

  • 527 INSERT INTO queries
  • 506 UPDATE queries
  • 954 SELECT queries

100 parallel threads store 100 combinations at the same time.
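To get a feel for the load these numbers imply, here is a rough back-of-the-envelope sketch. The query counts, thread count, and 2-second figure come from the description above; the class and method names are made up for illustration, and the throughput figure assumes every thread keeps finishing one combination every 2 seconds under full parallel load, which may not hold in practice.

```java
public class CombinationLoad {
    // Figures taken from the question.
    static final int INSERTS = 527;
    static final int UPDATES = 506;
    static final int SELECTS = 954;
    static final int THREADS = 100;
    static final double SECONDS_PER_COMBINATION = 2.0;
    static final long TOTAL_COMBINATIONS = 250_000L;

    // Queries issued to store a single combination.
    static int queriesPerCombination() {
        return INSERTS + UPDATES + SELECTS;
    }

    // Assumption: each thread sustains one combination per 2 seconds.
    static double queriesPerSecond() {
        return THREADS * queriesPerCombination() / SECONDS_PER_COMBINATION;
    }

    // Queries needed for the whole 250,000-combination test.
    static long totalQueries() {
        return TOTAL_COMBINATIONS * queriesPerCombination();
    }

    public static void main(String[] args) {
        System.out.println("Queries per combination: " + queriesPerCombination());
        System.out.println("Approx. queries per second: " + queriesPerSecond());
        System.out.println("Total queries for the test: " + totalQueries());
    }
}
```

With the numbers above this works out to 1,987 queries per combination, about 99,350 queries per second against the single node if the 2-second figure held, and 496,750,000 queries for the full test.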

I have found the WRITE TIMEOUTS to be random: sometimes it works up to 200,000 combinations and then throws timeouts, and sometimes it does not work even for 10k combinations. RANDOM BEHAVIOUR.

Answer

Mr'Black · May 29, 2016

I found that during some cassandra-stress read operations, if I set the rate threads too high, I get that CL error. Consider lowering the number of threads during your test to something your pool can sustain, so that requests complete within read_request_timeout_in_ms.

In my opinion, raising that timeout in cassandra.yaml is not always a good idea. Consider the hardware resources your machines have to work with.

For example:

cassandra-stress read n=100000 cl=ONE -rate threads=200 -node N1

will give me the error, while

cassandra-stress read n=100000 cl=ONE -rate threads=121 -node N1

runs smoothly.
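The same idea can be applied on the client side of an application. Below is a minimal sketch, assuming a plain java.util.concurrent fixed thread pool is acceptable for capping concurrency. It does not use the DataStax driver: storeCombination is a hypothetical stand-in for the real INSERT/UPDATE/SELECT work, and the task and thread counts in main are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedLoad {
    // Runs `tasks` copies of `work` on at most `maxThreads` threads and
    // returns the highest number of copies observed running at once.
    static int runBounded(int tasks, int maxThreads, Runnable work) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    work.run();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }

    // Hypothetical stand-in for storing one combination; a real client
    // would issue its statements against Cassandra here.
    static void storeCombination() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int peak = runBounded(200, 20, BoundedLoad::storeCombination);
        System.out.println("Peak concurrent tasks: " + peak);
    }
}
```

The pool size plays the same role as the cassandra-stress thread count: the reported peak never exceeds maxThreads, so lowering that one number lowers the pressure on the node without touching cassandra.yaml.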

Hope this helps.

P.S. When you run read tests, try to spread the reads evenly over the data with '-pop dist=UNIFORM(1..1000000)', or whatever range you want.