I changed the whole codebase from Thrift to CQL, using the DataStax Java driver 1.0.1 and Cassandra 1.2.6.
With Thrift I was getting frequent timeouts from the start and was not able to proceed. After adopting CQL and designing the tables for it, I had more success and fewer timeouts.
With CQL I was able to insert large volumes of data that had failed with Thrift. But after a point, once the data folder reached around 3.5 GB, I started getting frequent write timeout exceptions. Even a use case that worked earlier now throws a timeout exception. The behaviour is random: something that worked once stops working again, even after a fresh setup.
CASSANDRA SERVER LOG
This is the partial Cassandra server log (DEBUG mode) from the time I got the error:
The client exception is:
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
at com.datastax.driver.core.Session.execute(Session.java:107)
at com.datastax.driver.core.Session.execute(Session.java:76)
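For context, a WriteTimeoutException only means the coordinator stopped waiting for replica acknowledgements; the write may or may not have been applied. A minimal sketch of catching and retrying it with driver 1.0.x (keyspace, table, and values below are made up for illustration):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class RetryOnWriteTimeout {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace
        // Hypothetical idempotent insert: retrying it is safe because it
        // rewrites the same cells with the same values.
        String cql = "INSERT INTO combinations (id, payload) VALUES (42, 'data')";
        for (int attempt = 1; ; attempt++) {
            try {
                session.execute(cql);
                break;
            } catch (WriteTimeoutException e) {
                // The coordinator gave up waiting for the replica; the write
                // may still land, so only retry idempotent statements.
                if (attempt >= 3) throw e;
                Thread.sleep(1000L * attempt); // crude linear backoff
            }
        }
        cluster.shutdown(); // driver 1.x uses shutdown(), not close()
    }
}
```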
Infrastructure: a 16 GB machine with 8 GB of heap given to Cassandra, i7 processor. I am using SINGLE-node Cassandra with the yaml tweaked for timeouts; everything else is default:
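(The exact values I changed are omitted above. For illustration only, these are the timeout knobs a stock Cassandra 1.2 cassandra.yaml exposes, shown with their 1.2 defaults; the numbers below are not my actual settings.)

```yaml
# Timeout-related settings in cassandra.yaml (Cassandra 1.2 defaults shown
# for illustration; raising them can mask overload rather than fix it).
read_request_timeout_in_ms: 10000    # coordinator wait for read responses
write_request_timeout_in_ms: 10000   # coordinator wait for write acknowledgements
request_timeout_in_ms: 10000         # default timeout for other operations
```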
USE CASE: I am running a use case that stores Combinations (my project's terminology) in Cassandra. I am currently testing the storage of 250,000 combinations with 100 parallel threads, each thread storing one combination. The real case needs to support tens of millions, but that would need different hardware and a multi-node cluster.
Storing ONE combination takes around 2 seconds and involves 100 parallel threads, each storing one combination at a time (100 combinations being stored in parallel).
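A minimal sketch of that write path with driver 1.0.x, assuming a hypothetical combinations table (the real schema is omitted):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class ParallelCombinationWriter {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        // One Session shared by all threads: the driver multiplexes requests
        // over its connection pool, so a Session per thread is unnecessary.
        final Session session = cluster.connect("my_keyspace");         // hypothetical keyspace
        final PreparedStatement insert = session.prepare(
                "INSERT INTO combinations (id, payload) VALUES (?, ?)"); // hypothetical table

        ExecutorService pool = Executors.newFixedThreadPool(100); // 100 writer threads
        for (int i = 0; i < 250000; i++) {
            final int id = i;
            pool.submit(new Runnable() {
                public void run() {
                    session.execute(insert.bind(id, "combination-" + id));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for all writes to finish
        cluster.shutdown();
    }
}
```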
I have found the behaviour of the WRITE TIMEOUTS to be random: sometimes it works up to 200,000 combinations and then throws timeouts, and sometimes it does not work even for 10,000 combinations.
I found that during some cassandra-stress read operations, if I set the rate threads too high I get that CL error. Consider lowering the number of threads during your test to something your pool can sustain, in order to avoid the timeout.
In my opinion, modifying the timeout in cassandra.yaml is not always a good idea; consider the hardware resources your machines have to work with.
For example:
cassandra-stress read n=100000 cl=ONE -rate threads=200 -node N1
will give me the error, while
cassandra-stress read n=100000 cl=ONE -rate threads=121 -node N1
will do the job smoothly.
Hope this helps.
P.S. When you do read tests, try to spread the reads evenly over the data with '-pop dist=UNIFORM(1..1000000)' or however wide a range you want.