Cassandra timeout cqlsh query large(ish) amount of data

slmyers · Apr 3, 2015 · Viewed 47.8k times

I'm doing a student project involving building and querying a Cassandra data cluster.

When my cluster's load was light (around 30 GB), my queries ran without a problem, but now that it's quite a bit bigger (1/2 TB) my queries are timing out.

I thought this problem might arise, so before I began generating and loading test data I changed this value in my cassandra.yaml file:

request_timeout_in_ms (Default: 10000) The default timeout for other, miscellaneous operations.

However, when I changed that value to something like 1000000, Cassandra seemingly hung on startup -- though that could have just been the large timeout at work.
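For reference, request_timeout_in_ms only covers miscellaneous operations; reads and range scans have their own settings in cassandra.yaml. A sketch of the relevant block (names and defaults as in Cassandra 2.x; check the cassandra.yaml shipped with your version):

```yaml
# Server-side timeouts, all in milliseconds.
read_request_timeout_in_ms: 5000      # single-partition reads
range_request_timeout_in_ms: 10000    # range scans / multi-partition reads
request_timeout_in_ms: 10000          # default for other, miscellaneous operations
```

Tuple-range queries like the ones below are range scans, so range_request_timeout_in_ms is the server-side setting that applies to them, not request_timeout_in_ms.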

My goal for data generation is 2 TB. How do I query that large a space without running into timeouts?

Queries:

SELECT  huntpilotdn 
FROM    project.t1 
WHERE   (currentroutingreason, orignodeid, origspan,  
        origvideocap_bandwidth, datetimeorigination)
        > (1,1,1,1,1)
AND      (currentroutingreason, orignodeid, origspan,    
         origvideocap_bandwidth, datetimeorigination)
         < (1000,1000,1000,1000,1000)
LIMIT 10000
ALLOW FILTERING;

SELECT  destcause_location, destipaddr
FROM    project.t2
WHERE   datetimeorigination = 110
AND     num >= 11612484378506
AND     num <= 45880092667983
LIMIT 10000;


SELECT  origdevicename, duration
FROM    project.t3
WHERE   destdevicename IN ('a','f', 'g')
LIMIT 10000
ALLOW FILTERING;

I have a demo keyspace with the same schemas but a far smaller data size (~10 GB), and these queries run just fine in that keyspace.

All of the queried tables have millions of rows, with around 30 columns per row.

Answer

gcarvelli · Oct 15, 2016

If you are using DataStax cqlsh, you can specify the client timeout in seconds as a command line argument. The default is 10.

$ cqlsh --request-timeout=3600
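To avoid passing the flag on every invocation, the same timeout can be persisted in the cqlshrc file. A minimal sketch, assuming a default cqlshrc location; note that the key name varies by cqlsh version (request_timeout in newer releases, client_timeout in some older 2.1-era ones):

```ini
# ~/.cassandra/cqlshrc
[connection]
request_timeout = 3600
```

This only raises the client-side timeout; the server-side limits in cassandra.yaml still apply independently.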

DataStax Documentation