Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Abhinandan Satpute · Sep 1, 2015 · Viewed 20k times

I am running read and update queries against a table with 500,000 rows, and I sometimes get the error below after processing around 300,000 rows, even though no node is down.

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Infrastructure details:
The cluster has 5 Cassandra nodes, 5 Spark nodes, and 3 Hadoop nodes, each with 8 cores and 28 GB of memory. The Cassandra replication factor is 3.

Cassandra 2.1.8.621 | DSE 4.7.1 | Spark 1.2.1 | Hadoop 2.7.1.

Cassandra configuration:

read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000

I have also tried the same job with read_request_timeout_in_ms increased to 20000, but it didn't help.

I am querying two tables. Below is the CREATE TABLE statement for one of them:

CREATE TABLE section_ks.testproblem_section (
    problem_uuid text PRIMARY KEY,
    documentation_date timestamp,
    mapped_code_system text,
    mapped_problem_code text,
    mapped_problem_text text,
    mapped_problem_type_code text,
    mapped_problem_type_text text,
    negation_ind text,
    patient_id text,
    practice_uid text,
    problem_category text,
    problem_code text,
    problem_comment text,
    problem_health_status_code text,
    problem_health_status_text text,
    problem_onset_date timestamp,
    problem_resolution_date timestamp,
    problem_status_code text,
    problem_status_text text,
    problem_text text,
    problem_type_code text,
    problem_type_text text,
    target_site_code text,
    target_site_text text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Queries:

1) SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING;

2) UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

Answer

Jim Meyer · Sep 1, 2015

Usually when you get a timeout error it means you are trying to do something that isn't scaling well in Cassandra. The fix is often to modify your schema.

I suggest you monitor the nodes while running your query to see if you can spot the problem area. For example, you can run "watch -n 1 nodetool tpstats" to see if any queues are backing up or dropping items. See other monitoring suggestions here.
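
If watching the tpstats output by eye gets tedious, one option is to scan it for non-zero Pending, Blocked, or Dropped counts. The sketch below is only an illustration: it assumes the column layout that Cassandra 2.1 prints (pool rows with five numeric columns, followed by a "Message type / Dropped" section), and the function name find_backlogs is hypothetical. Other versions may need different parsing.

```python
def find_backlogs(tpstats_text):
    """Return (pools, dropped) from the text printed by `nodetool tpstats`.

    pools:   thread pools that have pending or currently blocked tasks
    dropped: message types with a non-zero dropped count

    Assumed layout (Cassandra 2.1), adjust if your version differs:
      <pool> <active> <pending> <completed> <blocked> <all time blocked>
      ...
      Message type  Dropped
      <type> <dropped>
    """
    pools, dropped = {}, {}
    in_dropped_section = False
    for line in tpstats_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["Message", "type"]:
            # Everything after this header is a "<type> <dropped>" row.
            in_dropped_section = True
            continue
        if not all(p.isdigit() for p in parts[1:]):
            continue  # column header or a line we do not recognize
        if in_dropped_section and len(parts) == 2:
            if int(parts[1]) > 0:
                dropped[parts[0]] = int(parts[1])
        elif not in_dropped_section and len(parts) == 6:
            pending, blocked = int(parts[2]), int(parts[4])
            if pending > 0 or blocked > 0:
                pools[parts[0]] = {"pending": pending, "blocked": blocked}
    return pools, dropped
```

The text to pass in could be captured once per second on each node while the job runs. A steadily growing pending count on the read stage, or any dropped READ messages, would point at the read path being overloaded.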

One thing that might be off in your configuration is that you say you have five Cassandra nodes, but only 3 Spark workers (or are you saying you have three Spark workers on each Cassandra node?). You'll want at least one Spark worker on each Cassandra node so that data is loaded into Spark locally on each node and not over the network.

It's hard to tell much more than that without seeing your schema and the query you are running. Are you reading from a single partition? I started getting timeout errors in the vicinity of 300,000 rows when reading from a single partition. See question here. The only workaround I have found so far is to use a client-side hash in my partition key to break the partitions up into smaller chunks of around 100K rows. So far I have not found a way to tell Cassandra not to time out on a query that I expect to take a long time.
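
A client-side hash of that kind could look roughly like the sketch below. It is an illustration under assumptions, not the exact scheme I use: the bucket count of 16, the helper names bucket_for and partition_keys, and the choice of MD5 as a stable hash are all made up for the example. The idea is to derive a small bucket number from each row's unique id (for example encounter_uuid) and store it as an extra partition key component, such as PRIMARY KEY ((patient_id, bucket), ...), so that one large logical partition is spread over several smaller ones.

```python
import hashlib

NUM_BUCKETS = 16  # assumption: choose so each bucket stays near ~100K rows


def bucket_for(row_id, num_buckets=NUM_BUCKETS):
    """Map a row identifier to a stable bucket in [0, num_buckets).

    Python's built-in hash() is randomized per process for strings, so a
    digest is used to get the same bucket on every client and every run.
    """
    digest = hashlib.md5(row_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition_keys(patient_id, num_buckets=NUM_BUCKETS):
    """All (patient_id, bucket) pairs to query when reading one patient."""
    return [(patient_id, b) for b in range(num_buckets)]
```

On write, the client computes bucket_for(row_id) and includes it in the key. On read, it issues one query per pair from partition_keys(...) and merges the results, so each individual query touches a much smaller partition.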