I have a Python script that sends a batch query to Cassandra, like this:
query = 'BEGIN BATCH ' + 'insert into ... ; insert into ... ; insert into ...; ' + ' APPLY BATCH;'
session.execute(query)
It works for a while, but about two minutes after starting, the script fails and prints:
Traceback (most recent call last):
  File "/home/fervid/Desktop/cassandra/scripts/parse_and_save_to_cassandra.cgi", line 127, in <module>
    session.execute(query)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 1103, in execute
    result = future.result(timeout)
  File "/usr/local/lib/python2.7/dist-packages/cassandra/cluster.py", line 2475, in result
    raise OperationTimedOut(errors=self._errors, last_host=self._current_host)
cassandra.OperationTimedOut: errors={}, last_host=127.0.0.1
I increased the timeouts in cassandra.yaml to:
read_request_timeout_in_ms: 15000
range_request_timeout_in_ms: 20000
write_request_timeout_in_ms: 20000
cas_contention_timeout_in_ms: 10000
request_timeout_in_ms: 25000
Then I restarted Cassandra, but it didn't help. The error keeps happening!
Log lines from around the time the script failed:
INFO [BatchlogTasks:1] 2014-06-11 14:18:10,490 ColumnFamilyStore.java (line 794) Enqueuing flush of Memtable-batchlog@28149592(13557969/13557969 serialized/live bytes, 4 ops)
INFO [FlushWriter:10] 2014-06-11 14:18:10,490 Memtable.java (line 363) Writing Memtable-batchlog@28149592(13557969/13557969 serialized/live bytes, 4 ops)
INFO [FlushWriter:10] 2014-06-11 14:18:10,566 Memtable.java (line 410) Completed flushing; nothing needed to be retained. Commitlog position was ReplayPosition(segmentId=1402469922169, position=27138996)
INFO [ScheduledTasks:1] 2014-06-11 14:18:13,758 GCInspector.java (line 116) GC for ParNew: 640 ms for 3 collections, 775214160 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:16,155 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1838 ms for 2 collections, 810976000 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 1612 ms for 1 collections, 858404088 used; max is 1052770304
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 StatusLogger.java (line 55) Pool Name                 Active   Pending   Completed   Blocked   All Time Blocked
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,959 StatusLogger.java (line 70) ReadStage                      0         0         627         0                  0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) RequestResponseStage           0         0           0         0                  0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) ReadRepairStage                0         0           0         0                  0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) MutationStage                  0         0      184386         0                  0
INFO [ScheduledTasks:1] 2014-06-11 14:18:17,960 StatusLogger.java (line 70) ReplicateOnWriteStage          0         0           0         0                  0
This is a client-side timeout (see the link in @Syrial's reply: http://datastax.github.io/python-driver/api/cassandra.html#cassandra.OperationTimedOut), so raising the server-side timeouts in cassandra.yaml won't fix it.
You can change the Session's default timeout:
session = cluster.connect()
session.default_timeout = 30 # this is in *seconds*
You can also change the timeout for a particular query:
session.execute(statement, timeout=30)  # pass timeout as a keyword; the second positional argument is the query parameters
You could speed up the execution quite a bit by using prepared statements inside the batch. See the Batching statements section in this post.
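Here is a minimal sketch of that approach using the driver's BatchStatement with a prepared insert; the keyspace, table, and the rows iterable are placeholders for your own schema and data:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('my_keyspace')  # placeholder keyspace

# Prepare once: the server parses the statement a single time,
# and each batch entry ships only the bound values.
insert = session.prepare('INSERT INTO my_table (id, value) VALUES (?, ?)')

batch = BatchStatement()
for row_id, value in rows:  # 'rows' stands in for your data source
    batch.add(insert, (row_id, value))

session.execute(batch, timeout=30)

Keeping the batches reasonably small helps too: large logged batches go through the batchlog, which fits the Memtable-batchlog flushes and GC pressure visible in your log.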
If you are after even better results, please read these performance notes.
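One such technique is the driver's asynchronous API: instead of blocking on each statement, fire off several with execute_async and collect the results afterwards. A rough sketch, where statements is a placeholder list of queries:

# Issue the queries without blocking on each one...
futures = [session.execute_async(stmt) for stmt in statements]
# ...then wait for all of them; result() re-raises any query error.
for future in futures:
    future.result()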