I am new to NoSQL and Cassandra. I am experimenting with settings to achieve an in-memory, cache-only solution. I read a 100,000-line file line by line and use Hector to insert the data into Cassandra. I am seeing very low throughput of around 6000 inserts per second. The whole write operation takes about 20.5 seconds, which is unacceptable for our application. We need something like 100,000 inserts per second. I am testing on a Windows 7 computer with 4 GB of RAM.
I am doing an insert-only test.
Please let me know where I am going wrong and how I can improve the number of inserts per second.
Here are the statistics for the keyspace:

    Keyspace: Keyspace1
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 177042
        Write Latency: 0.003106884242157228 ms.
        Pending Tasks: 0
            Column Family: user
            SSTable count: 3
            Space used (live): 17691
            Space used (total): 17691
            Number of Keys (estimate): 384
            Memtable Columns Count: 100000
            Memtable Data Size: 96082090
            Memtable Switch Count: 1
            Read Count: 0
            Read Latency: NaN ms.
            Write Count: 177042
            Write Latency: NaN ms.
            Pending Tasks: 0
            Key cache capacity: 150000
            Key cache size: 0
            Key cache hit rate: NaN
            Row cache capacity: 150000
            Row cache size: 0
            Row cache hit rate: NaN
            Compacted row minimum size: 73
            Compacted row maximum size: 924
            Compacted row mean size: 784
I have tried a couple of methods for setting the row cache and key cache:
- Through the Cassandra CLI
- Through NodeCmd: java org.apache.cassandra.tools.NodeCmd -p 7199 setcachecapacity Keyspace1 user 150000 150000
I wouldn't describe 6000 writes per second as "slow", but Cassandra can do much better. Note, though, that Cassandra is designed for durable writes, so it may give lower performance than memory-only caching solutions.
As sbridges says, you cannot get full performance out of Cassandra using a single client. Try using multiple client threads, processes, or machines.
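As a rough illustration of the multi-threaded approach, here is a minimal sketch using only the Java standard library. It reads all lines into memory, splits them into contiguous slices, and writes each slice on its own thread. `RowWriter` is a hypothetical interface standing in for your real insert call (for example, the Hector code you already have), and it is assumed to be safe to call from several threads at once; the thread count of 8 is an arbitrary starting point to tune, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelInsertSketch {

    /** Hypothetical stand-in for one insert; replace with your real client call. */
    interface RowWriter {
        void write(String line) throws Exception;
    }

    /**
     * Splits the lines into contiguous slices and writes each slice on its own
     * thread. Returns the number of lines written.
     */
    static long writeInParallel(List<String> lines, int threads, RowWriter writer)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        List<Future<?>> futures = new ArrayList<>();
        int sliceSize = (lines.size() + threads - 1) / threads; // ceiling division
        try {
            for (int start = 0; start < lines.size(); start += sliceSize) {
                List<String> slice =
                        lines.subList(start, Math.min(start + sliceSize, lines.size()));
                futures.add(pool.submit(() -> {
                    for (String line : slice) {
                        writer.write(line);
                        written.incrementAndGet();
                    }
                    return null; // submitted as a Callable so checked exceptions are allowed
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any worker failure as an ExecutionException
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
        return written.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add("row-" + i);
        }
        // Placeholder writer that only counts; swap in the real insert here.
        AtomicLong seen = new AtomicLong();
        long n = writeInParallel(lines, 8, line -> seen.incrementAndGet());
        System.out.println("wrote " + n + " rows");
    }
}
```

Running `main` prints `wrote 100000 rows`. With a real writer, measure throughput at a few thread counts; past some point the single node, not the client, becomes the bottleneck, and that is where adding client processes or machines helps.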
I don't think you will get 100,000 writes per second on a single node. I have only obtained around 20,000-25,000 writes per second on modest hardware (although Cassandra has become significantly faster since I did that benchmarking). 6000 per second seems about right for a single client against a single commodity node.
With a cluster of nodes, you can definitely get 100,000 writes per second (see http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html for a recent benchmark of 1,000,000 writes per second!)
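A back-of-the-envelope cluster size, under loudly stated assumptions: throughput scales roughly linearly with the number of nodes, each node sustains about 20,000 writes per second (the lower end of the figure above), and every write is stored on RF replicas. Then the number of nodes needed for 100,000 client writes per second is roughly:

```latex
N_{\text{nodes}} \gtrsim \frac{100{,}000 \times RF}{20{,}000} = 5 \times RF
```

So about 5 nodes with RF = 1, or about 15 with RF = 3. Treat this as an order-of-magnitude estimate only; your own benchmark on your hardware is what counts.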
The row cache and key cache help read performance, not write performance, so changing their capacities will not speed up your inserts.
Also, make sure you are batching the writes (if appropriate); this will reduce the network overhead.
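To show the idea of batching independently of any client library, here is a minimal sketch that groups lines into fixed-size batches and sends each batch in one call, so 100,000 rows cost a few hundred round trips instead of 100,000. `BatchSink` is a hypothetical interface; in Hector this would roughly correspond to calling `addInsertion(...)` on a `Mutator` for each row and `execute()` once per batch, but check the Hector documentation for your version. The batch size of 500 is an arbitrary assumption; very large batches can hurt rather than help, so measure.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertSketch {

    /** Hypothetical stand-in for one round trip that sends a whole batch. */
    interface BatchSink {
        void send(List<String> batch) throws Exception;
    }

    /**
     * Groups lines into batches of at most batchSize and sends each batch in a
     * single call. Returns the number of round trips made.
     */
    static int writeInBatches(Iterable<String> lines, int batchSize, BatchSink sink)
            throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int roundTrips = 0;
        for (String line : lines) {
            batch.add(line);
            if (batch.size() == batchSize) {
                sink.send(batch);
                roundTrips++;
                batch = new ArrayList<>(batchSize); // fresh list in case the sink keeps a reference
            }
        }
        if (!batch.isEmpty()) { // flush the final, partial batch
            sink.send(batch);
            roundTrips++;
        }
        return roundTrips;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add("row-" + i);
        }
        // Placeholder sink that does nothing; send the batch to Cassandra here.
        int trips = writeInBatches(lines, 500, batch -> { });
        System.out.println(trips + " round trips instead of " + lines.size());
    }
}
```

Running `main` prints `200 round trips instead of 100000`. Batching combines naturally with multiple client threads: each thread can batch its own slice of the input.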