Wrong count(*) with cassandra-cql

user934801 picture user934801 · Jan 9, 2012 · Viewed 8.3k times · Source

I tried to create some users for my testing. I created users in a loop from 0..100000 using the cassandra-cql gem for Ruby on Rails, and then I counted the users in my database and there were only 10000 users as result. If I create 9000, everything works fine. First I thought the users didn't exist, but I used the Apollo WebUI for Cassandra, and I could find the user with the id 100000 and users below. Why does this happen?

I know I should use a counter column to provide the number of users in my application, but I want to know if this is a bug or a failure of mine.

def self.create_users
  (0..19000).each do |f|
    @@db.execute("INSERT INTO users (uid, first_name, last_name, email) VALUES (?,?,?,?)", f.to_s, "first_name", "last_name", "email")
  end
end

def self.count_users
  count = @@db.execute("SELECT count(*) FROM users")
  count.fetch do |c|
    return c[0]
  end
end

Answer

psanford picture psanford · Jan 10, 2012

CQL operations limit both the number of rows and the number of columns that will be returned to the user. By default that limit is 10,000. Because the count(*) operation actually has to fetch out all the rows in order to get the count, it is also limited by the default of 10,000 rows. You could increase the limit for the query (although I don't recommend it):

SELECT count(*) FROM users limit 20000;

Note that this is an expensive operation especially when you have a lot of rows. You should anticipate this type of query could take a long time for any medium or large size dataset. If at all possible you should denormalize this count into a counter or some other form that will not require fetching all the rows in your column family.