SELECT in cassandra where id != null

user3780281 picture user3780281 · Jun 27, 2014 · Viewed 32.2k times · Source

How do I query in cassandra for != null columns.

Select * from tableA where id != null;
Select * from tableA where name != null;

Then I wanted to store these values and insert these into different table.

Answer

Aaron picture Aaron · Jun 27, 2014

I don't think this is possible with Cassandra. First of all, Cassandra CQL doesn't support the use of NOT or not equal to operators in the WHERE clause. Secondly, your WHERE clause can only contain primary key columns, and primary key columns will not allow null values to be inserted. I wasn't sure about secondary indexes though, so I ran this quick test:

create table nullTest (id text PRIMARY KEY, name text);
INSERT INTO nullTest (id,name) VALUES ('1','bob');
INSERT INTO nullTest (id,name) VALUES ('2',null);

I now have a table and two rows (one with null data):

SELECT * FROM nullTest;

 id | name
----+------
  2 | null
  1 |  bob

(2 rows)

I then try to create a secondary index on name, which I know contains null values.

CREATE INDEX nullTestIdx ON nullTest(name);

It lets me do it. Now, I'll run a query on that index.

SELECT * FROM nullTest WHERE name=null;
Bad Request: Unsupported null value for indexed column name

And again, this is done under the premise that you can't query for not null, if you can't even query for column values that may actually be null.

So, I'm thinking this can't be done. Also, if null values are a possibility in your primary key, then you may want to re-evaluate your data model. Again, I know the OP's question is about querying where data is not null. But as I mentioned before, Cassandra CQL doesn't have a NOT or != operator, so that's going to be a problem right there.

Another option, is to insert an empty string instead of a null. You would then be able to query on an empty string. But that still doesn't get you past the fundamental design flaw of having a null in a primary key field. Perhaps if you had a composite primary key, and only part of it (the clustering columns) had the possibility of being empty (certainly not part of the partitioning key). But you'd still be stuck with the problem of not being able to query for rows that are "not empty" (instead of not null).

NOTE: Inserting null values was done here for demonstration purposes only. It is something you should do your best to avoid, as inserting a null column value WILL create a tombstone. Likewise, inserting lots of null values will create lots of tombstones.