Inserting null values into cassandra

ArchitGarg picture ArchitGarg · Nov 21, 2016 · Viewed 20.1k times · Source

I have some fields that I am storing into Cassandra, but some of them could be null at any given point. As there are quite a lot of them, it makes the code much more readable if I don’t check each one for null before adding it to the INSERT.

Is there any harm in doing so?

EDIT!!

There is a jira ticket that I found. But I am unable to understand what solution was finally implemented from the ticket. https://issues.apache.org/jira/browse/CASSANDRA-7304

Answer

xmas79 picture xmas79 · Nov 21, 2016

The beautiful thing about Cassandra's new storage engine is the ability to NOT store values. What it means is what it was meant to be: a null value is simply a value that should not be there.

This gives great flexibility, because a null value not explicitly (or implicitly, see later) inserted won't take storage space, nor use processing power and IO bandwidth.

Indeed, it is pretty easy to populate a row with null values:

INSERT INTO mytable (pk, c2, c3) VALUES (0x1234, null, null);

This way you are explicitly telling C* to store a null value in both c2 and c3. However, you could get the same macroscopic effect with:

INSERT INTO mytable (pk) VALUES (0x1234);

Notice that I say macroscopic effect, because when you explicitly insert a null value, C* will insert a tombstone under the hood. In the long run this will bite you, due to how C* perform searches, compactions, etc... so you should avoid whenever possible, the second version will perform much better.

Now, there is also a trap: you can also create tombstones implicitly. This will happen when you use the TTL features builtin in Cassandra.

In conclusion, if you care about yourself I'd suggest to NOT performing any null value inserts. Do a check at application level, you'll save time (and money) later, eg during reads.