Cassandra NOT EQUAL Operator

mapreduce cassandra cql3

Babu James · Feb 21, 2014 · Viewed 14.9k times · Source

Question to all Cassandra experts out there.

I have a column family with about a million records.

I would like to query these records in such a way that I should be able to perform a Not-Equal-To kind of operation.

I Googled on this and it seems I have to use some sort of Map-Reduce.

Can somebody tell me what are the options available in this regard.

Answer

I can suggest a few approaches.

1) If you have a limited number of values that you would like to test for not-equality, consider modeling those as a boolean columns (i.e.: column isEqualToUnitedStates with true or false).

2) Otherwise, consider emulating the unsupported query != X by combining results of two separate queries, < X and > X on the client-side.

3) If your schema cannot support either type of query above, you may have to resort to writing custom routines that will do client-side filtering and construct the not-equal set dynamically. This will work if you can first narrow down your search space to manageable proportions, such that it's relatively cheap to run the query without the not-equal.

So let's say you're interested in all purchases of a particular customer of every product type except Widget. An ideal query could look something like SELECT * FROM purchases WHERE customer = 'Bob' AND item != 'Widget'; Now of course, you cannot run this, but in this case you should be able to run SELECT * FROM purchases WHERE customer = 'Bob' without wasting too many resources and filter item != 'Widget' in the client application.

4) Finally, if there is no way to restrict the data in a meaningful way before doing the scan (querying without the equality check would returning too many rows to handle comfortably), you may have to resort to MapReduce. This means running a distributed job that would scan all rows in the table across the cluster. Such jobs will obviously run a lot slower than native queries, and are quite complex to set up. If you want to go this way, please look into Cassandra Hadoop integration.

Cassandra NOT EQUAL Operator

Answer

Related questions