cassandra primary key column cannot be restricted

Orodan picture Orodan · Feb 17, 2015 · Viewed 21.4k times · Source

I am using Cassandra for the first time in a web app and I got a query problem. Here is my tab :

CREATE TABLE vote (
    doodle_id uuid,
    user_id uuid,
    schedule_id uuid,
    vote int,
    PRIMARY KEY ((doodle_id), user_id, schedule_id)
);

On every request, I indicate my partition key, doodle_id. For example I can make without any problems :

select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and user_id = 97a7378a-e1bb-4586-ada1-177016405142;

But on the last request I made :

select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;

I got the following error :

Bad Request: PRIMARY KEY column "schedule_id" cannot be restricted (preceding column "user_id" is either not restricted or by a non-EQ relation)

I'm new with Cassandra, but correct me if I'm wrong, in a composite primary key, the first part is the PARTITION KEY which is mandatory to allow Cassandra to know where to look for data. Then the others parts are CLUSTERING KEY to sort data.

But I still don't get why my first request is working and not the second one ?

If anyone could help it will be a great pleasure.

Answer

Aaron picture Aaron · Feb 17, 2015

In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your second query (queries by doodle_id and schedule_id, but not necessarilly with user_id), is to create a new table to handle that specific query. This table will be pretty much the same, except the PRIMARY KEY will be slightly different:

CREATE TABLE votebydoodleandschedule (
    doodle_id uuid,
    user_id uuid,
    schedule_id uuid,
    vote int,
    PRIMARY KEY ((doodle_id), schedule_id, user_id)
);

Now this query will work:

SELECT * FROM votebydoodleandschedule 
WHERE doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 
AND schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;

This gets you around having to specify ALLOW FILTERING. Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster.