Clustering Order using timeuuid CQL

Shrikar picture Shrikar · Dec 11, 2013 · Viewed 9.7k times · Source

My Usecase

I want to order by timestamp DESC for the results. But I don't want timestamp to be the second column in the primary key as that will take of my querying capability

for example

create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts)) WITH CLUSTERING ORDER BY (ts DESC);

Queries required:

I want the result for all the below queries to be in DESC order of timestamp

select * from demo where oid = 100;
select * from demo where oid = 100 and cid = 10;
select * from demo where oid = 100 and cid = 100 and ts > minTimeuuid('something');

I am trying to create this table with CLUSTERING ORDER IN CQL and getting this error

cqlsh:v> create table demo(oid int,cid int,ts timeuuid,PRIMARY KEY (oid,cid,ts))     WITH CLUSTERING ORDER BY (ts desc);
Bad Request: Missing CLUSTERING ORDER for column cid

In this document it mentions that we can have multple keys for cluster ordering. any one know how to do that?

Go here Datastax doc

Answer

RussS picture RussS · Dec 11, 2013
CREATE TABLE example ( a int, b int, c int, d int, PRIMARY KEY (a,b,c)) WITH CLUSTERING ORDER BY (b DESC , c ASC ) ;

Is the correct syntax for ordering with multiple columns.


For your particular application you actually are trying to get results from distinctly different types of queries. In Cassandra it is best to shape each table to be the response to a particular query.

For example (not knowing that much about your application)

select * from demo where oid = 100 and cid = 100 and ts > minTimeuuid('something');
select * from demo where oid = 100 and cid = 10;

Might be better served by a table structure like

create table demo_oct(oid int,cid int,ts timeuuid, body, other ...., PRIMARY KEY ((oid,cid),ts)) WITH CLUSTERING ORDER BY (ts DESC);

This way each set of time-series for a pair of oid and cid data will reside in it's own partiton and be easy to retrieve. This is because i'm using a Parition key made up of both oid and cid. This is why there is an extra set of parenthesizes in the key. The clustering key ts ensures that the data is in the order that you want.

But as you noticed, you cannot perform select * from table oid == 10 on this table because that would require scanning over the entire database (because of the partition structure)

For the queries like

select * from demo where oid = 100;

you need a second table(again not knowing your particular application)

create table demo_ot(oid int,cid int,ts timeuuid, body, other ...., PRIMARY KEY (oid,ts)) WITH CLUSTERING ORDER BY (ts DESC);

This table would keep time series for each OID in a single partiton allowing for extremely fast slices. Here the partition key is just OID while ts is still the clustering key.

On the application side you would be inserting to both of these tables simultaneously.

More info on Datamodeling