Cassandra query making - Cannot execute this query as it might involve data filtering and thus may have unpredictable performance

Question 1

Cassandra query making - Cannot execute this query as it might involve data filtering and thus may have unpredictable performance

python cassandra cassandra-cli cqlengine

PythonEnthusiast · Mar 28, 2015 · Viewed 10k times · Source

Answer

Answer

To understand this, you need to understand how cassandra stores data. The first key in the primary key is called the partition key. It defines the partition the row belongs to. All rows in a partition are stored together, and replicated together. Inside a partition, rows are stored according to the clustering keys. These are the columns in the PK that's not the partition key. So, if your PK is (a, b, c, d), a defines the partition. And in a particular partition (say, a = a1), the rows are stored sorted by b. And for each b, the rows are stored sorted by c...and so on. When querying, you hit one (or a few partitions), and then need to specify every successive clustering key up until the key you're looking for. These have to exact equalities except for the last clustering column specified in your query, which may be a range query.

In the previous example, you could thus do

where a = a1 and b > b1 where a = a1 and b=b1 and c>c1 where a = a1 and b=b1 and c=c1 and d > d1

but can't do this:

where a=a1 and c=c1

To do that, you'd need "allow filtering" (realistically, you should look at changing your model, or denormalizing at that point).

Now, on to your question about making every column part of the PK. You could do that, but remember, all writes in Cassandra are upserts. Rows are identified by their primary key. If you make every column part of the PK, you'll not be able to edit a row. You're not allowed to update the value of any column that's in the primary key.

Question 2

I'd the following Cassandra Model:-

class Automobile(Model):
    manufacturer = columns.Text(primary_key=True)
    year = columns.Integer(index=True)
    model = columns.Text(index=True)
    price = columns.Decimal(index=True)

I needed the following queries:-

q = Automobile.objects.filter(manufacturer='Tesla')
q = Automobile.objects.filter(year='something')
q = Automobile.objects.filter(model='something')
q = Automobile.objects.filter(price='something')

These all were working fine, until i wanted multiple column filtering, ie when I tried

q = Automobile.objects.filter(manufacturer='Tesla',year='2013')

it throws an error saying Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.

I rewrote the query with allowed_filtering, but this is not an optimal solution.

Then upon reading more, I edited my model as follow:-

class Automobile(Model):
    manufacturer = columns.Text(primary_key=True)
    year = columns.Integer(primary_key=True)
    model = columns.Text(primary_key=True)
    price = columns.Decimal()

With this I was able to filter multiple coulms as well, without any warning.

When I did DESCRIBE TABLE automobile, it shows this creates composite key PRIMARY KEY ((manufacturer), year, model).

So, my question is what if I declare every attribute as primary key? Is there any problem with this, since I'll be able to filter multiple columns as well.

This is just a small model. What if I had a model such as:-

class UserProfile(Model):
    id = columns.UUID(primary_key=True, default=uuid.uuid4)
    model = columns.Text()
    msisdn = columns.Text(index=True)
    gender = columns.Text(index=True)
    imei1 = columns.Set(columns.Text)
    circle = columns.Text(index=True)
    epoch = columns.DateTime(index=True)
    cellid = columns.Text(index=True)
    lacid = columns.Text(index=True)
    mcc = columns.Text(index=True)
    mnc = columns.Text(index=True)
    installed_apps = columns.Set(columns.Text)
    otp = columns.Text(index=True)
    regtype = columns.Text(index=True)
    ctype = columns.Text(index=True)
    operator = columns.Text(index=True)
    dob = columns.DateTime(index=True)
    jsonver = columns.Text(index=True)

and if I declare every attribute as PK, is there any problem with this?

Cassandra query making - Cannot execute this query as it might involve data filtering and thus may have unpredictable performance

Answer

Related questions