In IndexedDB, is there a way to make a sorted compound query?

Question 1

In IndexedDB, is there a way to make a sorted compound query?

html sqlite web-applications client-side indexeddb

jason · Aug 23, 2012 · Viewed 16k times · Source

Answer

Answer

The term compound query as used in this answer refers to an SQL SELECT statement involving more than one condition in its WHERE clause. Although such queries are not mentioned in the indexedDB specification, you can approximate the behavior of a compound query by creating an index with a keypath that consists of an array of property names.

This is completely unrelated to using the multi-entry flag when creating an index. The multi-entry flag adjusts how indexedDB creates an index over a single array property. We are indexing an array of object properties, not the values of a single array property of an object.

Creating the index

In this example, 'name', 'gender', and 'age' correspond to property names of student objects stored within the students object store.

// An example student object in the students store
var foo = {
  'name': 'bar',
  'age': 15,
  'gender': 'M'
};

function myOnUpgradeNeeded(event) {
  var db = event.target.result;
  var students = db.createObjectStore('students');
  var name = 'males25';
  var keyPath = ['name', 'gender', 'age'];
  students.createIndex(name, keyPath);
}

Opening a cursor on the index

You can then open a cursor on the index:

var students = transaction.objectStore('students');
var index = students.index('males25');
var lowerBound = ['AAAAA','male',26];
var upperBound = ['ZZZZZ','male',200];
var range = IDBKeyRange.bound(lowerBound, upperBound);
var request = index.openCursor(range);

However, for reasons I am about to explain, this won't always work.

Aside: using a range parameter to openCursor or get is optional. If you do not specify a range, then IDBKeyRange.only is implicitly used for you. In other words, you only need to use IDBKeyRange for bounded cursors.

Fundamental index concepts

Indices are like object stores but are not directly mutable. Instead, you use CRUD (create read update delete) operations on the referenced object store, and then indexedDB automatically cascades updates to the index.

Understanding sorting is fundamental to understanding indices. An index is basically just a specially sorted collection of objects. Technically, it is also filtered, but I'll touch on that in a moment. Generally, when you open a cursor on an index, you are iterating according to the index's order. This order could be, and probably is, different than the order of the objects in the referenced object store. The order is important because this allows iteration to be more efficient, and allows a custom lower and upper bound that only makes sense in the context of an index-specific order.

The objects in the index are sorted at the time changes to the store occur. When you add an object to the store, it is added to the proper position in the index. Sorting boils down to a comparison function, similar to Array.prototype.sort, that compares two items and returns whether one object is less than the other one, greater than the other one, or equal. So we can understand sorting behavior better by diving into more details on comparison functions.

Strings are compared lexicographically

This means, for example, that 'Z' is less than 'a' and that the string '10' is greater than the string '020'.

Values of different types are compared using a specification-defined order

For example, the specification specifies how a string-type value comes before or after a date-type value. It does not matter what the values contain, just the types.

IndexedDB does not coerce types for you. You can shoot yourself in the foot here. You generally never want to be comparing different types.

Objects with undefined properties do not appear in indices whose keypath is comprised of one or more of those properties

As I mentioned, indices may not always include all objects from the referenced object store. When you put an object into an object store, the object will not appear in the index if it has missing values for the properties upon which the index is based. For example, if we have a student where we don't know the age, and we insert this into the students store, the particular student will not appear in the males25 index.

Remember this when you wonder why an object doesn't appear when iterating a cursor on the index.

Also note the subtle difference between null and an empty string. An empty string is not a missing value. An object with an empty string for a property could still appear in an index based on that property, but will not appear in the index if the property is present but undefined or not present. And if it is not in the index, you won't see it when iterating a cursor over the index.

You must specify each property of an array keypath when creating an IDBKeyRange

You must specify a valid value for each property in the array keypath when creating a lower or upper bound to use in a range for when opening a cursor over that range. Otherwise, you will get some type of Javascript error (varies by browser). For example, you cannot create a range such as IDBKeyRange.only([undefined, 'male', 25]) because the name property is undefined.

Confusingly, if you specify the wrong type of value, such as IDBKeyRange.only(['male', 25]), where name is undefined, you won't get an error in the above sense, but you will get nonsensical results.

There is an exception to this general rule: you can compare arrays of different lengths. Therefore, you technically can omit properties from the range, provided that you do so from the end of the array, and that you appropriately truncate the array. For example, you could use IDBKeyRange.only(['josh','male']).

Short-circuited array sorting

The indexedDB specification provides an explicit method for sorting arrays:

Values of type Array are compared to other values of type Array as follows:

Let A be the first Array value and B be the second Array value.
Let length be the lesser of A's length and B's length.
Let i be 0.
If the ith value of A is less than the ith value of B, then A is less than B. Skip the remaining steps.
If the ith value of A is greater than the ith value of B, then A is greater than B. Skip the remaining steps.
Increase i by 1.
If i is not equal to length, go back to step 4. Otherwise continue to next step.
If A's length is less than B's length, then A is less than B. If A's length is greater than B's length, then A is greater than B. Otherwise A and B are equal.

The catch is in steps 4 and 5: Skip the remaining steps. What this basically means is that if we are comparing two arrays for order, such as [1,'Z'] and [0,'A'], the method only considers the first element because at that point 1 is > 0. It never gets around to checking Z vs A because of short-circuited evaluation (steps 4 and 5 in the spec).

So, the earlier example is not going to work. It actually works more like the following:

WHERE (students.name >= 'AAAAA' && students.name <= 'ZZZZZ') || 
(students.name >= 'AAAAA' && students.name <= 'ZZZZZ' && 
students.gender >= 'male' && students.gender <= 'male') || 
(students.name >= 'AAAAA' && students.name <= 'ZZZZZ' && 
students.gender >= 'male' && students.gender <= 'male' && 
students.age >= 26 && students.age <= 200)

If you have any experience with such Boolean clauses in SQL or in general programming, then you already should recognize how the full set of conditions are not necessarily involved. That means you will not get the list of objects you want, and this is why you cannot truly get the same behavior as SQL compound queries.

Dealing with short-circuiting

You cannot easily avoid this short-circuiting behavior in the current implementation. In the worst case you have to load all objects from the store/index into memory and then sort the collection using your own custom sorting function.

There are ways to minimize or avoid some of the short-circuiting issues:

For example, if you are using index.get(array) or index.openCursor(array), then there is no short-circuiting concern. There is either an entire match or not an entire match. In this case, the comparison function is only evaluating whether two values are the same, not whether one is greater than or less than the other.

Other techniques to consider:

Rearrange the elements of the keypath from narrowest to widest. Basically provide early clamps on ranges that cut off some of the unwanted results of short-circuiting.
Store a wrapped object in a store that uses specially customized properties so that it can be sorted using a non-array keypath (a non-compound index), or, can make use of a compound index that is not affected by the short-circuiting behavior.
Use multiple indices. This leads to the exploding index problem. Note this link is about another no-sql database, but the same concepts and explanation applies to indexedDB, and the link is a reasonable (and lengthy and complicated) explanation so I am not repeating it here.
One of the creators of indexedDB (the spec, and the Chrome implementation) recently suggested using cursor.continue: https://gist.github.com/inexorabletash/704e9688f99ac12dd336

Testing with indexedDB.cmp

The cmp function provides a quick and simple way to examine how sorting works. For example:

var a = ['Hello',1];
var b = ['World',2];
alert(indexedDB.cmp(a,b));

One nice property of the indexedDB.cmp function is that its signature is the same as the function parameter to Array.prototype.sort. You can easily test values from the console without dealing with connections/schemas/indices and all that. Furthermore, indexedDB.cmp is synchronous, so your test code does not need to involve async callbacks/promises.

Question 2

Say a table has, name, ID, age, sex, education, etc. ID is the key and the table is also indexed for name, age and sex. I need all male students, older than 25, sorted by their names.

This is easy in mySQL:

    SELECT * FROM table WHERE age > 25 AND sex = "M" ORDER BY name

IndexDB allows creation of an index and orders the query based on that index. But it doesn't allow multiple queries like age and sex. I found a small library called queryIndexedDB (https://github.com/philikon/queryIndexedDB) which allows compound queries but doesn't provide sorted results.

So is there a way to make a sorted compound query, while using IndexedDB?

In IndexedDB, is there a way to make a sorted compound query?

Answer

Creating the index

Opening a cursor on the index

Fundamental index concepts

Strings are compared lexicographically

Values of different types are compared using a specification-defined order

Objects with undefined properties do not appear in indices whose keypath is comprised of one or more of those properties

You must specify each property of an array keypath when creating an IDBKeyRange

Short-circuited array sorting

Dealing with short-circuiting

Testing with indexedDB.cmp

Related questions