How to understand SQLite `EXPLAIN QUERY PLAN` result?

Oriol picture Oriol · Jul 15, 2013 · Viewed 9.1k times · Source

I have read that joins are better than subqueries.

But

EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
    FROM Queue
    INNER JOIN LastQueue
    ON Queue.Id=LastQueue.Id

gives

Array
(
    [0] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => SCAN TABLE Queue (~1000000 rows)
        )

    [1] => Array
        (
            [selectid] => 0
            [order] => 1
            [from] => 1
            [detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        )

)

while

EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
    FROM Queue
    WHERE (SELECT 1 FROM LastQueue WHERE Queue.Id=LastQueue.Id) IS NOT NULL

gives

Array
(
    [0] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => SCAN TABLE Queue (~500000 rows)
        )

    [1] => Array
        (
            [selectid] => 0
            [order] => 0
            [from] => 0
            [detail] => EXECUTE CORRELATED SCALAR SUBQUERY 1
        )

    [2] => Array
        (
            [selectid] => 1
            [order] => 0
            [from] => 0
            [detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
        )

)

I think I have to multiply the number of rows to get an idea of cost. I am right?

Then,

  • Using join: 1000000*1
  • Using subquery: 500000*1*1

Then, is the subquery is faster than join?

Is there a good tutorial to learn how to understand EXPLAIN/EXPLAIN QUERY PLAN results?

And why does it say that SCAN TABLE Queue is ~1000000 and ~500000, when that table has 76 rows?

Answer

CL. picture CL. · Jul 16, 2013

The line EXECUTE CORRELATED SCALAR SUBQUERY 1 is just there because you're using a different query syntax.

The actual execution is the same in both cases: SQLite goes through all records of the Queue table, and tries to look up the corresponding record in the LastQueue table.

That the estimated record counts are different is of no concern to you, because you know that the actual number of records is the same.

Joins might be better than subqueries in other databases, but in SQLite, which uses only nested loop joins, the only difference is that a join allows SQLite to choose the outer and the inner table in the join.

In any case, you should write the query in the most simple and maintainable way, and optimize it only if you have measured that you get a noticeable and necessary improvement.


Please note that instead of:

WHERE (SELECT ...) IS NOT NULL

it would be more idiomatic to write:

WHERE EXISTS (SELECT ...)