I have read that joins are better than subqueries.
But
EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
FROM Queue
INNER JOIN LastQueue
ON Queue.Id=LastQueue.Id
gives
Array
(
[0] => Array
(
[selectid] => 0
[order] => 0
[from] => 0
[detail] => SCAN TABLE Queue (~1000000 rows)
)
[1] => Array
(
[selectid] => 0
[order] => 1
[from] => 1
[detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
)
)
while
EXPLAIN QUERY PLAN
SELECT Queue.Id, NULL
FROM Queue
WHERE (SELECT 1 FROM LastQueue WHERE Queue.Id=LastQueue.Id) IS NOT NULL
gives
Array
(
[0] => Array
(
[selectid] => 0
[order] => 0
[from] => 0
[detail] => SCAN TABLE Queue (~500000 rows)
)
[1] => Array
(
[selectid] => 0
[order] => 0
[from] => 0
[detail] => EXECUTE CORRELATED SCALAR SUBQUERY 1
)
[2] => Array
(
[selectid] => 1
[order] => 0
[from] => 0
[detail] => SEARCH TABLE LastQueue USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
)
)
I think I have to multiply the number of rows to get an idea of cost. I am right?
Then,
Then, is the subquery is faster than join?
Is there a good tutorial to learn how to understand EXPLAIN
/EXPLAIN QUERY PLAN
results?
And why does it say that SCAN TABLE Queue is ~1000000 and ~500000, when that table has 76 rows?
The line EXECUTE CORRELATED SCALAR SUBQUERY 1
is just there because you're using a different query syntax.
The actual execution is the same in both cases: SQLite goes through all records of the Queue
table, and tries to look up the corresponding record in the LastQueue
table.
That the estimated record counts are different is of no concern to you, because you know that the actual number of records is the same.
Joins might be better than subqueries in other databases, but in SQLite, which uses only nested loop joins, the only difference is that a join allows SQLite to choose the outer and the inner table in the join.
In any case, you should write the query in the most simple and maintainable way, and optimize it only if you have measured that you get a noticeable and necessary improvement.
Please note that instead of:
WHERE (SELECT ...) IS NOT NULL
it would be more idiomatic to write:
WHERE EXISTS (SELECT ...)