Why is there a HUGE performance difference between temp table and subselect

Ward picture Ward · May 27, 2013 · Viewed 66.9k times · Source

This is a question about SQL Server 2008 R2

I'm not a DBA, by far. I'm a java developer, who has to write SQL from time to time. (mostly embedded in code). I want to know if I did something wrong here, and if so, what I can do to avoid it to happen again.

Q1:

SELECT something FROM (SELECT * FROM T1 WHERE condition1) JOIN ...

Q1 features 14 joins

Q2 is the same as Q1, with one exception. (SELECT * FROM T1 WHERE condition1) is executed before, and stored in a temp table.

This is not a correlated sub-query.

Q2:

SELECT * INTO #tempTable FROM T1 WHERE condition1
SELECT something FROM #tempTable  JOIN ...

again, 14 joins.

The thing that puzzles me now is that Q1 took > 2min, (tried it a few times, to avoid caching to play a role) while Q2 (both queries combined) took 2sec!!! What gives?

Answer

Karthik AMR picture Karthik AMR · May 27, 2013

Why it's not recommended to use subqueries?

Database Optimizer (regardless of what database you are using) can not always properly optimize such query (with subqueries). In this case, the problem to the optimizer is to choose the right way to join result sets. There are several algorithms for joining two result sets. The choice of the algorithm depends on the number of records which are contained in one and in the other result set. In case if you join two physical tables (subquery is not a physical table), the database can easily determine the amount of data in two result sets by the available statistics. If one of result sets is a subquery then to understand how many records it returns is very difficult. In this case the database can choose wrong query plan of join, so that will lead to a dramatic reduction in the performance of the query.

Rewriting the query with using temporary tables is intended to simplify the database optimizer. In the rewritten query all result sets participating in joins will be physical tables and the database will easily determine the length of each result set. This will allow the database to choose the guaranteed fastest of all possible query plans. Moreover, the database will make the right choice no matter what are the conditions. The rewritten query with temporary tables would work well on any database, this is especially important in the development of portable solutions. In addition, the rewritten query is easier to read, easier to understand and to debug.

It is understood that rewriting the query with temporary tables can lead to some slowdown due to additional expenses: creation of temporary tables. If the database will not be mistaken with the choice of the query plan, it will perform the old query faster than a new one. However, this slowdown will always be negligible. Typically the creation of a temporary table takes a few milliseconds. That is, the delay can not have a significant impact on system performance, and usually can be ignored.

Important! Do not forget to create indexes for temporary tables. The index fields should include all fields that are used in join conditions.