Which are more performant, CTE
or Temporary Tables
?
It depends.
First of all
What is a Common Table Expression?
A (non recursive) CTE is treated very similarly to other constructs that can also be used as inline table expressions in SQL Server. Derived tables, Views, and inline table valued functions. Note that whilst BOL says that a CTE "can be thought of as temporary result set" this is a purely logical description. More often than not it is not materlialized in its own right.
What is a temporary table?
This is a collection of rows stored on data pages in tempdb. The data pages may reside partially or entirely in memory. Additionally the temporary table may be indexed and have column statistics.
Test Data
CREATE TABLE T(A INT IDENTITY PRIMARY KEY, B INT , F CHAR(8000) NULL);
INSERT INTO T(B)
SELECT TOP (1000000) 0 + CAST(NEWID() AS BINARY(4))
FROM master..spt_values v1,
master..spt_values v2;
Example 1
WITH CTE1 AS
(
SELECT A,
ABS(B) AS Abs_B,
F
FROM T
)
SELECT *
FROM CTE1
WHERE A = 780
Notice in the plan above there is no mention of CTE1. It just accesses the base tables directly and is treated the same as
SELECT A,
ABS(B) AS Abs_B,
F
FROM T
WHERE A = 780
Rewriting by materializing the CTE into an intermediate temporary table here would be massively counter productive.
Materializing the CTE definition of
SELECT A,
ABS(B) AS Abs_B,
F
FROM T
Would involve copying about 8GB of data into a temporary table then there is still the overhead of selecting from it too.
Example 2
WITH CTE2
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY A) AS RN
FROM T
WHERE B % 100000 = 0)
SELECT *
FROM CTE2 T1
CROSS APPLY (SELECT TOP (1) *
FROM CTE2 T2
WHERE T2.A > T1.A
ORDER BY T2.A) CA
The above example takes about 4 minutes on my machine.
Only 15 rows of the 1,000,000 randomly generated values match the predicate but the expensive table scan happens 16 times to locate these.
This would be a good candidate for materializing the intermediate result. The equivalent temp table rewrite took 25 seconds.
INSERT INTO #T
SELECT *,
ROW_NUMBER() OVER (ORDER BY A) AS RN
FROM T
WHERE B % 100000 = 0
SELECT *
FROM #T T1
CROSS APPLY (SELECT TOP (1) *
FROM #T T2
WHERE T2.A > T1.A
ORDER BY T2.A) CA
Intermediate materialisation of part of a query into a temporary table can sometimes be useful even if it is only evaluated once - when it allows the rest of the query to be recompiled taking advantage of statistics on the materialized result. An example of this approach is in the SQL Cat article When To Break Down Complex Queries.
In some circumstances SQL Server will use a spool to cache an intermediate result, e.g. of a CTE, and avoid having to re-evaluate that sub tree. This is discussed in the (migrated) Connect item Provide a hint to force intermediate materialization of CTEs or derived tables. However no statistics are created on this and even if the number of spooled rows was to be hugely different from estimated is not possible for the in progress execution plan to dynamically adapt in response (at least in current versions. Adaptive Query Plans may become possible in the future).