Which are more performant, CTE or temporary tables?

Blankman picture Blankman · Mar 27, 2009 · Viewed 166.7k times · Source

Which are more performant, CTE or Temporary Tables?

Answer

Martin Smith picture Martin Smith · Oct 5, 2014

It depends.

First of all

What is a Common Table Expression?

A (non recursive) CTE is treated very similarly to other constructs that can also be used as inline table expressions in SQL Server. Derived tables, Views, and inline table valued functions. Note that whilst BOL says that a CTE "can be thought of as temporary result set" this is a purely logical description. More often than not it is not materlialized in its own right.

What is a temporary table?

This is a collection of rows stored on data pages in tempdb. The data pages may reside partially or entirely in memory. Additionally the temporary table may be indexed and have column statistics.

Test Data

CREATE TABLE T(A INT IDENTITY PRIMARY KEY, B INT , F CHAR(8000) NULL);

INSERT INTO T(B)
SELECT TOP (1000000)  0 + CAST(NEWID() AS BINARY(4))
FROM master..spt_values v1,
     master..spt_values v2;

Example 1

WITH CTE1 AS
(
SELECT A,
       ABS(B) AS Abs_B,
       F
FROM T
)
SELECT *
FROM CTE1
WHERE A = 780

Plan 1

Notice in the plan above there is no mention of CTE1. It just accesses the base tables directly and is treated the same as

SELECT A,
       ABS(B) AS Abs_B,
       F
FROM   T
WHERE  A = 780 

Rewriting by materializing the CTE into an intermediate temporary table here would be massively counter productive.

Materializing the CTE definition of

SELECT A,
       ABS(B) AS Abs_B,
       F
FROM T

Would involve copying about 8GB of data into a temporary table then there is still the overhead of selecting from it too.

Example 2

WITH CTE2
     AS (SELECT *,
                ROW_NUMBER() OVER (ORDER BY A) AS RN
         FROM   T
         WHERE  B % 100000 = 0)
SELECT *
FROM   CTE2 T1
       CROSS APPLY (SELECT TOP (1) *
                    FROM   CTE2 T2
                    WHERE  T2.A > T1.A
                    ORDER  BY T2.A) CA 

The above example takes about 4 minutes on my machine.

Only 15 rows of the 1,000,000 randomly generated values match the predicate but the expensive table scan happens 16 times to locate these.

enter image description here

This would be a good candidate for materializing the intermediate result. The equivalent temp table rewrite took 25 seconds.

INSERT INTO #T
SELECT *,
       ROW_NUMBER() OVER (ORDER BY A) AS RN
FROM   T
WHERE  B % 100000 = 0

SELECT *
FROM   #T T1
       CROSS APPLY (SELECT TOP (1) *
                    FROM   #T T2
                    WHERE  T2.A > T1.A
                    ORDER  BY T2.A) CA 

With Plan

Intermediate materialisation of part of a query into a temporary table can sometimes be useful even if it is only evaluated once - when it allows the rest of the query to be recompiled taking advantage of statistics on the materialized result. An example of this approach is in the SQL Cat article When To Break Down Complex Queries.

In some circumstances SQL Server will use a spool to cache an intermediate result, e.g. of a CTE, and avoid having to re-evaluate that sub tree. This is discussed in the (migrated) Connect item Provide a hint to force intermediate materialization of CTEs or derived tables. However no statistics are created on this and even if the number of spooled rows was to be hugely different from estimated is not possible for the in progress execution plan to dynamically adapt in response (at least in current versions. Adaptive Query Plans may become possible in the future).