How does a Recursive CTE run, line by line?

justin w picture justin w · Jul 6, 2010 · Viewed 15.3k times · Source

I think I've got the format of Recursive CTEs down well enough to write one, but still find myself frustrated to no end that I cannot manually process one (pretend to be the SQL engine myself and reach the result set with pen and paper). I've found this, which is close to what I'm looking for, but not detailed enough. I have no problem tracing through a C++ recursive function and understanding how it runs -- but for SQL I don't understand why or how the engine knows to stop. Does the anchor and recursive block get called everytime, or is the anchor skipped in later iterations? (I doubt it but I'm trying to expression my confusion about the way it seems to jump around.) If the anchor is called each time, how does the anchor not appear multiple times in the final result? I hope someone can just do a break down line 1 line 2, etc. what happens and what is "in memory" as the result set accumulates.

I've taken the liberty of stealing my example from this page, since it seems to be the easiest to understand.

DECLARE @tbl TABLE ( 
      Id INT 
    , [Name] VARCHAR(20) 
    , ParentId INT 
) 

INSERT INTO @tbl( Id, Name, ParentId ) 
VALUES 
     (1, 'Europe', NULL) 
    ,(2, 'Asia',   NULL) 
    ,(3, 'Germany',   1) 
    ,(4, 'UK',        1) 
    ,(5, 'China',     2) 
    ,(6, 'India',     2) 
    ,(7, 'Scotland',  4) 
    ,(8, 'Edinburgh', 7) 
    ,(9, 'Leith',     8)  
; 

WITH abcd 
    AS ( 
        -- anchor 
        SELECT  id, Name, ParentID, 
                CAST(Name AS VARCHAR(1000)) AS Path 
        FROM    @tbl 
        WHERE   ParentId IS NULL 
        UNION ALL 
          --recursive member 
        SELECT  t.id, t.Name, t.ParentID, 
                CAST((a.path + '/' + t.Name) AS VARCHAR(1000)) AS "Path"
        FROM    @tbl AS t 
                JOIN abcd AS a 
                  ON t.ParentId = a.id 
       )
SELECT * FROM abcd 

Answer

Quassnoi picture Quassnoi · Jul 6, 2010

Think of a recursive CTE as of an endless UNION ALL:

WITH    rows AS
        (
        SELECT  *
        FROM    mytable
        WHERE   anchor_condition
        ),
        rows2 AS
        (
        SELECT  *
        FROM    set_operation(mytable, rows)
        ),
        rows3 AS
        (
        SELECT  *
        FROM    set_operation(mytable, rows2)
        ),
        …
SELECT  *
FROM    rows
UNION ALL
SELECT  *
FROM    rows2
UNION ALL
SELECT  *
FROM    rows3
UNION ALL
…

In your case, that would be:

WITH    abcd1 AS
        ( 
        SELECT  *
        FROM    @tbl t
        WHERE   ParentId IS NULL 
        ),
        abcd2 AS
        ( 
        SELECT  t.*
        FROM    abcd1
        JOIN    @tbl t
        ON      t.ParentID = abcd1.id
        ),
        abcd3 AS
        ( 
        SELECT  t.*
        FROM    abcd2
        JOIN    @tbl t
        ON      t.ParentID = abcd2.id
        ),
        abcd4 AS
        ( 
        SELECT  t.*
        FROM    abcd3
        JOIN    @tbl t
        ON      t.ParentID = abcd3.id
        ),
        abcd5 AS
        ( 
        SELECT  t.*
        FROM    abcd4
        JOIN    @tbl t
        ON      t.ParentID = abcd4.id
        ),
        abcd6 AS
        ( 
        SELECT  t.*
        FROM    abcd5
        JOIN    @tbl t
        ON      t.ParentID = abcd5.id
        )
SELECT  *
FROM    abcd1
UNION ALL
SELECT  *
FROM    abcd2
UNION ALL
SELECT  *
FROM    abcd3
UNION ALL
SELECT  *
FROM    abcd4
UNION ALL
SELECT  *
FROM    abcd5
UNION ALL
SELECT  *
FROM    abcd6

Since abcd6 yields no results, this implies a stopping condition.

Theoretically, a recursive CTE can be infinite, but practically, SQL Server tries to forbid the queries that would lead to infinite recordsets.

You may want to read this article: