SQL Server heap v.s. clustered index

George2 picture George2 · Aug 27, 2009 · Viewed 27.5k times · Source

I am using SQL Server 2008. I know if a table has no clustered index, then it is called heap, or else the storage model is called clustered index (B-Tree).

I want to learn more about what exactly means heap storage, what it looks like and whether it is organized as "heap" data structure (e.g. minimal heap, maximum heap). Any recommended readings? I want to more a bit more internals, but not too deep. :-)

thanks in advance, George

Answer

Quassnoi picture Quassnoi · Aug 27, 2009

Heap storage has nothing to do with these heaps.

Heap just means records themselves are not ordered (i. e. not linked to one another).

When you insert a record, it just gets inserted into the free space the database finds.

Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

If you create a secondary index on a HEAP table, the RID (a kind of a physical pointer to the storage space) is used as a row pointer.

Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called uniquifier is appended to the index key that makes if unique (and larger in size).

It is also worth noting that creating a secondary index on a column makes the values or the clustered index's key to be the part of the secondayry index's key.

By creating an index on a clustered table, you in fact always get a composite index

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 (col1, col2, col3, col4)

CREATE INDEX IX_mytable_5678 (col5, col6, col7, col8)

Index IX_mytable_5678 is in fact an index on the following columns:

col5
col6
col7
col8
col1
col2
col3
col4

This has one more side effect:

A DESC condition in a single-column index on a clustered table makes sense in SQL Server

This index:

CREATE INDEX IX_mytable ON mytable (col1)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id

, while this one:

CREATE INDEX IX_mytable ON mytable (col1 DESC)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id DESC