Should I use an inline varchar(max) column or store it in a separate table?

Peter Bridger picture Peter Bridger · Nov 9, 2009 · Viewed 7.4k times · Source

I want to create a table in MS SQL Server 2005 to record details of certain system operations. As you can see from the table design below, every column apart from Details is is non nullable.

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[Details] [varchar](max) NULL
)

Because the Details column won't always have data in it. Is it more efficient to store this column in a separate table and provide a link to it instead?

CREATE TABLE [Log]
(
[LogID] [int] IDENTITY(1,1) NOT NULL,
[ActionID] [int] NOT NULL,
[SystemID] [int] NOT NULL,
[UserID] [int] NOT NULL,
[LoggedOn] [datetime] NOT NULL,
[DetailID] [int] NULL
)       

CREATE TABLE [Detail]
(
[DetailID] [int] IDENTITY(1,1) NOT NULL,
[Details] [varchar](max) NOT NULL
)

For a smaller data type I wouldn't really consider it, but for a varchar(max) does doing this help keep the table size smaller? Or I am just trying to out smart the database and achieving nothing?

Answer

Remus Rusanu picture Remus Rusanu · Nov 9, 2009

Keep it inline. Under the covers SQL Server already stores the MAX columns in a separate 'allocation unit' since SQL 2005. See Table and Index Organization. This in effect is exactly the same as keeping the MAX column in its own table, but w/o any disadvantage of explicitly doing so.

Having an explicit table would actually be both slower (because of the foreign key constraint) and consume more space (because of the DetaiID duplication). Not to mention that it requires more code, and bugs are introduced by... writing code.

alt text http://i.msdn.microsoft.com/ms189051.3be61595-d405-4b30-9794-755842d7db7e(en-us,SQL.100).gif

Update

To check the actual location of data, a simple test can show it:

use tempdb;
go

create table a (
  id int identity(1,1) not null primary key,
  v_a varchar(8000),
  nv_a nvarchar(4000),
  m_a varchar(max),
  nm_a nvarchar(max),
  t text,
  nt ntext);
go

insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('v_a', N'nv_a', 'm_a', N'nm_a', 't', N'nt');
go

select %%physloc%%,* from a
go

The %%physloc%% pseudo column will show the actual physical location of the row, in my case it was page 200:

dbcc traceon(3604)
dbcc page(2,1, 200, 3)

Slot 0 Column 2 Offset 0x19 Length 3 Length (physical) 3
v_a = v_a                            
Slot 0 Column 3 Offset 0x1c Length 8 Length (physical) 8
nv_a = nv_a                          
m_a = [BLOB Inline Data] Slot 0 Column 4 Offset 0x24 Length 3 Length (physical) 3
m_a = 0x6d5f61                       
nm_a = [BLOB Inline Data] Slot 0 Column 5 Offset 0x27 Length 8 Length (physical) 8
nm_a = 0x6e006d005f006100            
t = [Textpointer] Slot 0 Column 6 Offset 0x2f Length 16 Length (physical) 16
TextTimeStamp = 131137536            RowId = (1:182:0)                    
nt = [Textpointer] Slot 0 Column 7 Offset 0x3f Length 16 Length (physical) 16
TextTimeStamp = 131203072            RowId = (1:182:1)   

All column values but the TEXT and NTEXT were stored inline, including the MAX types.
After changing the table options and insert a new row (sp_tableoption does not affect existing rows), the MAX types were evicted into their own storage:

sp_tableoption 'a' , 'large value types out of row', '1';
insert into a (v_a, nv_a, m_a, nm_a, t, nt)
values ('2v_a', N'2nv_a', '2m_a', N'2nm_a', '2t', N'2nt');    
dbcc page(2,1, 200, 3);

Note how m_a and nm_a columns are now a Textpointer into the LOB allocation unit:

Slot 1 Column 2 Offset 0x19 Length 4 Length (physical) 4
v_a = 2v_a                           
Slot 1 Column 3 Offset 0x1d Length 10 Length (physical) 10
nv_a = 2nv_a                         
m_a = [Textpointer] Slot 1 Column 4 Offset 0x27 Length 16 Length (physical) 16
TextTimeStamp = 131268608            RowId = (1:182:2)                    
nm_a = [Textpointer] Slot 1 Column 5 Offset 0x37 Length 16 Length (physical) 16
TextTimeStamp = 131334144            RowId = (1:182:3)                    
t = [Textpointer] Slot 1 Column 6 Offset 0x47 Length 16 Length (physical) 16
TextTimeStamp = 131399680            RowId = (1:182:4)                    
nt = [Textpointer] Slot 1 Column 7 Offset 0x57 Length 16 Length (physical) 16
TextTimeStamp = 131465216            RowId = (1:182:5)                    

For completion sakeness we can also force the one of the non-max fields out of row:

update a set v_a = replicate('X', 8000);
dbcc page(2,1, 200, 3);

Note how the v_a column is stored in the Row-Overflow storage:

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
v_a = [BLOB Inline Root] Slot 0 Column 2 Offset 0x19 Length 24 Length (physical) 24
Level = 0                            Unused = 99                          UpdateSeq = 1
TimeStamp = 1098383360               
Link 0
Size = 8000                          RowId = (1:176:0) 

So, as other have already commented, the MAX types are stored inline by default, if they fit. For many DW projects this would be unnacceptable because the typical DW loads must scan or at least range scan, so the sp_tableoption ..., 'large value types out of row', '1' should be used. Note that this does not affect existing rows, in my test not even on index rebuild, so the option has to be turned on early.

For most OLTP type loads though the fact that MAX types are stored inline if possible is actually an advantage, since the OLTP access pattern is to seek and the row width makes little impact on it.

None the less, regarding the original question: separate table is not necessary. Turning on the large value types out of row option achieves the same result at a free cost for development/test.