I am wondering if there is any disadvantage on defining a column of type nvarchar(max) instead of giving it a (smaller) maximum size.
I read somewhere that if the column value has more than 4?KB the remaining data will be added to an "overflow" area, which is ok.
I'm creating a table where most of the time the text will be of a few lines, but I was wondering if there's any advantage in setting a lower limit and then adding a validation to avoid breaking that limit.
Is there any restriction on the creation of indexes with nvarchar(max) column, or anything that pays for having to add the restriction on the size limit?
Thanks!
Strictly speaking the MAX
types will always be a bit slower than the non-MAX types, see Performance comparison of varchar(max) vs. varchar(N). But this difference is never visible in practice, where it just becomes noise in the overall performance driven by IO.
Your main concern should not be performance of MAX vs. non-MAX. You should be concerned with the question it will be possible that this column will have to store more than 8000 bytes? If the answer is yes, even by if is a very very unlikely yes, then the answer is obvious: use a MAX type, the pain to convert this column later to a MAX type is not worth the minor performance benefit of non-MAX types.
Other concerns (possibility to index that column, unavailability of ONLINE index operations for tables with MAX columns) were already addressed by Denis' answer.
BTW, the information about the columns over 4KB having remaining data in an overflow area is wrong. The correct information is in Table and Index Organization:
ROW_OVERFLOW_DATA Allocation Unit
For every partition used by a table (heap or clustered table), index, or indexed view, there is one ROW_OVERFLOW_DATA allocation unit. This allocation unit contains zero (0) pages until a data row with variable length columns (varchar, nvarchar, varbinary, or sql_variant) in the IN_ROW_DATA allocation unit exceeds the 8 KB row size limit. When the size limitation is reached, SQL Server moves the column with the largest width from that row to a page in the ROW_OVERFLOW_DATA allocation unit. A 24-byte pointer to this off-row data is maintained on the original page.
So is not columns over 4KB, is rows that don't fit in the free space on the page, and is not the 'remaining', is the entire column.