I have a database that is going to keep log entries.
One of the columns in the log table contains serialized (to XML) objects and a guy on my team proposed to go with XML data type rather than NVARCHAR(MAX). This table will have logs kept "forever" (archiving some very old entries may be considered in the future).
I'm a little worried about the CPU overhead, but I'm even more worried that DB can grow faster (FoxyBOA from the referenced question got 70% bigger DB when using XML).
I have read this question and it gave me some ideas but I am particularly interested in clarification on whether the database size increases or decreases.
Can you please share your insight/experiences in that matter.
BTW. I don't currently have any need to depend on XML features within SQL Server (there's nearly zero advantage to me in the specific case). Ocasionally log entries will be extracted, but I prefer to handle the XML using .NET (either by writing a small client or using a function defined in a .NET assembly).
If you do have XML, and you can be sure it's always XML, I would definitely recommend going that way. SQL Server stores XML in an optimized format - you don't even need any XML indices on that to benefit from it.
If you insert 5000 rows of a 5KB XML into an XML column, you get roughly 1250 pages = 9 MB. Inserting the same 5000 rows with the same 5KB XML into NVARCHAR(MAX) uses over 3700 pages or 29 MB - quite a difference!
And that difference should be even more pronounced if you can associate your XML with a XML schema stored in SQL Server. Plus you're also guaranteed that the XML stored conforms to a schema - can be very helpful at times! Can't do that with a plain NVARCHAR(MAX) column...
And I don't agree that using XML over NVARCHAR(MAX) has any performance penalty - quite the contrary. Since you're potentially retrieving less data from SQL Server when you're about to display or fetch the content, I would argue it's even a tad faster than NVARCHAR(MAX).