Storing UTF-16/Unicode data in SQL Server

sql-server unicode utf-16 ucs2

David Cameron · Apr 30, 2009 · Viewed 14.9k times · Source

According to this, SQL Server 2K5 uses UCS-2 internally. It can store UTF-16 data in UCS-2 (with appropriate data types, nchar etc), however if there is a supplementary character this is stored as 2 UCS-2 characters.

This brings the obvious issues with the string functions, namely that what is one character is treated as 2 by SQL Server.

I am somewhat surprised that SQL Server is basically only able to handle UCS-2, and even more so that this is not fixed in SQL 2K8. I do appreciate that some of these characters may not be all that common.

Aside from the functions suggested in the article, any suggestions on best approach for dealing with the (broken) string functions and UTF-16 data in SQL Server 2K5.

Answer

SQL Server 2012 now supports UTF-16 including surrogate pairs. See http://msdn.microsoft.com/en-us/library/ms143726(v=sql.110).aspx, especially the section "Supplementary characters".

So one fix for the original problem is to adopt SQL Server 2012.

Storing UTF-16/Unicode data in SQL Server

Answer

Related questions