Is it possible to have SQL Server convert collation to UTF-8 / UTF-16?

Rookie · May 16, 2015 · Viewed 52.2k times

In a project I am working on, my data is stored in SQL Server with the collation Danish_Norwegian_CI_AS. The data is output through FreeTDS and ODBC to Python, which handles the data as UTF-8. Some of the characters, like å, ø and æ, are not being encoded correctly, causing progress on the project to grind to a halt.

I spent a couple of hours reading about the confusing world of encodings, collations and code pages, and feel like I have gotten a better understanding of the entire picture.

Some of the articles I have read make me think it would be possible to specify, in the SQL SELECT statement, that the data should be converted from its collation's encoding to UTF-8 when it is output.

The reason I think this is possible is this article, which shows an example of how to get two tables with different collations to play nice together.

Any pointers in the direction of converting collation to UTF-8 / UTF-16 would be greatly appreciated!

EDIT: I have read that SQL Server provides a Unicode option through nchar, nvarchar and ntext, and that the other string types (char, varchar and text) are encoded according to the collation that is set. I have also read that the Unicode types mentioned above are encoded in UCS-2, a variant of UTF-16 (I hope I am remembering that right). So, in order for tables with a locale collation and tables with Unicode data to play nice together, there should be a conversion function, no?
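To make the difference between these encodings concrete, here is a minimal Python sketch. It assumes that Danish_Norwegian_CI_AS stores char/varchar data in Windows code page 1252, and it uses UTF-16LE as a stand-in for how nchar/nvarchar data is laid out. The `transcode` helper is hypothetical, written only for illustration, and is not part of SQL Server, FreeTDS or pyodbc.

```python
# The three characters from the question: å, ø, æ (written as escapes so the
# example does not depend on the encoding of this source file).
text = u"\u00e5\u00f8\u00e6"

# Assumption: the collation's code page is Windows-1252, one byte per character.
cp1252_bytes = text.encode("cp1252")      # b'\xe5\xf8\xe6'

# The same text in UTF-8 takes two bytes per character.
utf8_bytes = text.encode("utf-8")         # b'\xc3\xa5\xc3\xb8\xc3\xa6'

# UTF-16LE, used here as a stand-in for the Unicode column types.
utf16_bytes = text.encode("utf-16-le")    # b'\xe5\x00\xf8\x00\xe6\x00'


def transcode(raw, source="cp1252", target="utf-8"):
    """Hypothetical helper: decode bytes from one encoding, re-encode in another."""
    return raw.decode(source).encode(target)


# Reading code-page bytes as if they were UTF-8 fails outright...
try:
    cp1252_bytes.decode("utf-8")
    misread_failed = False
except UnicodeDecodeError:
    misread_failed = True

# ...while reading UTF-8 bytes as code page 1252 silently produces mojibake.
mojibake = utf8_bytes.decode("cp1252")

print(repr(cp1252_bytes), repr(utf8_bytes), repr(utf16_bytes))
print(misread_failed, transcode(cp1252_bytes) == utf8_bytes)
```

The point of the sketch is that the same three characters have three different byte representations, so garbled output usually means that some layer between the database and the application decoded the bytes with the wrong encoding.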

Answer

Rookie · Jul 31, 2015

4 months on, I finally found the answer to my problem. It turns out it had nothing to do with the FreeTDS driver or the database collation.

The fix was in pyodbc's connect function, which apparently requires a flag: unicode_results=True
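A minimal sketch of what that call might look like is shown below. The DSN name and credentials are placeholders, and both helper functions are hypothetical, written only for this example. The unicode_results keyword is the one named in this answer, which dates from 2015; it may not be accepted by the pyodbc version you have installed, so check its documentation before relying on it.

```python
def build_connection_string(dsn, user, password):
    """Assemble an ODBC connection string for a DSN (all values are placeholders)."""
    return "DSN={0};UID={1};PWD={2}".format(dsn, user, password)


def connect_with_unicode_results(dsn, user, password):
    """Open a connection and ask pyodbc to return text columns as unicode strings."""
    # Imported lazily so the helper above can be used without pyodbc installed.
    import pyodbc

    # unicode_results=True is the flag from the answer. Assumption: the
    # installed pyodbc version still accepts this keyword.
    return pyodbc.connect(
        build_connection_string(dsn, user, password),
        unicode_results=True,
    )


# Example usage (requires a configured DSN, here the placeholder "mssql_dsn"):
# conn = connect_with_unicode_results("mssql_dsn", "user", "secret")
# row = conn.cursor().execute("SELECT name FROM some_table").fetchone()
```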

Posted here to help other unfortunate souls doomed to wander aimlessly in the dark, looking for a clue.