Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

c++ unicode utf-8 utf-16 wstring

Dave · Mar 27, 2010 · Viewed 18.2k times · Source

I'm working on a english only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that.

I already read the question titled "std::wstring VS std::string. It was very helpful, but I still don't quite understand how to apply all of that information to my problem.

The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL:FO for reporting purposes.

My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings.

I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process?

I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.

Answer

Windows from NT4 onwards is based on Unicode encoded strings, yes. Early versions were based on UCS-2, which is the predecessor of UTF-16, and thus does not support all of the characters that UTF-16 does. Later versions are based on UTF-16. Not all OSes are based on UTF-16/UCS-2, though. *nix systems, for instance, are based on UTF-8 instead.

UTF-8 is a very good choice for storing data persistently. It is a universally supported encoding in all Unicode environments, and it is a good balance between data size and loss-less data compatibility.

Yes, you would have to parse the XML, extract the necessary information from it, and decode and transform it into something the UI can use.

Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

Answer

Related questions