wchar_t and encoding

Hunter · May 3, 2012 · Viewed 7.1k times

If I want to convert a string, say char * xmlbuffer, to UTF-16, do I have to convert it to wchar_t * before encoding? And is the char * type required before encoding to UTF-8?

How are wchar_t and char related to UTF-8, UTF-16, UTF-32, or other transformation formats?

Thanks in advance for help!

Answer

Jon · May 4, 2012

No, you don't have to change data types.

About wchar_t: the standard says that

Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales.

Unfortunately, it does not say what encoding wchar_t is supposed to have; this is implementation-dependent. So for example given

auto s = L"foo";

you can make no portable assumption about what the numeric value of the expression *s is.
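As a rough illustration of how implementation-dependent this is, the sketch below exposes the width of wchar_t and the raw value of the first code unit of L"foo". The helper names wchar_bits and first_unit are made up for this example and are not part of any library.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: width of wchar_t in bits on this implementation.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Hypothetical helper: numeric value of the first code unit of L"foo".
// The standard does not fix this value; it depends on the implementation's
// wide execution character set.
constexpr unsigned long first_unit() {
    return static_cast<unsigned long>(L"foo"[0]);
}
```

Printing wchar_bits() typically gives 16 on Windows and 32 on most Linux and macOS toolchains, so code that assumes a particular width or encoding for wchar_t is not portable.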

However, you can use an std::string as an opaque sequence of bytes that represents text in any transformation format of your choice without issue. Just don't rely on standard library string operations that treat each char as a complete character; size(), for example, counts bytes, not characters.
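A minimal sketch of the "opaque bytes" idea, assuming the string holds valid UTF-8. Both count_code_points and sample_utf8 are hypothetical helpers written for this example, not standard library functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to hold
// valid UTF-8 by skipping continuation bytes (bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// "héllo" spelled out as UTF-8 bytes: 'é' (U+00E9) is 0xC3 0xA9.
std::string sample_utf8() {
    return std::string("h\xC3\xA9llo");
}
```

Here sample_utf8().size() is 6 because it counts bytes, while count_code_points(sample_utf8()) is 5. The std::string stores the text without trouble; it is only the character-oriented interpretation of its operations that stops being meaningful.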