C++11 introduces a new set of string literal prefixes (and even allows user-defined suffixes). On top of this, you can use Unicode escape sequences directly to write a specific character without having to worry about encoding.
const char16_t* s16 = u"\u00DA";
const char32_t* s32 = U"\u00DA";
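(As an aside on the user-defined suffixes just mentioned, here is a minimal sketch; the _w suffix name is made up purely for illustration:)

#include <cstddef>
#include <string>

// A hypothetical literal suffix that wraps a wide literal in a std::wstring.
std::wstring operator"" _w(const wchar_t* s, std::size_t len) {
    return std::wstring(s, len);
}

int main() {
    std::wstring w = L"\u00DA"_w; // an encoding prefix combined with a UDL suffix
    return w.empty() ? 1 : 0;
}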
But can I use Unicode escape sequences in wchar_t string literals as well? It would seem to be a defect if this weren't possible.
const wchar_t* sw = L"\u00DA";
The integer value of sw[0] would of course depend on what wchar_t is on a particular platform, but in all other respects this should be portable, no?
It would work, but it may not have the desired semantics. \u00DA will expand into as many target code units as necessary for the UTF-8/16/32 encoding, depending on the size of wchar_t, but bear in mind that wide strings do not have any documented, guaranteed encoding semantics; they're simply "the system's encoding", with no attempt made to say what that is, or to require the user to know what that is.
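To see this concretely, here is a minimal sketch (assuming a C++11 compiler; note that in C++20, u8 literals yield const char8_t* rather than const char*) that prints the code units \u00DA produces under each literal type:

#include <cstdio>

int main() {
    const char*     s8  = u8"\u00DA"; // UTF-8: two code units (0xC3 0x9A)
    const char16_t* s16 = u"\u00DA";  // UTF-16: one code unit (0x00DA)
    const char32_t* s32 = U"\u00DA";  // UTF-32: one code unit (0x000000DA)
    const wchar_t*  sw  = L"\u00DA";  // whatever the wide execution encoding dictates

    for (const char* p = s8; *p; ++p)
        std::printf("u8: 0x%02X\n", static_cast<unsigned>(static_cast<unsigned char>(*p)));
    std::printf("u : 0x%04X\n", static_cast<unsigned>(s16[0]));
    std::printf("U : 0x%08X\n", static_cast<unsigned>(s32[0]));
    std::printf("L : 0x%lX (value depends on the platform's wide encoding)\n",
                static_cast<unsigned long>(sw[0]));
}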
So it's best not to mix and match. Use either one of the two, but not both:

- system-specific: char* / "", wchar_t* / L"", \x escapes, mbstowcs/wcstombs (see the sketch after this list)
- Unicode: char* / u8"", char16_t* / u"", char32_t* / U"", \u and \U escapes
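For the system-specific route, here is a minimal sketch of round-tripping between the narrow and wide system encodings with mbstowcs/wcstombs (the "\xDA" byte and the buffer sizes are illustrative; whether the conversion succeeds depends entirely on the active locale):

#include <clocale>
#include <cstdio>
#include <cstdlib>

int main() {
    std::setlocale(LC_ALL, "");  // adopt the system's encoding

    const char* narrow = "\xDA"; // the meaning of this byte depends on the locale
    wchar_t wide[8];
    std::size_t n = std::mbstowcs(wide, narrow, 8);
    if (n == static_cast<std::size_t>(-1)) {
        std::puts("invalid multibyte sequence in this locale");
        return 1;
    }
    std::printf("narrow -> %zu wide character(s)\n", n);

    char back[8];
    std::size_t m = std::wcstombs(back, wide, 8);
    if (m != static_cast<std::size_t>(-1))
        std::printf("wide -> %zu byte(s)\n", m);
}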