Does it make any sense to store UTF-16 encoded text using wchar_t* on Linux? The obvious problem is that wchar_t is four bytes on Linux, while UTF-16 usually takes two bytes per character (or sometimes two groups of two bytes).
I'm trying to use a third-party library that does exactly that, and it seems very confusing. It looks like things are messed up because wchar_t is two bytes on Windows, but I just want to double-check, since it's a pretty expensive commercial library and maybe I just don't understand something.
While it's possible to store UTF-16 in wchar_t, such wchar_t values (or arrays of them used as strings) are not suitable for use with any of the standard functions that take wchar_t or pointers to wchar_t strings. Those functions expect each element to be a character in the implementation's own wide encoding (on Linux with glibc, a full UTF-32 code point), not a UTF-16 code unit or half of a surrogate pair. So, to answer your initial question of "Does it make sense...?", I would reply with a definitive no. You could of course use uint16_t for this purpose, or the C11 char16_t if it's available, though I fail to see any reason why the latter would be preferable unless you're also going to use the C11 functions for processing it (and they don't seem to be implemented yet).
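To make the uint16_t suggestion concrete, here is a minimal sketch of keeping UTF-16 in a uint16_t buffer and converting it to a real wchar_t string only when a standard wide-character function is needed. The helper names utf16_decode and utf16_to_wcs are hypothetical (they are not part of any standard or of the library in question), and the conversion assumes a platform where wchar_t is at least 32 bits wide, as on Linux with glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode one code point from a UTF-16 buffer.
   Returns the number of 16-bit units consumed (1 or 2), or 0 if the
   buffer is empty or starts with an unpaired surrogate. */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    uint16_t hi = s[0];
    if (hi < 0xD800 || hi > 0xDFFF) {
        /* Basic Multilingual Plane: one unit is one code point. */
        *out = hi;
        return 1;
    }
    if (hi <= 0xDBFF && len >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        /* High surrogate followed by low surrogate. */
        *out = 0x10000u + (((uint32_t)(hi - 0xD800) << 10)
                           | (uint32_t)(s[1] - 0xDC00));
        return 2;
    }
    return 0; /* unpaired surrogate */
}

/* Hypothetical helper: convert UTF-16 to a null-terminated wchar_t
   string, one code point per element. Returns the number of wide
   characters written (excluding the terminator), or (size_t)-1 on
   invalid input, insufficient space, or a wchar_t that is too narrow
   to hold a full code point (for example 16-bit wchar_t on Windows). */
static size_t utf16_to_wcs(const uint16_t *src, size_t src_len,
                           wchar_t *dst, size_t dst_cap)
{
    if (sizeof(wchar_t) < 4 || dst_cap == 0)
        return (size_t)-1;

    size_t written = 0;
    size_t i = 0;
    while (i < src_len) {
        uint32_t cp;
        size_t used = utf16_decode(src + i, src_len - i, &cp);
        /* Keep one slot free for the terminator. */
        if (used == 0 || written + 1 >= dst_cap)
            return (size_t)-1;
        dst[written++] = (wchar_t)cp;
        i += used;
    }
    dst[written] = L'\0';
    return written;
}
```

For the three-unit input { 0x0041, 0xD83D, 0xDE00 }, the result is a two-element wide string: the surrogate pair collapses into the single code point U+1F600. That difference in length is exactly why a UTF-16 buffer that is merely typed as wchar_t* cannot be handed to functions such as wcslen on Linux and be expected to give meaningful results.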