C++ Convert string (or char*) to wstring (or wchar_t*)

Samir picture Samir · Apr 4, 2010 · Viewed 286.9k times · Source
string s = "おはよう";
wstring ws = FUNCTION(s, ws);

How would i assign the contents of s to ws?

Searched google and used some techniques but they can't assign the exact content. The content is distorted.

Answer

Johann Gerell picture Johann Gerell · Sep 3, 2013

Assuming that the input string in your example (おはよう) is a UTF-8 encoded (which it isn't, by the looks of it, but let's assume it is for the sake of this explanation :-)) representation of a Unicode string of your interest, then your problem can be fully solved with the standard library (C++11 and newer) alone.

The TL;DR version:

#include <locale>
#include <codecvt>
#include <string>

std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(wide_utf16_source_string);
std::wstring wide = converter.from_bytes(narrow_utf8_source_string);

Longer online compilable and runnable example:

(They all show the same example. There are just many for redundancy...)

Note (old):

As pointed out in the comments and explained in https://stackoverflow.com/a/17106065/6345 there are cases when using the standard library to convert between UTF-8 and UTF-16 might give unexpected differences in the results on different platforms. For a better conversion, consider std::codecvt_utf8 as described on http://en.cppreference.com/w/cpp/locale/codecvt_utf8

Note (new):

Since the codecvt header is deprecated in C++17, some worry about the solution presented in this answer were raised. However, the C++ standards committee added an important statement in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html saying

this library component should be retired to Annex D, along side , until a suitable replacement is standardized.

So in the foreseeable future, the codecvt solution in this answer is safe and portable.