How can I convert wstring to u16string?

D.A.KANG · Mar 11, 2017 · Viewed 8.2k times

I want to convert wstring to u16string in C++.

I can convert wstring to string and back, but I don't know how to convert it to u16string.

u16string CTextConverter::convertWstring2U16(wstring str)
{
        int iSize;
        u16string szDest[256] = {};
        memset(szDest, 0, 256);
        iSize = WideCharToMultiByte(CP_UTF8, NULL, str.c_str(), -1, NULL, 0,0,0);

        WideCharToMultiByte(CP_UTF8, NULL, str.c_str(), -1, szDest, iSize,0,0);
        u16string s16 = szDest;
        return s16;
}

The error is at the szDest argument of WideCharToMultiByte(CP_UTF8, NULL, str.c_str(), -1, szDest, iSize,0,0); because a u16string cannot be used as an LPSTR.

How can I fix this code?

Answer

zett42 · Mar 11, 2017

For a platform-independent solution see this answer.

If you need a solution only for the Windows platform, the following code will be sufficient:

std::wstring wstr( L"foo" );
std::u16string u16str( wstr.begin(), wstr.end() );

On the Windows platform, the contents of a std::wstring can be copied unchanged into a std::u16string because sizeof(wstring::value_type) == sizeof(u16string::value_type) and both hold UTF-16 (little endian) code units.

wstring::value_type = wchar_t
u16string::value_type = char16_t

The only difference is that wchar_t and char16_t are distinct types, so a std::wstring cannot be assigned to a std::u16string directly. On Windows both are unsigned 16-bit types, so every value is preserved when one is converted to the other. The conversion can be performed using the u16string constructor that takes an iterator pair as arguments. This constructor implicitly converts each wchar_t to char16_t.
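Applied to the function from the question, the conversion might look like the sketch below. It is written as a free function because the CTextConverter class is not shown in the question, and it assumes a 16-bit wchar_t holding UTF-16, as on Windows. Where wchar_t is wider, values above 0xFFFF would be truncated.

```cpp
#include <string>

// Sketch of the question's function as a free function (the CTextConverter
// class itself is not shown in the question, so it is left out here).
// Assumption: wchar_t is a 16-bit UTF-16 code unit, as on Windows.
std::u16string convertWstring2U16(const std::wstring& str)
{
    // The iterator-pair constructor converts each wchar_t to char16_t.
    return std::u16string(str.begin(), str.end());
}
```

WideCharToMultiByte is not needed here: it produces a multibyte (char-based) encoding such as UTF-8, not UTF-16.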

Full example console application:

#include <windows.h>
#include <string>

int main()
{
    static_assert( sizeof(std::wstring::value_type) == sizeof(std::u16string::value_type),
        "std::wstring and std::u16string are expected to have the same character size" );
   
    std::wstring wstr( L"foo" );
    std::u16string u16str( wstr.begin(), wstr.end() );
   
    // The u16string constructor performs an implicit conversion like:
    wchar_t wch = L'A';
    char16_t ch16 = wch;
   
    // Need to reinterpret_cast because char16_t const* is not implicitly convertible
    // to LPCWSTR (aka wchar_t const*).
    ::MessageBoxW( 0, reinterpret_cast<LPCWSTR>( u16str.c_str() ), L"test", 0 );
   
    return 0;
}
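
For the platform-independent case mentioned at the start of this answer, the following is a minimal sketch, not the code from the linked answer. The helper name wstring_to_u16string is hypothetical. It assumes that a 16-bit wchar_t already holds UTF-16 and that a wider wchar_t holds UTF-32 code points, which are encoded as surrogate pairs when they lie above U+FFFF. It does not validate lone surrogates in the input.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the original answer).
// Assumptions: a 16-bit wchar_t holds UTF-16 code units; a wider wchar_t
// holds UTF-32 code points.
std::u16string wstring_to_u16string(const std::wstring& in)
{
    std::u16string out;
    out.reserve(in.size());

    for (wchar_t wc : in) {
        const std::uint32_t cp = static_cast<std::uint32_t>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000u) {
            // Already a UTF-16 code unit, or a code point that fits in one unit.
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFFu) {
            // Encode a supplementary code point as a high/low surrogate pair.
            const std::uint32_t v = cp - 0x10000u;
            out.push_back(static_cast<char16_t>(0xD800u + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00u + (v & 0x3FFu)));
        } else {
            // Outside the Unicode range: substitute U+FFFD REPLACEMENT CHARACTER.
            out.push_back(u'\uFFFD');
        }
    }
    return out;
}
```

On Windows the first branch is always taken, so the result matches the iterator-pair constructor shown above.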