How can I read a Unicode (UTF-8) file into wstring
(s) on the Windows platform?
With C++11 support, you can use std::codecvt_utf8 facet which encapsulates conversion between a UTF-8 encoded byte string and UCS2 or UCS4 character string and which can be used to read and write UTF-8 files, both text and binary.
In order to use facet you usually create locale object that encapsulates culture-specific information as a set of facets that collectively define a specific localized environment. Once you have a locale object, you can imbue your stream buffer with it:
#include <sstream>
#include <fstream>
#include <codecvt>
std::wstring readFile(const char* filename)
{
std::wifstream wif(filename);
wif.imbue(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));
std::wstringstream wss;
wss << wif.rdbuf();
return wss.str();
}
which can be used like this:
std::wstring wstr = readFile("a.txt");
Alternatively you can set the global C++ locale before you work with string streams which causes all future calls to the std::locale
default constructor to return a copy of the global C++ locale (you don't need to explicitly imbue stream buffers with it then):
std::locale::global(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));