Writing Unicode to a file in C++

Garrett Ratliff · Apr 9, 2013 · Viewed 17.2k times

I have a problem writing Unicode to a file in C++. I want to write a few smiley faces, the kind you get by typing ALT+NUMPAD 2, to a file with my own extension. I can display one in CMD by assigning '\2' to a char, which shows a smiley face, but it won't write to a file.

Here is a snippet of code for my program:

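ofstream myfile;
// Backslashes in a string literal must be escaped; "\U" would otherwise start a universal character name.
myfile.open("C:\\Users\\My Username\\test.exampleCodeFile");
myfile << "\2";
myfile.close();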

It writes to the file, but the file doesn't show what I want. I would show you what it displays, but Stack Overflow won't let me post the character. Thanks in advance.

Answer

Mark Tolonen · Apr 10, 2013

You have to use Unicode to specify the characters you want to display. The console translates byte 02h through code page 437 (cp437) to the Unicode character U+263B (☻). A file gets no such translation, so "\2" is stored as the raw control byte 02h. Saving the source file as UTF-8 with BOM makes using Unicode easier, because you can paste or type the characters you want without resorting to Unicode escape codes.
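To make the cp437 translation concrete, here is a minimal sketch of the lookup the console effectively performs for the low byte values. The function name cp437_low_to_unicode is made up for illustration, and the table only covers bytes 00h through 0Fh; everything else returns 0 in this sketch.

```cpp
// Hypothetical helper (not part of any library): maps a cp437 byte in the
// range 0x01..0x0F to the Unicode code point of the glyph the console draws.
// Returns 0 for byte 0x00 and for any byte outside the table.
char32_t cp437_low_to_unicode(unsigned char byte)
{
    static const char32_t table[16] = {
        0x0000, 0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022,
        0x25D8, 0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C
    };
    return byte < 16 ? table[byte] : 0;
}
```

With this table, cp437_low_to_unicode(0x02) gives 0x263B, the black smiling face, and cp437_low_to_unicode(0x01) gives 0x263A, the white one.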

A file stream needs to be configured for UTF-8. There are various ways to do this, and the details depend on the compiler. Here is what I came up with using Visual Studio 2012, source saved as UTF-8 with BOM, and a bit of Googling:

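#include <locale>
#include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
#include <fstream>
#include <iostream>
#include <io.h>      // _setmode (Microsoft-specific)
#include <fcntl.h>   // _O_U16TEXT (Microsoft-specific)
#include <stdio.h>   // _fileno
using namespace std;

int main()
{
    // Locale whose codecvt facet converts wchar_t to UTF-8 on output.
    const std::locale utf8_locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());
    wofstream f(L"sample.txt");   // wide-string filename is an MSVC extension
    f.imbue(utf8_locale);
    f << L"\u263b我是美国人。我叫马克。" << endl;

    // Switch stdout to UTF-16 so wcout can print the characters on the Windows console.
    _setmode(_fileno(stdout),_O_U16TEXT);
    wcout << L"\u263b我是美国人。我叫马克。" << endl;
}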

Content of sample.txt as viewed in Notepad:

☻我是美国人。我叫马克。

Hex dump (correct UTF-8; the leading E298BB is U+263B):

E298BBE68891E698AFE7BE8EE59BBDE4BABAE38082E68891E58FABE9A9ACE5858BE380820D0A
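The bytes in a dump like this can be checked by encoding the code points by hand. The following is a small sketch, not taken from the program above; encode_utf8 and to_hex are illustrative names, and the encoder assumes a valid code point that is not a surrogate.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
std::string encode_utf8(char32_t cp)
{
    // Assumption of this sketch: callers pass a valid, non-surrogate code point.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx and three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: formats bytes as uppercase hex with no separators.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (unsigned char b : bytes) {
        hex += digits[b >> 4];
        hex += digits[b & 0x0F];
    }
    return hex;
}
```

For example, to_hex(encode_utf8(0x6211)) gives E68891 for 我, and to_hex(encode_utf8(0x263B)) gives E298BB for ☻. The trailing 0D0A in the dump is the CR LF line ending that text mode on Windows writes for endl.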

Output to the console, cut and pasted here. Without the right font, the console displayed � for each Chinese character, but the characters display correctly when pasted into Stack Overflow or Notepad.

☻我是美国人。我叫马克。
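Since std::codecvt_utf8 is deprecated as of C++17, one possible alternative is to skip the wide stream and write already-encoded UTF-8 bytes through a narrow stream opened in binary mode. This is only a sketch: write_utf8_bytes, read_bytes, and kSmiley are names invented here, and the smiley is spelled as explicit byte escapes so the literal does not depend on how the source file is saved.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Hypothetical helper: writes UTF-8 bytes to a file unchanged.
// Binary mode keeps the runtime from translating any byte on output.
bool write_utf8_bytes(const std::string& path, const std::string& utf8)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: reads a whole file back as raw bytes for checking.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// U+263B (black smiling face) as its three UTF-8 bytes.
const std::string kSmiley = "\xE2\x98\xBB";
```

Calling write_utf8_bytes("sample.txt", kSmiley) produces a three-byte file that a UTF-8 aware editor should show as ☻. The trade-off is that the program deals in bytes instead of wide characters, so any text that is not already UTF-8 has to be encoded before it is written.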