Looking for simple practical C++ examples of how to use ICU

user754425 picture user754425 · May 15, 2011 · Viewed 15.2k times · Source

I am looking for simple practical C++ examples on how to use ICU.
The ICU home page is not helpful in this regard.
I am not interested on what and why Unicode.
The few demos are not self contained and not compilable examples ( where are the includes? )
I am looking for something like 'Hello, World' of:
How to open and read a file encoded in UTF-8
How to use STL / Boost string functions to manipulate UTF-8 encoded strings etc.

Answer

Ferruccio picture Ferruccio · May 16, 2011

There's no special way to read a UTF-8 file unless you need to process a byte order mark (BOM). Because of the way UTF-8 encoding works, functions that read ANSI strings can also read UTF-8 strings.

The following code will read the contents of a file (ANSI or UTF-8) and do a couple of conversions.

#include <fstream>
#include <string>

#include <unicode/unistr.h>

int main(int argc, char** argv) {
    std::ifstream f("...");
    std::string s;
    while (std::getline(f, s)) {
        // at this point s contains a line of text
        // which may be ANSI or UTF-8 encoded

        // convert std::string to ICU's UnicodeString
        UnicodeString ucs = UnicodeString::fromUTF8(StringPiece(s.c_str()));

        // convert UnicodeString to std::wstring
        std::wstring ws;
        for (int i = 0; i < ucs.length(); ++i)
            ws += static_cast<wchar_t>(ucs[i]);
    }
}

Take a look at the online API reference.

If you want to use ICU through Boost, see Boost.Locale.