How to convert a String from UTF8 to Latin1 in C/C++?

ashiaka · Oct 12, 2012 · Viewed 7.5k times

The question I have is quite simple, but I couldn't find a solution so far:

How can I convert a UTF-8 encoded string to a Latin-1 encoded string in C++ without using any extra libraries such as libiconv?

Every example I could find so far is for Latin-1 to UTF-8 conversion.

Answer

filmor · Oct 12, 2012
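#include <cstddef>  // size_t

typedef unsigned value_type;

// Returns the number of bytes in the UTF-8 sequence whose lead byte is *p.
// Assumes p points at the lead byte of a well-formed sequence.
template <typename Iterator>
size_t get_length (Iterator p)
{
    unsigned char c = static_cast<unsigned char> (*p);
    if (c < 0x80) return 1;            // 0xxxxxxx (ASCII)
    else if (!(c & 0x20)) return 2;    // 110xxxxx
    else if (!(c & 0x10)) return 3;    // 1110xxxx
    else if (!(c & 0x08)) return 4;    // 11110xxx
    else if (!(c & 0x04)) return 5;    // 111110xx (obsolete since RFC 3629)
    else return 6;                     // 1111110x (obsolete since RFC 3629)
}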

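// Decodes the UTF-8 sequence starting at p and returns its code point.
// If Iterator is a reference type, p is left on the last byte of the sequence.
template <typename Iterator>
value_type get_value (Iterator p)
{
    size_t len = get_length (p);

    if (len == 1)
        return static_cast<unsigned char> (*p);

    // The lead byte carries the high bits: mask off its length prefix.
    value_type res = static_cast<unsigned char> (
                                    *p & (0xff >> (len + 1)))
                                     << ((len - 1) * 6);

    // Each continuation byte (10xxxxxx) contributes six more bits.
    for (--len; len; --len)
        res |= (static_cast<unsigned char> (*(++p)) - 0x80) << ((len - 1) * 6);

    return res;
}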

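get_value returns the Unicode code point of the UTF-8 sequence that starts at p. If the iterator is passed by reference, p is left on the last byte of that sequence, so a loop only needs ++p to reach the next character. Neither function validates its input, so the string must be well-formed UTF-8. You can now convert a string using: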

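// s_utf8 holds the input; s_latin1 is an initially empty std::string.
for (std::string::iterator p = s_utf8.begin(); p != s_utf8.end(); ++p)
{
     // Pass the iterator by reference so get_value can advance it.
     value_type value = get_value<std::string::iterator&>(p);
     if (value > 0xff)
         throw "AAAAAH!";  // code point does not exist in Latin-1
     s_latin1.push_back(static_cast<char>(value));
}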

No guarantees, the code is quite old :)
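
For comparison, here is a minimal self-contained sketch of the same conversion that also rejects malformed input. It is not the code from the answer above: the function name utf8_to_latin1 and the choice to throw std::runtime_error are assumptions made for illustration. It relies on the fact that the only code points Latin-1 can hold beyond ASCII, U+0080 to U+00FF, are always encoded in UTF-8 as two bytes with a lead byte of 0xC2 or 0xC3.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (name and error handling are illustrative assumptions).
// Converts UTF-8 to Latin-1 (ISO-8859-1). Throws std::runtime_error if the
// input is malformed or contains a code point above U+00FF.
std::string utf8_to_latin1(const std::string& utf8)
{
    std::string latin1;
    latin1.reserve(utf8.size());

    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);

        if (lead < 0x80) {
            // ASCII is identical in both encodings.
            latin1.push_back(static_cast<char>(lead));
        } else if (lead == 0xC2 || lead == 0xC3) {
            // Two-byte sequence encoding U+0080..U+00FF.
            if (i + 1 >= utf8.size())
                throw std::runtime_error("truncated UTF-8 sequence");
            const unsigned char cont = static_cast<unsigned char>(utf8[++i]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            // Five payload bits from the lead byte, six from the continuation.
            latin1.push_back(
                static_cast<char>(((lead & 0x1F) << 6) | (cont & 0x3F)));
        } else {
            // Either a code point above U+00FF or a malformed lead byte.
            throw std::runtime_error("not representable in Latin-1");
        }
    }
    return latin1;
}
```

With this sketch, utf8_to_latin1("caf\xC3\xA9") yields the four bytes "caf\xE9", while input containing the euro sign (bytes E2 82 AC) throws, because U+20AC has no Latin-1 equivalent.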