Using iconv to convert strings to ISO-8859-1 in C/C++

maixl · Nov 23, 2011

I want to convert strings from the GBK character set to ISO-8859-1.

I have tried to use the iconv library, but iconv() always returns -1, and errno decodes to "Invalid or incomplete multibyte or wide character".

How can I achieve this?

Answer

caf · Nov 23, 2011

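If you opened the conversion descriptor without //TRANSLIT or //IGNORE, then iconv() returns an error whenever an input character cannot be represented in the target character set. With glibc, errno is set to EILSEQ in that case, which is the message you are seeing. Since ISO-8859-1 cannot represent most GBK characters, this is likely what is happening. Appending //TRANSLIT to the target name asks iconv to substitute an approximation for such characters, while //IGNORE asks it to drop them. The following example works for me: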

#include <stdio.h>
#include <string.h>
#include <iconv.h>

int main(void)
{
    /* "GBK " followed by three two-byte GBK characters */
    char gbk_str[] = "GBK \xB5\xE7\xCA\xD3\xBB\xFA";
    char dest_str[100];
    char *in = gbk_str;     /* iconv() advances both pointers */
    char *out = dest_str;
    size_t inbytes = strlen(gbk_str);
    size_t outbytes = sizeof dest_str - 1;  /* leave room for the terminator */
    iconv_t conv = iconv_open("ISO-8859-1//TRANSLIT", "GBK");

    if (conv == (iconv_t)-1) {
        perror("iconv_open");
        return 1;
    }

    if (iconv(conv, &in, &inbytes, &out, &outbytes) == (size_t)-1) {
        perror("iconv");
        iconv_close(conv);
        return 1;
    }

    *out = '\0';    /* iconv() does not null-terminate its output */
    puts(dest_str);

    iconv_close(conv);
    return 0;
}

(I hope that GBK string isn't obscene, I have no idea what it means!)
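
If you need this in more than one place, the same steps can be wrapped in a small helper that keeps errno intact, so the caller can tell the failure cases apart. This is only a sketch: convert_string is a hypothetical name, not part of iconv, and the EILSEQ behaviour for unrepresentable characters is what glibc does (POSIX leaves that case implementation-defined).

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of iconv): converts the NUL-terminated
 * string `src` from encoding `from` to encoding `to`, writing a
 * NUL-terminated result into `dst` (capacity `dstsize` bytes).
 * Returns 0 on success, or -1 with errno set by iconv_open()/iconv():
 *   EILSEQ - invalid input sequence (glibc also reports this for input
 *            that cannot be represented in the target character set)
 *   EINVAL - input ends in the middle of a multibyte character,
 *            or iconv_open() does not support the conversion
 *   E2BIG  - dst is too small
 */
int convert_string(const char *to, const char *from,
                   const char *src, char *dst, size_t dstsize)
{
    if (dstsize == 0) {
        errno = E2BIG;
        return -1;
    }

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                 /* errno already set by iconv_open() */

    /* iconv() takes char ** for its input but does not modify the bytes. */
    char *in = (char *)src;
    char *out = dst;
    size_t inleft = strlen(src);
    size_t outleft = dstsize - 1;  /* keep one byte for the terminator */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    int saved = errno;             /* iconv_close() may overwrite errno */
    iconv_close(cd);

    *out = '\0';                   /* terminate whatever was converted */
    if (rc == (size_t)-1) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

With the variables from the example above, the call would be convert_string("ISO-8859-1//TRANSLIT", "GBK", gbk_str, dest_str, sizeof dest_str). Passing plain "ISO-8859-1" as the target should reproduce the error from the question on glibc. Note that the //TRANSLIT and //IGNORE suffixes are GNU extensions, so other iconv implementations may ignore or reject them.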