iconv: Convert from CP1252 to UTF-8

Somebody picture Somebody · Mar 15, 2013 · Viewed 29.9k times · Source

I'm trying to convert the CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8. I have tried this command:

iconv -c -f=WINDOWS-1252 -t=UTF-8 test.txt

No luck, getting some weird results:

ÊÀÇÀÃÃœ ÃÎÂÛÉ ÂÅÊ

I tried entering the same string (Çàïèñêè ýêñïåäèòîðà) here, and they are able to convert it without problems: http://www.artlebedev.ru/tools/decoder/

What is going wrong?

Answer

xmojmr picture xmojmr · Sep 16, 2014

When you convert CP1252 encoded string Çàïèñêè ýêñïåäèòîðà to UTF-8 with command iconv.exe -f CP1252 -t UTF-8 test.txt >testout.txt then the source file test.txt (Hex view:

enter image description here

) will be converted into target file testout.txt (Hex view:

enter image description here

) which is UTF-8 code for Çàïèñêè ýêñïåäèòîðà.

Same garbage you put in will come the other end out. iconv's behavior is correct and as expected.

What you are perplexed by is that you don't see what you expect and that is because your input 8bit string is actually encoded in Windows-1251 (Cyrillic) Codepage.

→ So the correct code page is not CP1252 but CP1251

enter image description here

Command iconv.exe -f CP1251 -t UTF-8 test.txt >testout2.txt converts the source file test.txt into target file testout2.txt (Hex view:

enter image description here

) which is UTF-8 code for Записки экспедитора which is what your user's expect to see