special characters (emoticons) in text file

Sean Connolly · Sep 30, 2013 · Viewed 37.5k times

I have a txt file of a conversation exported from WhatsApp. WhatsApp supports emoticons in its conversations, and the exported conversation also, to my surprise, contains these emoticons! That is, if I open the text file in a text editor (TextWrangler on Mac OS X 10.8) I can see the emoticons. The text file is encoded in UTF-8 and, as far as I can tell, there are no resources associated with the file.

Can anyone explain to me how these emoticons are included in the text file and how the text editor interprets them accurately? Is this related to the character encoding at all? Are extra resources included with the text file?

Answer

deceze · Sep 30, 2013

Unicode contains sections which specify emoji as "characters". They're regular characters; you only need a font which can display them. Also see the Unicode Emoji FAQ.

In a text file, characters are basically encoded as numbers in the form of bytes. To display those visually on a computer screen you need a font which contains the visual glyph to render this character. Since the process is always numeric identifier → font → visible glyph, it should be pretty obvious that a "character" can be anything visual, including emoji or any other image.
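To make the "numeric identifier" part concrete, here is a minimal Python sketch. It assumes the file is UTF-8, as the question states, and uses U+1F600 as a sample emoji; that particular character and the helper name `describe` are illustrative choices, not something taken from the WhatsApp export. It shows that the emoji is just a code point that UTF-8 stores as four ordinary bytes:

```python
import unicodedata


def describe(ch):
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # Hex dump of the bytes that actually end up in the text file.
        "utf8_bytes": " ".join(f"{b:02x}" for b in ch.encode("utf-8")),
    }


emoji = "\U0001F600"  # sample emoji, chosen only for illustration
info = describe(emoji)
print(info)

# Reading the same four bytes back as UTF-8 yields the same character.
round_trip = bytes([0xF0, 0x9F, 0x98, 0x80]).decode("utf-8")
print(round_trip == emoji)
```

Nothing beyond those bytes is stored in the file. Whether you see a picture or a placeholder box depends only on whether the editor has a font with a glyph for that code point.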
