I know about Data URIs in which base64
encoded data can be used inline such as images. Today I received an email actually an spam one in which there was an animated (gif) icon in its subject:
Here is the icon alone:
So the only thing did cross my mind was all about Data URIs and if Gmail allows some sort of emoticons to be inserted in subject. I saw the full detailed version of email and pointed to subject line at the below picture:
So GIF comes from =?UTF-8?B?876Urg==?=
encoded string which is similar to Data URI scheme however I couldn't get the icon out of it. Here is element HTML source:
Long story short, there are lots of emoticons from https://mail.google.com/mail/e/XXX
where XXX
are hexadecimal numbers. They are documented nowhere or I couldn't find it. If that's about Data URI, so how is it possible to include them in Gmail's email subject? (I forwarded that email to a yahoo email account, seeing [?]
instead of icon) and if it's not, then how that encoded string is parsed?
They are referred to internally as goomoji
, and they appear to be a non-standard UTF-8 extension. When Gmail encounters one of these characters, it is replaced by the corresponding icon. I wasn't able to find any documentation on them, but I was able to reverse engineer the format.
Those icons are actually the icons that appear under the "Insert emoticons" panel.
While I don't see the 52E
icon in the list, there are several others that follow the same convention.
Note that there are also some icons whose names are prefixed, such as gtalk.03C
. I was not able to determine if or how these icons can be used in this manner.
It's not actually a Data URI, though it does share some similarities. It's actually a special syntax for encoding non-ASCII characters in email subjects, defined in RFC 2047. Basically, it works like this.
=?charset?encoding?data?=
So, in our example string, we have the following data.
=?UTF-8?B?876Urg==?=
charset
= UTF-8
encoding
= B
(means base64)data
= 876Urg==
We know that somehow, 876Urg==
means the icon 52E
, but how?
If we base64 decode 876Urg==
, we get 0xf3be94ae
. This looks like the following in binary:
11110011 10111110 10010100 10101110
These bits are consistent with a 4-byte UTF-8 encoded character.
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
So the relevant bits are the following.:
011 111110 010100 101110
Or when aligned:
00001111 11100101 00101110
In hexadecimal, these bytes are the following:
FE52E
As you can see, except for the FE
prefix which is presumably to distinguished the goomoji
icons from other UTF-8 characters, it matches the 52E
in the icon URL. Some testing proves that this holds true for other icons.
This can of course be scripted. I created the following Python code for my testing. These functions can convert the base64 encoded string to and from the short hex string found in the URL. Note, this code is written for Python 3, and is not Python 2 compatible.
import base64
def goomoji_decode(code):
#Base64 decode.
binary = base64.b64decode(code)
#UTF-8 decode.
decoded = binary.decode('utf8')
#Get the UTF-8 value.
value = ord(decoded)
#Hex encode, trim the 'FE' prefix, and uppercase.
return format(value, 'x')[2:].upper()
def goomoji_encode(code):
#Add the 'FE' prefix and decode.
value = int('FE' + code, 16)
#Convert to UTF-8 character.
encoded = chr(value)
#Encode UTF-8 to binary.
binary = bytearray(encoded, 'utf8')
#Base64 encode return end return a UTF-8 string.
return base64.b64encode(binary).decode('utf-8')
print(goomoji_decode('876Urg=='))
print(goomoji_encode('52E'))
52E
876Urg==
And, of course, finding an icon's URL simply requires creating a new draft in Gmail, inserting the icon you want, and using your browser's DOM inspector.