python base64 string decoding

bleeding edge picture bleeding edge · Aug 3, 2011 · Viewed 22k times · Source

I've got what's supposed to be a UCS-2 encoded xml document that I've managed to build a DOM based on minidom after some tweaking.

The issue is that I'm supposed to have some data encoded on base64. I know for a fact that:

AME= (or \x00A\x00M\x00E\x00=) is base64 code for Á

How would I decode that?

http://www.fileformat.info/info/unicode/char/00c1/index.htm shows that the unicode representation for Á is: u"\u00C1" and in UTF-16: 0x00C1

base64.b64decode('AME=').decode('UTF-16')

shows

u'\uc100'

as the unicode representation for the character, but it looks byte-swapped.

Any idea on how to decode it?

Answer

Ray Toal picture Ray Toal · Aug 3, 2011

Check this out

>>> import base64
>>> base64.b64decode('AME=').decode('UTF-16')
u'\uc100'
>>> base64.b64decode('AME=').decode('UTF-16LE')  
u'\uc100'
>>> base64.b64decode('AME=').decode('UTF-16BE')
u'\xc1'

Perhaps you are looking for big endian decoding?