Python 3: reading UCS-2 (BE) file

elder elder · Jan 23, 2013

I can't decode UCS-2 BE files (legacy stuff) under Python 3.3 using the built-in open() function. The stack trace shows a UnicodeDecodeError and points into my readLine() method. In fact, I wasn't able to find an encoding name for specifying UCS-2.

I'm on Windows 8; the terminal is set to code page 65001 and uses the 'Lucida Console' font.

The code snippet probably won't be of much help, but here it is:

def display_resource():
    f = open(r'D:\workspace\resources\JP.res', encoding=<??tried_several??>)
    while True:
        line = f.readline()
        if len(line) == 0:
            break

I'd appreciate any insight into this issue.

Answer

Martijn Pieters · Jan 23, 2013

UCS-2 is effectively UTF-16, at least for any code point that was assigned while the encoding was still called UCS-2.
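As a rough illustration of why the UTF-16 codec is enough here, the sketch below encodes one character from the Basic Multilingual Plane and one from outside it. The sample characters are arbitrary choices for the demonstration; they do not come from the question's file.

```python
# A BMP character occupies a single 16-bit unit, which is exactly what
# UCS-2 would have stored for it.
bmp_char = "\u65e5"        # U+65E5, inside the Basic Multilingual Plane
bmp_bytes = bmp_char.encode("utf_16_be")

# A character outside the BMP needs a surrogate pair (two 16-bit units),
# which is the part UTF-16 added on top of UCS-2.
astral_char = "\U0001F600"  # U+1F600, outside the BMP
astral_bytes = astral_char.encode("utf_16_be")

print(bmp_bytes.hex())     # 65e5
print(astral_bytes.hex())  # d83dde00
```

For text limited to the BMP, the big-endian UTF-16 bytes are just the code point values, so a legacy UCS-2 BE file decodes unchanged with the UTF-16 codec.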

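Open it with encoding='utf16'. That codec reads the BOM (the byte order mark, 2 bytes at the start of the file; for BE that'd be \xfe\xff) to determine the byte order. If there is no BOM, the codec falls back to the platform's native byte order, so use encoding='utf_16_be' to force big-endian.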
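A minimal sketch of both cases, assuming nothing about the asker's actual file: it writes two small temporary files (one with a BOM, one without) and reads each back with the matching encoding. The helper names `pick_utf16_encoding` and `read_lines`, the file names, and the sample text are all made up for this illustration.

```python
import codecs
import os
import tempfile


def pick_utf16_encoding(path, fallback="utf_16_be"):
    """Return 'utf_16' if the file starts with a UTF-16 BOM, else the fallback.

    Hypothetical helper: the fallback byte order is an assumption the
    caller has to supply, since it cannot be detected without a BOM.
    """
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE):
        return "utf_16"
    return fallback


def read_lines(path, encoding):
    """Read a text file with the given encoding and return its lines."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Build two sample files so the example is self-contained.
sample_text = "日本語\nresource\n"
tmpdir = tempfile.mkdtemp()
with_bom = os.path.join(tmpdir, "with_bom.res")
without_bom = os.path.join(tmpdir, "without_bom.res")

with open(with_bom, "wb") as f:
    f.write(codecs.BOM_UTF16_BE + sample_text.encode("utf_16_be"))
with open(without_bom, "wb") as f:
    f.write(sample_text.encode("utf_16_be"))

# With a BOM, 'utf_16' detects the byte order and strips the BOM.
lines_with_bom = read_lines(with_bom, pick_utf16_encoding(with_bom))

# Without a BOM, the byte order has to be forced explicitly.
lines_without_bom = read_lines(without_bom, pick_utf16_encoding(without_bom))

print(lines_with_bom)
print(lines_without_bom)
```

Checking the first two bytes up front avoids guessing: a file that carries a BOM is left to the generic codec, and only a BOM-less file gets the forced big-endian decoder.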