How to print non-ASCII characters in Python

Roman · Nov 10, 2009

I have a problem printing (or writing to a file) non-ASCII characters in Python. I resolved it in my own objects by overriding the __str__ method and calling x.encode('utf-8') inside it, where x is an attribute of the object.

But if I receive a third-party object and call str(object) on it, and that object contains a non-ASCII character, the call fails.

So the question is: is there a generic way to tell str() that the object's text is encoded as UTF-8? I'm working with Python 2.5.4.

Answer

Aaron Digulla · Nov 10, 2009

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.
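To illustrate the escaping, here is a minimal sketch. The name ascii_repr is made up for this example. It assumes that on Python 3, where repr() no longer escapes non-ASCII text, the built-in ascii() plays the role that repr() plays on Python 2.

```python
# Pick the function that returns an ASCII-only representation.
try:
    ascii_repr = ascii   # Python 3: ascii() escapes non-ASCII characters
except NameError:
    ascii_repr = repr    # Python 2: repr() already returns pure ASCII

text = u"\u0411"             # CYRILLIC CAPITAL LETTER BE
escaped = ascii_repr(text)   # contains the escape sequence \u0411, not the character

# Every character of the result is inside the ASCII range,
# so printing it cannot raise an encoding error.
is_ascii_only = all(ord(ch) < 128 for ch in escaped)
```

The escaped form is safe to print anywhere, but it is meant for debugging rather than for end users, since the reader sees the escape sequence rather than the character.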

Other than that, use a file object that accepts unicode strings. In other words, don't encode at the input side; encode at the output side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")
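As a slightly fuller sketch of that line, the snippet below writes one unicode string and reads the raw bytes back to show what ended up on disk. The temporary directory is an assumption made so the example leaves no file in your working directory, and the with statement assumes Python 2.6 or later.

```python
import codecs
import os
import tempfile

# Hypothetical scratch location for the example file.
path = os.path.join(tempfile.mkdtemp(), "someFile")

# Open for writing; everything written is encoded to UTF-8 by the file object.
fileObj = codecs.open(path, "w", "utf-8")
try:
    fileObj.write(u"\u0411\n")   # pass a unicode string, not encoded bytes
finally:
    fileObj.close()

# Read the file back in binary mode to inspect the encoded bytes.
with open(path, "rb") as raw:
    data = raw.read()
```

Here data holds the two-byte UTF-8 sequence for U+0411 followed by the newline. The calling code never had to call encode() itself.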

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

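import sys, codecs, locale
# Show the encoding Python detected for stdout (may be None when output is piped)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded with the locale's preferred encoding
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"
print type(line), len(line)
# Both calls now accept unicode and encode it on output
sys.stdout.write(line)
print line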
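
To see what the wrapper does without touching the real sys.stdout, here is a small sketch that wraps an in-memory byte stream instead. The io.BytesIO object is only a stand-in for a byte-oriented stream, and the encoding is fixed to UTF-8 here rather than taken from the locale. Both choices are assumptions made to keep the example self-contained.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as Python 2's sys.stdout.
raw = io.BytesIO()

# codecs.getwriter() returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf-8")(raw)

line = u"\u0411\n"    # CYRILLIC CAPITAL LETTER BE plus a newline
writer.write(line)    # the unicode string is encoded on the way out

# The underlying stream only ever received encoded bytes.
encoded = raw.getvalue()
```

On Python 3, sys.stdout is already a text stream, so this kind of wrapping is normally unnecessary there. If you do need it, the byte stream to wrap would be sys.stdout.buffer.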