I have a problem when printing (or writing to a file) non-ASCII characters in Python. I resolved it for my own objects by overriding the __str__ method and returning x.encode('utf-8') from it, where x is a unicode property of the object.
But if I receive a third-party object and call str(obj) on it, and that object contains a non-ASCII character, it fails.
So the question is: is there a generic way to tell the str method that an object uses a UTF-8 encoding? I'm working with Python 2.5.4.
There is no way to make str() work with Unicode in Python < 3.0.
Use repr(obj) instead of str(obj): repr() converts the result to ASCII, properly escaping everything that isn't in the ASCII code range.
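As a sketch of that escaping behaviour: Python 2's repr() on a unicode string produces an ASCII-only literal form. The snippet below uses Python 3's ascii() builtin, which performs the same escaping (repr() in Python 3 no longer does), so readers on either version can see the effect:

```python
# Python 3 sketch: ascii() escapes the way Python 2's repr() did
s = u"\u0411"          # CYRILLIC CAPITAL LETTER BE
escaped = ascii(s)     # an ASCII-only literal form
print(escaped)         # → '\u0411'
```

Anything outside the ASCII range comes back as a \uXXXX escape, so the result is always safe to print or write to an ASCII stream.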
Other than that, use a file object that accepts unicode, so the encoding happens at the output side rather than the input side:
import codecs
fileObj = codecs.open("someFile", "w", "utf-8")
Now you can write unicode strings to fileObj and they will be encoded as needed. To make the same happen with print, you need to wrap sys.stdout:
import sys, codecs, locale

# Show what Python thinks stdout's encoding is (may be None when redirected)
print str(sys.stdout.encoding)

# Wrap stdout so unicode strings are encoded with the locale's preferred encoding
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)

line = u"\u0411\n"            # CYRILLIC CAPITAL LETTER BE, plus a newline
print type(line), len(line)   # <type 'unicode'> 2
sys.stdout.write(line)        # encoded transparently by the wrapper
print line                    # also goes through the wrapper now
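A minimal round trip through codecs.open (the file name utf8_demo.txt is just an illustration) shows that what goes in as unicode comes back out unchanged, with the UTF-8 encoding and decoding handled entirely by the file object:

```python
import codecs

path = "utf8_demo.txt"  # hypothetical file name for illustration

out = codecs.open(path, "w", "utf-8")
out.write(u"\u0411\n")  # unicode in, UTF-8 bytes on disk
out.close()

inp = codecs.open(path, "r", "utf-8")
data = inp.read()       # decoded back to unicode transparently
inp.close()

assert data == u"\u0411\n"
```

Because the codec lives in the file object, none of your own code (or any third-party object's __str__) has to know about the encoding at all.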