python 2.7 character \u2013

user2904150 · Dec 2, 2013 · Viewed 27.2k times

I have the following code:

# -*- coding: utf-8 -*-

print u"William Burges (1827–81) was an English architect and designer."

When I try to run it from cmd, I get the following message:

Traceback (most recent call last):
  File "C:\Python27\utf8.py", line 3, in <module>
    print u"William Burges (1827ŌĆō81) was an English architect and designer."
  File "C:\Python27\lib\encodings\cp775.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
 20: character maps to <undefined>

How can I solve this problem and make Python handle the \u2013 character? And why doesn't Python handle it with the existing code? I thought that utf-8 works for every character.

Thank you

EDIT:

This code prints the desired output:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827-81) was an English architect and designer.", "utf-8").encode("cp866")

But when I try to print more than one sentence, for example:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827–81) was an English architect and designer. I am here. ", "utf-8").encode("cp866")

I get the same error message:

Traceback (most recent call last):
  File "C:\Python27\utf8vs.py", line 3, in <module>
    print unicode("William Burges (1827ŌĆō81) was an English architect and desig
ner. I am here. ", "utf-8").encode("cp866")
  File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
 20: character maps to <undefined>

Answer

Jack Aidley · Dec 2, 2013

I suspect the problem is down to the print statement rather than anything inherent to Python (it works fine on my Mac). To print the string, Python has to encode it into the console's character set, and the longer dash you've used cannot be represented in the default code page of the Windows command line (cp775 in your first traceback).
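To make this concrete, here is a minimal sketch of what the encoding step does. The `# -*- coding: utf-8 -*-` line only tells Python how to read the source file; it does not change the encoding used when printing. The helper names `can_encode` and `encode_for_console` are hypothetical, made up for this illustration, and replacing unmappable characters with `?` is just one possible workaround, not necessarily the best one for your case.

```python
# -*- coding: utf-8 -*-

# The sentence from the question; \u2013 is the en dash.
TEXT = u"William Burges (1827\u201381) was an English architect and designer."


def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def encode_for_console(text, codec):
    """Hypothetical helper: encode text, turning unmappable characters into '?'."""
    return text.encode(codec, "replace")


# Check the two code pages from the tracebacks against UTF-8.
for codec in ("cp775", "cp866", "utf-8"):
    print("%s: %s" % (codec, can_encode(TEXT, codec)))

# Lossy fallback: the en dash becomes '?' instead of raising an error.
print(encode_for_console(TEXT, "cp775"))
```

On a real console you would probably pass `sys.stdout.encoding` as the codec rather than a hard-coded name; note that in Python 2 it can be `None` when the output is piped, so that case needs a fallback.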

The difference between your two sentences is not the length but the kind of dash: "(1827-81)" uses an ordinary ASCII hyphen, while "(1827–81)" uses the longer en dash (the u'\u2013' in your error message). Can you see the subtle difference? Try copying and pasting one over the other to check this.
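If the two dashes are hard to tell apart by eye, a short script can point them out. This is only a sketch: `describe_non_ascii` is a hypothetical helper name, and the two strings are the sentences from the question with the dash written as an escape so the example does not depend on the file encoding.

```python
import unicodedata


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (index, "U+%04X" % ord(char), unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(text)
        if ord(char) > 127
    ]


# First sentence from the EDIT: plain ASCII hyphen, encodes fine in cp866.
working = u"William Burges (1827-81) was an English architect and designer."
# Second sentence: en dash, which cp775 and cp866 cannot represent.
failing = u"William Burges (1827\u201381) was an English architect and designer."

print(describe_non_ascii(working))  # []
print(describe_non_ascii(failing))  # [(20, 'U+2013', 'EN DASH')]
```

The index 20 matches the "position 20" reported in both tracebacks.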

See also Python, Unicode, and the Windows console.