Adam Sh
·
Apr 27, 2012
·
Viewed 76.2k times
·
Source
I am working on decoding text. I am trying to find the character code for the — character, not to be mistaken for -, in ASCII. I have tried unsuccessfully. Does anybody know how to convert it?
When an actual em dash is unavailable—as in the ASCII character set—a double ("--") or triple hyphen-minus ("---") is used. In Unicode, the em dash is U+2014 (decimal 8212).
Em dash character is not a part of ASCII character set.
My code just scrapes a web page, then converts it to Unicode.
html = urllib.urlopen(link).read()
html.encode("utf8","ignore")
self.response.out.write(html)
But I get a UnicodeDecodeError:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/…
So this web page is rendering with these symbols and they are found throughout this website/application but on no other sites. Can anyone tell me
What this symbol is ?
Why it is showing up only in one browser ?