Remove all characters from a string whose ordinals are out of range

Chris Dutrow · Jun 7, 2012

What is a good way to remove all characters whose ordinals fall outside range(128) from a string in Python?

I'm using hashlib.sha256 in Python 2.7 and getting this exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u200e' in position 13: ordinal not in range(128)

I assume this means that some funky character found its way into the string that I am trying to hash.

Thanks!

Answer

Joran Beasley · Jun 7, 2012
new_safe_str = some_string.encode('ascii', 'ignore')

I think this would work.
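As a rough sketch of how that fits the hashing case from the question: the sample text below is made up for illustration (it places U+200E at position 13, like the error message), and it assumes some_string is a unicode string. The 'ignore' error handler silently drops anything the ASCII codec cannot encode, and the result is a byte string that hashlib accepts.

```python
import hashlib

# Hypothetical input: a unicode string with U+200E (left-to-right mark) at index 13.
some_string = u"Invoice #4521\u200e final"

# Encode to ASCII, dropping every character that cannot be encoded.
new_safe_str = some_string.encode("ascii", "ignore")

# The result is a byte string, so it can be passed straight to sha256.
digest = hashlib.sha256(new_safe_str).hexdigest()
print(digest)
```

Note that the dropped character leaves no trace, so two strings that differ only in non-ASCII characters end up with the same hash.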

Or you could use a list comprehension:

"".join([ch for ch in orig_string if ord(ch) < 128])

[edit] However, as others have said, it may be better to figure out how to handle Unicode in general, unless you really need the string encoded as ASCII for some reason.
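If the goal is only to hash the string, one possible way to "deal with unicode" is to encode to UTF-8 before hashing instead of discarding characters. This is a minimal sketch, not necessarily what the other commenters had in mind; sha256_hex is a hypothetical helper name, and it assumes the input is a unicode string rather than an already-encoded byte string.

```python
import hashlib

def sha256_hex(text):
    """Hash a unicode string by encoding it to UTF-8 bytes first."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical inputs: identical except for the U+200E character.
with_mark = u"Invoice #4521\u200e final"
without_mark = u"Invoice #4521 final"

print(sha256_hex(with_mark))
print(sha256_hex(without_mark))
```

Unlike stripping to ASCII, this keeps the two inputs distinct, since UTF-8 can represent every character in the string.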