For some reason, I cannot get this simple statement to work on the ñ. It seems to work on any other character, but it doesn't like that one. Any ideas?
DF['NAME']=DF['NAME'].str.replace("ñ","n")
Thanks
I'm assuming you're using Python 2.x here, in which case this is likely a Unicode problem. Don't worry, you're not alone: Unicode is really tough in general and especially in Python 2, which is why Python 3 made every string Unicode by default.
If all you're concerned about is the ñ, you should decode from UTF-8 and then replace just that one character.
That would look something like the following:
DF['NAME'] = DF['NAME'].str.decode('utf-8').str.replace(u'\xf1', 'n')
As an example:
>>> "sureño".decode("utf-8").replace(u"\xf1", "n")
u'sureno'
If your string is already Unicode, then you can (and actually have to) skip the decode step:
>>> u"sureño".replace(u"\xf1", "n")
u'sureno'
Note here that u'\xf1' uses the hex escape for the character in question (ñ is code point U+00F1).
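To make the decode-then-replace idea concrete, here is a minimal sketch written for Python 3, where str is always Unicode and only bytes need decoding. The variable names (raw, text, cleaned) are made up for illustration, and I'm assuming the data is UTF-8 encoded, as above.

```python
# Python 3 sketch: bytes must be decoded before doing a text replace.
# Assumption: the input is UTF-8, so "ñ" arrives as the two bytes C3 B1.
raw = b"sure\xc3\xb1o"

# Decoding turns the two UTF-8 bytes into the single code point U+00F1.
text = raw.decode("utf-8")

# "\xf1" and "\u00f1" are two spellings of the same one-character string.
same_escape = "\xf1" == "\u00f1"
code_point = ord("\xf1")  # 0xF1, i.e. 241

# Replace just that one character.
cleaned = text.replace("\xf1", "n")
print(cleaned)  # prints: sureno
```

The point is that the replacement only matches once both the data and the pattern are Unicode text rather than raw bytes.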
I was informed in the comments that <>.str.replace is a pandas Series method, which I hadn't realized. In that case, the answer might be something like the following:
DF['NAME'] = DF['NAME'].map(lambda x: x.decode('utf-8').replace(u'\xf1', 'n'))
or something along those lines, since Series.map applies the function to each element of the column.
It actually just occurred to me that your issue may be as simple as the following:
DF['NAME']=DF['NAME'].str.replace(u"ñ","n")
Note how I've added the u in front of the string to make it a Unicode literal.
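For what it's worth, here is a rough sketch of how both cases might look on Python 3 with a reasonably recent pandas (one that accepts the regex keyword on str.replace). The sample data and the names df, raw, and decoded are invented for illustration; only the NAME column comes from the question.

```python
import pandas as pd

# Case 1: the column already holds text, so a plain .str.replace is enough.
# The sample values are hypothetical; "\xf1" is the escape for ñ.
df = pd.DataFrame({"NAME": ["sure\xf1o", "ni\xf1o", "plain"]})
df["NAME"] = df["NAME"].str.replace("\xf1", "n", regex=False)

# Case 2: the values are UTF-8 bytes, so decode first, then replace.
raw = pd.Series([b"sure\xc3\xb1o", b"plain"])
decoded = raw.str.decode("utf-8").str.replace("\xf1", "n", regex=False)

print(df["NAME"].tolist())
print(decoded.tolist())
```

Passing regex=False treats the pattern as a literal string, which is all that's needed for a single-character swap.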