Remove accents from a dataframe column in R

hans glick · Aug 25, 2016 · Viewed 20.1k times

I have a data.table called base with a character column, terme:

class(base$terme)
[1] "character"
length(base$terme)
[1] 27486

I can remove accents from a single string, and from a vector of strings:

iconv("Millésime",to="ASCII//TRANSLIT")
[1] "Millesime"
iconv(c("Millésime","boulangère"),to="ASCII//TRANSLIT")
[1] "Millesime" "boulangere"

But for some reason, it does not work when I apply the very same function to my column:

base$terme[2]
[1] "Millésime"
iconv(base$terme[2],to="ASCII//TRANSLIT")
[1] "MillACsime"

Does anybody know what is going on here?

Answer

hans glick · Aug 26, 2016

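OK, here is how to solve the problem. The column's strings are marked as UTF-8, but when from is not given, iconv() assumes its input is in the current locale's encoding, so the bytes of "é" get misread. Checking the declared encoding and passing it explicitly as from fixes it: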

Encoding(base$terme[2])
[1] "UTF-8"
iconv(base$terme[2],from="UTF-8",to="ASCII//TRANSLIT")
[1] "Millesime"

Thanks to @nicola
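
To apply the same fix to the whole column rather than one element, something along these lines should work. This is only a sketch: the vector terme below is a made-up stand-in for base$terme, and calling enc2utf8() first is my own addition so that a single from = "UTF-8" is valid even if the column mixes encodings. Note that what ASCII//TRANSLIT produces depends on the platform's iconv implementation, so the exact result may differ from the glibc output shown above.

```r
# Hypothetical stand-in for base$terme; the \u escapes give UTF-8-marked strings.
terme <- c("Mill\u00e9sime", "boulang\u00e8re", "plain")

# Declared encoding of each element (pure-ASCII strings report "unknown").
Encoding(terme)

# Normalise every element to UTF-8, then transliterate with an explicit `from`.
clean <- iconv(enc2utf8(terme), from = "UTF-8", to = "ASCII//TRANSLIT")
clean

# With a data.table, the same call can be assigned by reference:
# base[, terme := iconv(enc2utf8(terme), from = "UTF-8", to = "ASCII//TRANSLIT")]
```

Elements that cannot be converted come back as NA, so it is worth checking anyNA(clean) afterwards.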