I am reading a file through RJDBC from a MySQL database and it correctly displays all letters in R (e.g., נווה שאנן).
However, even when I export it using write.csv with fileEncoding="UTF-8", the output looks like
<U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446>
(this example is a Bulgarian string, not the Hebrew one above). The same happens for Bulgarian, Hebrew, Chinese and so on. Other special characters such as ã and ç work fine.
I suspect this is because of the UTF-8 BOM, but I did not find a solution online.
My OS is a German Windows 7.
Edit: I tried
con <- file("file.csv", encoding = "UTF-8")
write.csv(x, con, row.names = FALSE)
and the (as far as I know) equivalent
write.csv(x, file = "file.csv", fileEncoding = "UTF-8", row.names = FALSE)
The accepted answer did not help me in a similar situation (R 3.1 on Windows, trying to open the file in Excel). However, based on this part of the documentation for file:
If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)
I came up with the following workaround:
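write.csv.utf8.BOM <- function(df, filename)
{
    con <- file(filename, "w")
    tryCatch({
        # Re-encode text columns to UTF-8 (factors become character);
        # other columns are left as they are
        for (i in seq_len(ncol(df)))
            if (is.character(df[[i]]) || is.factor(df[[i]]))
                df[[i]] <- iconv(df[[i]], to = "UTF-8")
        # Write the byte order mark first, then the data
        writeChar(iconv("\ufeff", to = "UTF-8"), con, eos = NULL)
        write.csv(df, file = con)
    }, finally = {close(con)})
}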
Here df is the data.frame to export and filename is the path to the CSV file.
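To check whether a file really starts with the BOM, the second variant from the documentation quote (writeBin on a binary connection) can be combined with readBin. This is only a minimal sketch under some assumptions: has_utf8_bom is a hypothetical helper, not part of base R, and the temp file and its contents are made up for the demonstration.

```r
# UTF-8 byte order mark, as given in the documentation for file()
utf8_bom <- as.raw(c(0xef, 0xbb, 0xbf))

# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM
has_utf8_bom <- function(path) {
  identical(readBin(path, what = "raw", n = 3L), utf8_bom)
}

# Throwaway file for the demonstration
path <- tempfile(fileext = ".csv")

# Binary mode ("wb") writes the bytes exactly as given
binary_con <- file(path, "wb")
writeBin(utf8_bom, binary_con)
writeBin(charToRaw("id,name\n"), binary_con)
close(binary_con)

bom_present <- has_utf8_bom(path)
```

The same helper can be pointed at a file produced by write.csv.utf8.BOM to see whether the BOM ended up in the output before opening it in Excel.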