R write.csv with UTF-16 encoding

Daniel Dickison · Mar 11, 2011 · Viewed 8.1k times

I'm having trouble outputting a data.frame with write.csv using UTF-16 character encoding.

Background: I am trying to write out a CSV file from a data.frame for use in Excel. Excel Mac 2011 seems to dislike UTF-8 (if I specify UTF-8 during text import, non-ASCII characters show up as underscores). I've been led to believe that Excel will be happy with UTF-16LE encoding.

Here's the example data.frame:

> foo
  a  b
1 á 羽
> Encoding(levels(foo$a))
[1] "UTF-8"
> Encoding(levels(foo$b))
[1] "UTF-8"

So I tried to output the data.frame by doing:

f <- file("foo.csv", encoding="UTF-16LE")
write.csv(foo, f)

This gives me an ASCII file that looks like:

"","

If I use encoding="UTF-16", I get a file that only contains the byte-order mark 0xFE 0xFF.

If I use encoding="UTF-16BE", I get an empty file.

This is on a 64-bit version of R 2.12.2 on Mac OS X 10.6.6. What am I doing wrong?

Answer

daroczig · Mar 11, 2011

You could simply save the CSV in UTF-8 and later convert it to UTF-16LE with iconv in a terminal.
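
For example (a minimal sketch; the file names are illustrative, and the terminal step assumes the command-line iconv that ships with Mac OS X):

# Step 1, in R: write the CSV as UTF-8 via a connection
con <- file("foo-utf8.csv", encoding = "UTF-8")
write.csv(foo, con)
# Step 2, in a terminal (outside R): convert the file to UTF-16LE
#   iconv -f UTF-8 -t UTF-16LE foo-utf8.csv > foo-utf16.csv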

If you insist on doing it in R, the following might work, although it seems that iconv in R has some issues; see: http://tolstoy.newcastle.edu.au/R/e10/devel/10/06/0648.html

> x <- c("foo", "bar")
> iconv(x,"UTF-8","UTF-16LE")
Error in iconv(x, "UTF-8", "UTF-16LE") : 
  embedded nul in string: 'f\0o\0o\0'

As you can see, the patch linked above is really needed (I have not tested it). But if you want to keep it simple (and nasty), just call the third-party iconv program inside R with a system call after saving the table as CSV.
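
Something along these lines (an untested sketch; it assumes the external iconv is on the PATH, as it is on Mac OS X):

# Shell out to the external iconv after writing foo-utf8.csv as above
system("iconv -f UTF-8 -t UTF-16LE foo-utf8.csv > foo-utf16.csv")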