Most efficient way of exporting large (3.9 mill obs) data.frames to text file?

jans · Mar 14, 2012 · Viewed 19.1k times

I have a fairly large data frame in R that I would like to export to SPSS. The file caused me hours of headaches when I first tried to import it into R, but I eventually succeeded using read.fwf() with the options comment.char = "%" (a character that does not appear in the file) and fill = TRUE (it was a fixed-width ASCII file in which some rows lacked all variables, which caused error messages).
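For reference, the call that finally worked looked roughly like this (the file name and field widths below are placeholders, not the real ones):

    ## hypothetical file name and widths; colClasses keeps every column as character
    df <- read.fwf("mydata.dat",
                   widths = rep(10, 48),
                   colClasses = "character",
                   comment.char = "%",   # a character that never appears in the file
                   fill = TRUE)          # tolerate rows lacking all variables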

Anyway, my data frame currently consists of 3.9 million observations and 48 variables (all character). I can write it to file fairly quickly by splitting it into four sets of 1 million observations with df2 <- df[1:1000000,] followed by write.table(df2) and so on, but I can't write the entire file in one sweep without the computer locking up and needing a hard reset to come back up.
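For the record, the piecewise workaround looks roughly like this (the output file names are made up):

    ## current workaround: write four ~1-million-row pieces separately
    write.table(df[1:1000000, ],        "out_part1.txt")
    write.table(df[1000001:2000000, ],  "out_part2.txt")
    write.table(df[2000001:3000000, ],  "out_part3.txt")
    write.table(df[3000001:nrow(df), ], "out_part4.txt")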

After hearing anecdotal stories for years about how R is unsuited for large datasets, this is the first time I have actually encountered a problem of this kind. Are there other approaches (low-level "dumping" of the file directly to disk?), or is there some package unknown to me that can export large files of this type efficiently?

Answer

tim riffe · Mar 15, 2012

1) If your data frame is all character strings, write.table() saves it much faster if you first convert it to a matrix.

2) Also, write it out in chunks of, say, 1,000,000 rows, but always to the same file, using the argument append = TRUE. A combined sketch follows below.
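A minimal sketch of both suggestions together (the output file name, chunk size, and the row.names/quote settings are just examples):

    ## 1) convert the all-character data frame to a matrix
    m <- as.matrix(df)

    ## 2) write the matrix in 1,000,000-row chunks, always appending to the same file;
    ##    the header is written only with the first chunk
    chunk  <- 1000000
    starts <- seq(1, nrow(m), by = chunk)

    for (s in starts) {
      e <- min(s + chunk - 1, nrow(m))
      write.table(m[s:e, , drop = FALSE], "out.txt",
                  append    = (s > 1),    # overwrite on the first chunk, append afterwards
                  col.names = (s == 1),   # header only once
                  row.names = FALSE,
                  quote     = FALSE)
    }

Setting col.names = FALSE on the appended chunks keeps the header from being repeated in the middle of the file.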