Importing a big xlsx file into R?

user2722443 · Oct 3, 2013 · Viewed 98.4k times

I'm wondering if anyone knows of a way to import data from a "big" xlsx file (~20 MB). I tried the xlsx and XLConnect packages. Unfortunately, both use rJava, and I always get the same error:

> library(XLConnect)
> wb <- loadWorkbook("MyBigFile.xlsx")
Error: OutOfMemoryError (Java): Java heap space

or

> library(xlsx)
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
   java.lang.OutOfMemoryError: Java heap space

I also tried to modify the java.parameters before loading rJava:

> options( java.parameters = "-Xmx2500m")
> library(xlsx) # load rJava
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
   java.lang.OutOfMemoryError: Java heap space

or after loading rJava (which is probably pointless, I think, since the JVM has already started by then):

> library(xlsx) # load rJava
> options( java.parameters = "-Xmx2500m")
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
   java.lang.OutOfMemoryError: Java heap space

But nothing works. Does anyone have an idea?

Answer

orville jackson · Aug 14, 2014

I stumbled on this question when someone sent me (yet another) Excel file to analyze. This one isn't even that big, but for whatever reason I was running into a similar error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

Based on @Dirk Eddelbuettel's comment in a previous answer, I installed the openxlsx package (http://cran.r-project.org/web/packages/openxlsx/) and then ran:

library("openxlsx")
mydf <- read.xlsx("BigExcelFile.xlsx", sheet = 1, startRow = 2, colNames = TRUE)

It was just what I was looking for. Easy to use and wicked fast. It's my new BFF. Thanks for the tip @Dirk E!
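
If the file is large and you only need part of it, read.xlsx also takes rows and cols arguments, so you may not have to pull in the whole sheet. Here is a minimal sketch. The temp file and the demo data frame are made up for illustration (they stand in for the real workbook), and I haven't measured how much this helps on a 20 MB file:

```r
library(openxlsx)

# Hypothetical stand-in for the real workbook: write a small file to a
# temporary path so the example runs on its own.
path <- tempfile(fileext = ".xlsx")
demo <- data.frame(id = 1:100,
                   value = (1:100) * 2,
                   label = rep(c("a", "b"), 50))
write.xlsx(demo, path)

# Read only the header row plus the first 10 data rows (sheet rows 1 to 11),
# and only the first two columns.
subset_df <- read.xlsx(path, sheet = 1, rows = 1:11, cols = c(1, 2),
                       colNames = TRUE)

dim(subset_df)
```

For the real file, swap path for "BigExcelFile.xlsx" and adjust rows and cols to the range you need.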

BTW, I don't want to poach this answer from Dirk E, so if he posts an answer, please accept it rather than mine!