I have been reading up on how to read and combine multiple xlsx files into one R data frame and have come across some very good suggestions like, How to read multiple xlsx file in R using loop with specific rows and columns, but non fits my data set so far.
I would like R to read in multiple xlsx files with that have multiple sheets. All sheets and files have the same columns but not the same length and NA's should be excluded. I want to skip the first 3 rows and only take in columns 1:6, 8:10, 12:17, 19.
So far I tried:
file.list <- list.files(recursive=T,pattern='*.xlsx')
dat = lapply(file.list, function(i){
x = read.xlsx(i, sheetIndex=1, sheetName=NULL, startRow=4,
endRow=NULL, as.data.frame=TRUE, header=F)
# Column select
x = x[, c(1:6,8:10,12:17,19)]
# Create column with file name
x$file = i
# Return data
x
})
dat = do.call("rbind.data.frame", dat)
But this only takes all the first sheet of every file
Does anyone know how to get all the sheets and files together in one R data frame?
Also, what packages would you recommend for large sets of data? So far I tried readxl and XLConnect.
openxlsx solution:
filename <-"myFilePath"
sheets <- openxlsx::getSheetNames(filename)
SheetList <- lapply(sheets,openxlsx::read.xlsx,xlsxFile=filename)
names(SheetList) <- sheets