What's wrong with my function to load multiple .csv files into single dataframe in R using rbind?

Sivanantham C picture Sivanantham C · Apr 21, 2014 · Viewed 35.9k times · Source

I have written the following function to combine 300 .csv files. My directory name is "specdata". I have done the following steps for execution,

x <- function(directory) {     
    dir <- directory    
    data_dir <- paste(getwd(),dir,sep = "/")    
    files  <- list.files(data_dir,pattern = '\\.csv')    
    tables <- lapply(paste(data_dir,files,sep = "/"), read.csv, header = TRUE)    
    pollutantmean <- do.call(rbind , tables)         
}

# Step 2: call the function
x("specdata")

# Step 3: inspect results
head(pollutantmean)

Error in head(pollutantmean) : object 'pollutantmean' not found

What is my mistake? Can anyone please explain?

Answer

hadley picture hadley · Apr 21, 2014

There's a lot of unnecessary code in your function. You can simplify it to:

load_data <- function(path) { 
  files <- dir(path, pattern = '\\.csv', full.names = TRUE)
  tables <- lapply(files, read.csv)
  do.call(rbind, tables)
}

pollutantmean <- load_data("specdata")

Be aware that do.call + rbind is relatively slow. You might find dplyr::bind_rows or data.table::rbindlist to be substantially faster.