I have a set of .stat files in the /tmp directory.
sample:
a.stat=>
abc,10
abc,20
abc,30
b.stat=>
xyz,10
xyz,30
xyz,70
and so on
I need to find a summary of all the .stat files.
Currently I am using
filelist<-list.files(path="/tmp/",pattern="\\.stat$")
data<-sapply(paste("/tmp/",filelist,sep=''), read.csv, header=FALSE)
However, I need to apply summary to every file being read. Put simply, given n .stat files, I need the summary of the 2nd column of each.
using
data<-sapply(paste("/tmp/",filelist,sep=''), summary, read.csv, header=FALSE)
does not work and gives me a summary of class character, which is not what I intend.
sapply(filelist, function(filename){df <- read.csv(filename, header=F);print(summary(df[,2]))})
works fine. However, my overall objective is to find values that are more than 2 standard deviations away from the mean on either side (outliers). So I use sd, but at the same time I need to check whether all values in the file currently being read fall within the 2 SD range.
To apply multiple functions at once:
f <- function(x){
list(sum(x),mean(x))
}
sapply(x, f)
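As a small variation on the above, returning a named numeric vector instead of a list lets sapply simplify the result to a matrix with one column per input. This is only a sketch: the helper name stats and the example vectors in cols are made up for illustration and stand in for the second column of two .stat files.

```r
# Hypothetical helper: several statistics returned as a named numeric vector
stats <- function(x) c(sum = sum(x), mean = mean(x), sd = sd(x))

# Made-up data standing in for the second column of two .stat files
cols <- list(a = c(10, 20, 30), b = c(10, 30, 70))

# sapply simplifies the equal-length vectors into a matrix:
# one row per statistic, one column per input
res <- sapply(cols, stats)
print(res)
```

Individual values can then be picked out by name, for example res["sd", "a"].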
In your case you want to apply them sequentially, so first read the CSV data and then compute the summary (header=FALSE is needed because the files have no header row):
sapply(lapply(paste("/tmp/",filelist,sep=''), read.csv, header=FALSE), summary)
To run summary on a particular column of each dataset, change the function passed to the outer sapply from summary to function(x) summary(x[[2]]).
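For the stated goal of flagging values more than 2 standard deviations from the mean, the same read-then-apply pattern could be used with a custom function. The following is a minimal sketch, not a definitive implementation: find_outliers, its parameter k, and the demo files written to a temporary directory are all assumptions made so the example runs anywhere. For the real data, point list.files at /tmp/ instead.

```r
# Hypothetical helper: return the values in column 2 that lie more than
# k standard deviations away from the column mean
find_outliers <- function(path, k = 2) {
  df <- read.csv(path, header = FALSE)
  x <- df[[2]]
  x[abs(x - mean(x)) > k * sd(x)]
}

# Demo files in a temporary directory (made-up data for illustration)
demo_dir <- tempfile("stat_demo")
dir.create(demo_dir)
writeLines(c("abc,10", "abc,12", "abc,11", "abc,9", "abc,10", "abc,11", "abc,100"),
           file.path(demo_dir, "a.stat"))
writeLines(c("xyz,10", "xyz,30", "xyz,70"),
           file.path(demo_dir, "b.stat"))

# full.names = TRUE returns complete paths, so no paste() is needed
files <- list.files(demo_dir, pattern = "\\.stat$", full.names = TRUE)

# lapply keeps one result per file, even when a file has no outliers
outliers <- lapply(files, find_outliers)
names(outliers) <- basename(files)
print(outliers)
```

A file with no outliers yields an empty vector, so all(lengths(outliers) == 0) checks whether every file stays within range. Note that with only three values per file, as in the sample data, no value can be more than about 1.15 sample standard deviations from the mean, so such a file will never be flagged at k = 2.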