I'm new to R and I'm trying to sum 2 columns of a given dataframe, if both the elements to be summed satisfy a given condition. To make things clear, what I want to do is:
> t.d<-as.data.frame(matrix(1:9,ncol=3))
> t.d
V1 V2 V3
1 4 7
2 5 8
3 6 9
> t.d$V4<-rep(0,nrow(t.d))
> for (i in 1:nrow(t.d)){
+ if (t.d$V1[i]>1 && t.d$V3[i]<9){
+ t.d$V4[i]<-t.d$V1[i]+t.d$V3[i]}
+ }
> t.d
V1 V2 V3 V4
1 4 7 0
2 5 8 10
3 6 9 0
I need an efficient code, as my real dataframe has about 150000 rows and 200 columns. This gives an error:
t.d$V4<-t.d$V1[t.d$V1>1]+ t.d$V3[t.d$V3>9]
Is "apply" an option? I tried this:
t.d<-as.data.frame(matrix(1:9,ncol=3))
t.d$V4<-rep(0,nrow(t.d))
my.fun<-function(x,y){
if(x>1 && y<9){
x+y}
}
t.d$V4<-apply(X=t.d,MAR=1,FUN=my.fun,x=t.d$V1,y=t.d$V3)
but it gives an error as well. Thanks very much for your help.
This operation doesn't require loops, apply statements or if statements. Vectorised operations and subsetting is all you need:
t.d <- within(t.d, V4 <- V1 + V3)
t.d[!(t.d$V1>1 & t.d$V3<9), "V4"] <- 0
t.d
V1 V2 V3 V4
1 1 4 7 0
2 2 5 8 10
3 3 6 9 0
Why does this work?
In the first step I create a new column that is the straight sum of columns V1 and V4. I use within
as a convenient way of referring to the columns of d.f
without having to write d.f$V
all the time.
In the second step I subset all of the rows that don't fulfill your conditions and set V4 for these to 0.