The following code/outcome baffles me as to why data.table returns NA for the mean functions and not the sd function.
library(data.table)
test <- data.frame('id'=c(1,2,3,4,5),
'A'=seq(2,9,length=5),
'B'=seq(3,9,length=5),
'C'=seq(4,9,length=5),
'D'=seq(5,9,length=5))
test <- as.data.table(test)
test[,`:=`(mean_test = mean(.SD), sd_test = sd(.SD)),by=id,.SDcols=c('A','B','C','D')]
> test
id A B C D mean_test sd_test
1: 1 2.00 3.0 4.00 5 NA 1.2909944
2: 2 3.75 4.5 5.25 6 NA 0.9682458
3: 3 5.50 6.0 6.50 7 NA 0.6454972
4: 4 7.25 7.5 7.75 8 NA 0.3227486
5: 5 9.00 9.0 9.00 9 NA 0.0000000
I've learned quite a bit searching around, going through the DT tutorials/examples. This question is very similar to what I was hoping to do.
Why does the standard deviation function work and the mean function return NA?
Edit: Using Ricardo Saporta's solution:
test[,`:=`(mean_test = apply(.SD, 1, mean), sd_test = apply(.SD, 1, sd),by=id,.SDcols=c('A','B','C','D')]
> test
id A B C D mean_test sd_test
1: 1 2.00 3.0 4.00 5 3.500 1.2909944
2: 2 3.75 4.5 5.25 6 4.875 0.9682458
3: 3 5.50 6.0 6.50 7 6.250 0.6454972
4: 4 7.25 7.5 7.75 8 7.625 0.3227486
5: 5 9.00 9.0 9.00 9 9.000 0.0000000
.SD
is itself a data.table
Thus, when you take mean(.SD)
you are (attempting) to take the mean of an entire data.table
The function mean()
does not know what to do with the data.table and returns NA
## the .SD in your question is the same as
test[, c('A','B','C','D')]
## try taking its mean
mean(test[, c('A','B','C','D')])
# Warning in mean.default(test[, c("A", "B", "C", "D")]) :
# argument is not numeric or logical: returning NA
# [1] NA
use lapply(.SD, mean)
for column-wise
or apply(.SD, 1, mean)
for row-wise