Can the value.var in dcast be a list or have multiple value variables?

AlexR · Apr 14, 2014

In the help files for dcast.data.table, there is a note stating that a new feature has been implemented: "dcast.data.table allows value.var column to be of type list"

I take this to mean that one can have multiple value variables within a list, i.e. in this format:

dcast.data.table(dt, x1~x2, value.var=list('var1','var2','var3'))

But we get an error: 'value.var' must be a character vector of length 1.

Is there such a feature, and if not, what would be other one-liner alternatives?

EDIT: In reply to the comments below

There are situations where you want to treat several variables as the value.var. Imagine, for example, that x2 holds 3 different weeks and you have 2 value variables, such as salt and sugar consumption, that you want to cast across those weeks. Sure, you can 'melt' the 2 value variables into a single column first, but why use two functions when one would do, as reshape does?
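For reference, the two-step workaround and the reshape call mentioned above might look roughly like the sketch below. The data frame `df` and its values are made up for illustration (three x1 groups observed over three weeks in x2); only the salt and sugar column names come from the example in the question.

```r
library(data.table)

# Hypothetical long-format data: one row per (x1, x2) pair.
df <- data.frame(
  x1    = rep(1:3, each = 3),
  x2    = rep(1:3, times = 3),
  salt  = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
  sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7)
)
dt <- as.data.table(df)

# Step 1: melt the two value columns into a single 'value' column,
# with 'variable' recording which column each value came from.
long <- melt(dt, id.vars = c("x1", "x2"), measure.vars = c("salt", "sugar"))

# Step 2: cast on both 'variable' and x2 with a single value.var.
wide <- dcast(long, x1 ~ variable + x2, value.var = "value")

# Base R reshape() handles both value columns in one call;
# v.names defaults to every column that is not idvar or timevar.
wide_base <- reshape(df, idvar = "x1", timevar = "x2", direction = "wide")
```

Both results have one row per x1 and one column per variable and week combination, although the two functions order and name the columns differently.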

(Note: I've also noticed that reshape cannot treat multiple variables as the time variable as dcast does.)

So my point is that I don't understand why these functions don't allow multiple variables in value.var or time.var, just as they allow multiple variables in id.var.

Answer

Arun · Mar 16, 2015

From v1.9.6 of data.table, we can cast multiple value.var columns simultaneously (and also use multiple aggregation functions in fun.aggregate). Please see ?dcast and the "Efficient reshaping using data.tables" vignette for more.

Here's how we could use dcast:

dcast(setDT(mydf), x1 ~ x2, value.var=c("salt", "sugar"))
#    x1 salt_1 salt_2 salt_3 sugar_1 sugar_2 sugar_3
# 1:  1      3      4      6       1       2       2
# 2:  2     10      3      9       5       3       6
# 3:  3     10      7      7       4       6       7
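The answer does not show how mydf was built. A minimal sketch that makes the call above reproducible, assuming a long-format data frame whose values are chosen to match the printed output, could be:

```r
library(data.table)

# Assumed input (not given in the answer): one row per (x1, x2) pair,
# with values picked to reproduce the output shown above.
mydf <- data.frame(
  x1    = rep(1:3, each = 3),
  x2    = rep(1:3, times = 3),
  salt  = c(3, 4, 6, 10, 3, 9, 10, 7, 7),
  sugar = c(1, 2, 2, 5, 3, 6, 4, 6, 7)
)

# setDT() converts mydf to a data.table by reference; passing a
# character vector as value.var casts both columns in one call.
wide <- dcast(setDT(mydf), x1 ~ x2, value.var = c("salt", "sugar"))
print(wide)
```

Each output column is named after its value.var followed by the x2 level, which is where salt_1 through sugar_3 come from.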