I have a data frame with about 200 columns, out of them I want to group the table by first 10 or so which are factors and sum the rest of the columns.
I have list of all the column names which I want to group by and the list of all the cols which I want to aggregate.
The output format that I am looking for needs to be the same dataframe with same number of cols, just grouped together.
Is there a solution using packages data.table
, plyr
or any other?
The data.table way is :
DT[, lapply(.SD,sum), by=list(col1,col2,col3,...)]
or
DT[, lapply(.SD,sum), by=colnames(DT)[1:10]]
where .SD
is the (S)ubset of (D)ata excluding group columns. (Aside: If you need to refer to group columns generically, they are in .BY
.)