In R, how do I compute mean and standard error of a subset of data, grouped by multiple columns, and output this into a new data frame?

PlantPathRules · Dec 8, 2016

I have a dataset (named 'gala') with the columns "Day", "Tree", "Trt", and "LogColumn". The data were collected over time, so within each treatment a given tree number refers to the same tree on every day. The tree numbers are repeated across treatments (e.g. there is a tree "1" in several treatments).

I would like to compute the mean and standard error of the 'LogColumn' column for each combination of tree, treatment, and day (e.g. one mean and one standard error for day 1, tree 1, treatment x, and so on), and output the results into a new data frame that also includes the corresponding Day, Tree, and Trt values.

I have been trying, unsuccessfully, to stitch together code from other Stack Overflow answers, but I cannot find one that has all the components at once. If I missed such an answer, I am sorry; please point me to it. I am new to coding and to R, and I don't yet understand how code written for a different problem can be adapted to mine.

At this point I have the following, but I don't know whether it is anywhere near correct (I am also currently getting the error message "object of type 'closure' is not subsettable"):

TreeAverages <- data.table[, MeanLog=mean(gala$LogColumn), se=std.error(gala$LogColumn), by=c("Day","Tree","Trt")]
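A minimal base-R illustration of that error message: in R a function object has type "closure", and data.table written without parentheses is the data.table() function, not a table. Indexing any function with [ ] fails the same way, so the sketch below uses base R's mean as a stand-in for that function, plus a small hypothetical data frame (df, not from the question) that can be indexed normally.

```r
# Any R function is a "closure"; data.table (the function) is one too.
print(typeof(mean))  # "closure"

# Indexing a function with [ ] raises the error from the question.
msg <- tryCatch(mean[1], error = function(e) conditionMessage(e))
print(msg)

# Hypothetical data frame: indexing actual data works as expected.
df <- data.frame(LogColumn = c(1.5, 2.5))
print(df[1, "LogColumn"])  # 1.5
```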

Any help is greatly appreciated. Thank you!

Answer

Lissa · Dec 8, 2016

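If you're using data.table, remember to load the package and convert gala into a data.table object first. In your attempt, data.table[...] indexes the data.table() function itself rather than your data, which is what produces "object of type 'closure' is not subsettable"; index gala instead.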

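library(data.table)   # data.table() and the DT[i, j, by] syntax
library(plotrix)      # std.error(): standard error of the mean

gala = data.table(gala)

gala_output = gala[, .("MeanLog" = mean(LogColumn),
                       "std" = std.error(LogColumn)),
                   by = c("Day", "Tree", "Trt")]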
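A self-contained sketch of the same grouping, assuming a hypothetical toy gala with two readings per Day/Tree/Trt combination (the values are made up for illustration). Instead of plotrix::std.error() it computes the standard error directly as sd(LogColumn) / sqrt(.N), where .N is data.table's row count for the current group, so only data.table is needed.

```r
library(data.table)

# Hypothetical toy data shaped like 'gala': tree 1 appears under
# treatments x and y, with two readings per Day/Tree/Trt group.
gala <- data.table(
  Day       = rep(1:2, each = 4),
  Tree      = 1L,
  Trt       = rep(c("x", "x", "y", "y"), times = 2),
  LogColumn = c(1, 3, 2, 2, 4, 6, 5, 9)
)

# One row per Day/Tree/Trt combination; the grouping columns are kept.
out <- gala[, .(MeanLog = mean(LogColumn),
                SE      = sd(LogColumn) / sqrt(.N)),  # standard error of the mean
            by = c("Day", "Tree", "Trt")]
print(out)
```

If a group contains only one reading, sd() returns NA for it, so its standard error is NA as well.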

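You were really close. Inside gala[...], data.table evaluates the j expression within the table (much as dplyr verbs do), so you can refer to columns by name: write mean(LogColumn), not mean(gala$LogColumn). Writing gala$LogColumn pulls in the whole column, so every group would get the same overall mean and standard error.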
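A small sketch of that difference, using a hypothetical two-treatment table dt (not from the question): the bare column name is evaluated per group, while dt$LogColumn refers to the full column regardless of grouping.

```r
library(data.table)

# Hypothetical toy data: two treatments with clearly different means.
dt <- data.table(Trt = c("x", "x", "y", "y"), LogColumn = c(1, 3, 10, 12))

# Bare column name: evaluated within each Trt group.
per_group <- dt[, .(MeanLog = mean(LogColumn)), by = "Trt"]

# dt$LogColumn: the whole column, so each group gets the overall mean.
whole_column <- dt[, .(MeanLog = mean(dt$LogColumn)), by = "Trt"]

print(per_group)     # MeanLog is 2 for x and 11 for y
print(whole_column)  # MeanLog is 6.5 for both x and y
```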

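.() is just shorthand for list(). In your attempt, MeanLog= and se= were passed as separate arguments to [; wrapping them in .() makes them a single j expression, so data.table returns the columns MeanLog and std computed for each combination of Day, Tree, and Trt. The grouping columns are included in gala_output automatically, so the original Day, Tree, and Trt values are kept.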
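A quick check that .() and list() are interchangeable inside [ ], using a hypothetical table dt (not from the question); both calls should produce the same grouped result.

```r
library(data.table)

# Hypothetical toy data.
dt <- data.table(Day = c(1, 1, 2), LogColumn = c(2, 4, 9))

# .() is data.table's alias for list() inside [ ].
a <- dt[, .(MeanLog = mean(LogColumn)), by = "Day"]
b <- dt[, list(MeanLog = mean(LogColumn)), by = "Day"]

print(a)                        # MeanLog is 3 for Day 1 and 9 for Day 2
print(isTRUE(all.equal(a, b)))  # TRUE
```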