Edit -- This question was originally titled << Long to wide data reshaping in R >>
I'm just learning R and trying to find ways to apply it to help out others in my life. As a test case, I'm working on reshaping some data, and I'm having trouble following the examples I've found online. What I'm starting with looks like this:
ID Obs 1 Obs 2 Obs 3
1 43 48 37
1 27 29 22
1 36 32 40
2 33 38 36
2 29 32 27
2 32 31 35
2 25 28 24
3 45 47 42
3 38 40 36
And what I want to end up with will look like this:
ID Obs 1 mean Obs 1 std dev Obs 2 mean Obs 2 std dev
1 x x x x
2 x x x x
3 x x x x
And so forth. What I'm unsure of is whether I need additional information in my long-form data, or what. I imagine that the math part (finding the mean and standard deviations) will be the easy part, but I haven't been able to find a way that seems to work to reshape the data correctly to start in on that process.
Thanks very much for any help.
This is an aggregation problem, not a reshaping problem as the question originally suggested -- we wish to aggregate each column into a mean and standard deviation by ID. There are many packages that handle such problems. In the base of R it can be done using aggregate
like this (assuming DF
is the input data frame):
ag <- aggregate(. ~ ID, DF, function(x) c(mean = mean(x), sd = sd(x)))
Note 1: A commenter pointed out that ag
is a data frame for which some columns are matrices. Although initially that may seem strange, in fact it simplifies access. ag
has the same number of columns as the input DF
. Its first column ag[[1]]
is ID
and the ith column of the remainder ag[[i+1]]
(or equivalanetly ag[-1][[i]]
) is the matrix of statistics for the ith input observation column. If one wishes to access the jth statistic of the ith observation it is therefore ag[[i+1]][, j]
which can also be written as ag[-1][[i]][, j]
.
On the other hand, suppose there are k
statistic columns for each observation in the input (where k=2 in the question). Then if we flatten the output then to access the jth statistic of the ith observation column we must use the more complex ag[[k*(i-1)+j+1]]
or equivalently ag[-1][[k*(i-1)+j]]
.
For example, compare the simplicity of the first expression vs. the second:
ag[-1][[2]]
## mean sd
## [1,] 36.333 10.2144
## [2,] 32.250 4.1932
## [3,] 43.500 4.9497
ag_flat <- do.call("data.frame", ag) # flatten
ag_flat[-1][, 2 * (2-1) + 1:2]
## Obs_2.mean Obs_2.sd
## 1 36.333 10.2144
## 2 32.250 4.1932
## 3 43.500 4.9497
Note 2: The input in reproducible form is:
Lines <- "ID Obs_1 Obs_2 Obs_3
1 43 48 37
1 27 29 22
1 36 32 40
2 33 38 36
2 29 32 27
2 32 31 35
2 25 28 24
3 45 47 42
3 38 40 36"
DF <- read.table(text = Lines, header = TRUE)