Reshape three-column data frame to matrix ("long" to "wide" format)

MalteseUnderdog · Mar 8, 2012

I have a data.frame that looks like this.

x a 1 
x b 2 
x c 3 
y a 3 
y b 3 
y c 2 

I want this in matrix form so I can feed it to heatmap to make a plot. The result should look something like:

    a    b    c
x   1    2    3
y   3    3    2

I have tried cast from the reshape package, and I have tried writing a function to do this myself, but I can't seem to get it right.

Answer

Aaron left Stack Overflow · Mar 8, 2012

There are many ways to do this. This answer starts with what is quickly becoming the standard method, then covers older approaches and other methods taken from answers to similar questions around this site.

# Example data: x holds the row labels, y the column labels, z the values
tmp <- data.frame(x=gl(2,3, labels=letters[24:25]),
                  y=gl(3,1,6, labels=letters[1:3]),
                  z=c(1,2,3,3,3,2))
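The question's data has no header row. Assuming it is stored as whitespace-separated text (an assumption; the question does not say where it comes from), one way to read it straight into the same layout as tmp is read.table with the text argument. The name tmp2 and the column names x, y, z are choices made here to match tmp; note that z comes back as an integer column rather than a double.

```r
# The question's data pasted as text (assumed whitespace-separated, no header).
txt <- "
x a 1
x b 2
x c 3
y a 3
y b 3
y c 2
"

# col.names supplies the names used in this answer; stringsAsFactors = TRUE
# makes x and y factors, like the gl() columns in tmp.
tmp2 <- read.table(text = txt, col.names = c("x", "y", "z"),
                   stringsAsFactors = TRUE)
str(tmp2)
```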

Using the tidyverse:

The newest way to do this is with pivot_wider from tidyr 1.0.0. It returns a data frame (a tibble), which is probably what most readers of this answer will want. For a heatmap, though, you would need to convert this to a true matrix.

library(tidyr)
pivot_wider(tmp, names_from = y, values_from = z)
## # A tibble: 2 x 4
## x         a     b     c
## <fct> <dbl> <dbl> <dbl>
## 1 x       1     2     3
## 2 y       3     3     2
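A sketch of that conversion: drop the id column, turn the remaining numeric columns into a matrix with as.matrix, and use the ids as row names. The names wide and m are made up here, and the heatmap arguments Rowv = NA, Colv = NA and scale = "none" are a choice to keep rows and columns in their original order and plot the raw values.

```r
library(tidyr)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

wide <- pivot_wider(tmp, names_from = y, values_from = z)

# Drop the id column, convert the numeric columns to a matrix,
# and move the ids into the row names.
m <- as.matrix(wide[, -1])
rownames(m) <- as.character(wide$x)

# heatmap() needs a numeric matrix; NA for Rowv/Colv turns off reordering.
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```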

The older tidyr way to do this is with spread, which pivot_wider has since superseded. It similarly returns a data frame.

library(tidyr)
spread(tmp, y, z)
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2
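The arguments map across directly: spread's key becomes pivot_wider's names_from and its value becomes values_from. A sketch checking that the two calls agree on tmp (old and new are illustrative names); because pivot_wider returns a tibble, both results are compared as plain data frames.

```r
library(tidyr)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# key = column supplying the new column names, value = column supplying cells
old <- spread(tmp, key = y, value = z)
# the same request in pivot_wider() terms
new <- pivot_wider(tmp, names_from = y, values_from = z)

# compare as plain data frames so the tibble class does not matter
all.equal(as.data.frame(old), as.data.frame(new))
```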

Using reshape2:

One of the first steps toward the tidyverse was the reshape2 package.

To get a matrix, use acast:

library(reshape2)
acast(tmp, x~y, value.var="z")
##   a b c
## x 1 2 3
## y 3 3 2
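Because acast already returns a numeric matrix with the x levels as row names and the y levels as column names, its result can go straight into heatmap with no conversion step. A minimal sketch; the heatmap arguments are a choice made here to skip reordering and scaling.

```r
library(reshape2)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# rows from x, columns from y, cells from z
m <- acast(tmp, x ~ y, value.var = "z")

# already a plain matrix, so no as.matrix() step is needed
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```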

Or to get a data frame, use dcast, as here: Reshape data for values in one column.

dcast(tmp, x~y, value.var="z")
##   x a b c
## 1 x 1 2 3
## 2 y 3 3 2

Using plyr:

From the same era as reshape2 comes plyr, with the daply function, as shown here: https://stackoverflow.com/a/7020101/210673

library(plyr)
# For each (x, y) group, return its single z value
daply(tmp, .(x, y), function(d) d$z)
##    y
## x   a b c
##   x 1 2 3
##   y 3 3 2

Using matrix indexing:

This is kinda old school but is a nice demonstration of matrix indexing, which can be really useful in certain situations.

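with(tmp, {
  # empty matrix: one row per level of x, one column per level of y
  out <- matrix(nrow=nlevels(x), ncol=nlevels(y),
                dimnames=list(levels(x), levels(y)))
  # each row of cbind(x, y) is a (row, column) pair of integer codes
  out[cbind(x, y)] <- z
  out
})
##   a b c
## x 1 2 3
## y 3 3 2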
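The trick in out[cbind(x, y)] is that indexing a matrix with a two-column matrix treats each row as one (row, column) address, and that cbind on factors returns their integer level codes. A small standalone sketch of both points, using made-up factors f_row and f_col:

```r
# Factors passed to cbind() become their integer level codes.
f_row <- factor(c("x", "y", "y"), levels = c("x", "y"))
f_col <- factor(c("c", "a", "c"), levels = c("a", "b", "c"))
idx <- cbind(f_row, f_col)   # rows (1, 3), (2, 1), (2, 3)

out <- matrix(0, nrow = 2, ncol = 3,
              dimnames = list(levels(f_row), levels(f_col)))

# Assign to exactly those three cells; every other cell stays 0.
out[idx] <- c(30, 10, 20)

# Reading with the same index returns the three cells as a vector,
# whereas out[1:2, c(3, 1)] would return a whole 2 x 2 block.
out[idx]
```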

Using xtabs:

xtabs(z~x+y, data=tmp)
##    y
## x   a b c
##   x 1 2 3
##   y 3 3 2
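xtabs is built for contingency tables, so it behaves differently from the other methods when the data are not exactly one value per cell: it sums z over repeated (x, y) pairs and fills combinations that never occur with 0 rather than NA. A sketch on a hypothetical messy version of the data, with a duplicated (x, a) row and no (y, c) row; unclass then drops the "xtabs" and "table" classes to leave a plain numeric matrix.

```r
# Hypothetical messy data: (x, a) appears twice and (y, c) is missing.
messy <- data.frame(
  x = factor(c("x", "x", "x", "x", "y", "y"), levels = c("x", "y")),
  y = factor(c("a", "a", "b", "c", "a", "b"), levels = c("a", "b", "c")),
  z = c(1, 10, 2, 3, 3, 3)
)

tab <- xtabs(z ~ x + y, data = messy)
tab   # the (x, a) cell holds 1 + 10; the (y, c) cell holds 0

# Strip the table classes, keeping the values and dimnames.
m <- unclass(tab)
```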

Using a sparse matrix:

There's also sparseMatrix within the Matrix package, as seen here: R - convert BIG table into matrix by column names

library(Matrix)
with(tmp, sparseMatrix(i = as.numeric(x), j=as.numeric(y), x=z,
                       dimnames=list(levels(x), levels(y))))
## 2 x 3 sparse Matrix of class "dgCMatrix"
##   a b c
## x 1 2 3
## y 3 3 2
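The sparse result is a Matrix-package object rather than a base matrix. For a small table like this one, as.matrix turns it into an ordinary dense matrix that functions such as heatmap accept. A minimal sketch; sm and m are illustrative names.

```r
library(Matrix)

tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

# factor codes give the row (i) and column (j) positions of each value
sm <- with(tmp, sparseMatrix(i = as.numeric(x), j = as.numeric(y), x = z,
                             dimnames = list(levels(x), levels(y))))

# dense base matrix with the same dimnames
m <- as.matrix(sm)
heatmap(m, Rowv = NA, Colv = NA, scale = "none")
```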

Using reshape:

You can also use the base R function reshape, as suggested here: Convert table into matrix by column names, though afterwards you have to do a little manipulation to move the x column into the row names and strip the "z." prefix from the column names (not shown).

reshape(tmp, idvar="x", timevar="y", direction="wide")
##   x z.a z.b z.c
## 1 x   1   2   3
## 4 y   3   3   2
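The clean-up that the reshape approach needs can be sketched like this, assuming the goal is the same matrix the other methods produce: move x into the row names and strip the "z." prefix that reshape adds to each value column. The names w and m are illustrative.

```r
tmp <- data.frame(x = gl(2, 3, labels = letters[24:25]),
                  y = gl(3, 1, 6, labels = letters[1:3]),
                  z = c(1, 2, 3, 3, 3, 2))

w <- reshape(tmp, idvar = "x", timevar = "y", direction = "wide")

# Drop the id column and use it for the row names instead of "1", "4".
m <- as.matrix(w[, -1])
rownames(m) <- as.character(w$x)

# Turn "z.a", "z.b", "z.c" into "a", "b", "c".
colnames(m) <- sub("^z\\.", "", colnames(m))
m
```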