Doing a plyr operation on every row of a data frame in R

JD Long · Jan 15, 2010

I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3-day bender. So for the sake of my dog and my liver, what's a concise syntax for doing a ddply operation on every row of a data frame?

Here's an example that works well for a simple case:

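library(plyr)  # provides ddply()

x <- rnorm(10)
y <- rnorm(10)
df <- data.frame(x, y)
# group by every column, so each distinct row becomes its own group
ddply(df, names(df), function(df) max(df$x, df$y))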

That works fine and gives me what I want. But if things get more complex, plyr gets funky (and not like Bootsy Collins) because it is chewing on making "levels" out of all those floating-point values:

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
# same call as before, now grouping on four columns of 1000 rows
ddply(df, names(df), function(df) max(df$x, df$y))

On my box this chews for a few minutes and then returns:

Error: memory exhausted (limit reached?)
In addition: Warning messages:
1: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
  Reached total allocation of 1535Mb: see help(memory.size)

I think I am totally abusing plyr and I am not saying this is a bug in plyr, but rather abusive behavior by me (liver and dog notwithstanding).
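A rough sketch of why the memory runs out, using only base R. The paste(rep(...)) call in the warning looks like the level-building step inside base R's interaction(), so the assumption here is that this version of plyr crossed the levels of every grouping column. That is a guess from the warning text, not something confirmed from the plyr source. The names n_small and n_big are just for illustration.

```r
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)

# interaction() keeps every combination of levels by default (drop = FALSE),
# so 10 distinct x values crossed with 10 distinct y values give 100 levels,
# even though the data only has 10 rows
n_small <- nlevels(interaction(x, y))

# the same crossing for the larger example: three numeric columns with
# 1000 distinct values each, times up to 26 letters
n_big <- 1000 * 1000 * 1000 * 26

n_small
n_big
```

If that assumption holds, the level labels alone number in the tens of billions for the 1000-row case, which would explain the allocation failure.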

So in short, is there a syntax shortcut for using ddply to operate on every row as a substitute for apply(X, 1, ...)?

The workaround I've been using is to create a "key" that gives a unique value for every row and then I can join back to it.

x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)
# make the key
df$myKey <- 1:nrow(df)
myOut <- merge(df, ddply(df, "myKey", function(df) max(df$x, df$y)))
# knock out the key
myOut$myKey <- NULL

But I keep thinking that "There Has to Be a Better Way".

Thanks!

Answer

hadley · Jan 15, 2010

Just treat it like an array and work on each row:

adply(df, 1, transform, max = max(x, y))
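For anyone who wants to try this end to end, here is a small self-contained sketch of the same call on a five-row data frame shaped like the one in the question. The seed, the row count, and the names out and expected are illustrative assumptions; pmax() is base R's vectorized maximum and is used here only as a cross-check, not as part of the plyr approach.

```r
library(plyr)

set.seed(42)
df <- data.frame(
  x = rnorm(5),
  y = rnorm(5),
  z = rnorm(5),
  myLetters = sample(letters, 5, replace = TRUE)
)

# margin 1 splits the data frame into one-row pieces;
# transform() then adds a max column to each piece
out <- adply(df, 1, transform, max = max(x, y))

# element-wise maximum from base R, for comparison with out$max
expected <- pmax(df$x, df$y)

out
```

No key column or merge is needed, since each piece is already a single row and the original columns come back alongside the new one.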