Alternatives to nested ifelse statements in R

Kat · May 28, 2015

Suppose we have the following data. Each row represents a country, and the columns (in05:in09) indicate whether that country was present in a database of interest in the given year (2005:2009).

id <- c("a", "b", "c", "d")
in05 <- c(1, 0, 0, 1)
in06 <- c(0, 0, 0, 1)
in07 <- c(1, 1, 0, 1)
in08 <- c(0, 1, 1, 1)
in09 <- c(0, 0, 0, 1)
df <- data.frame(id, in05, in06, in07, in08, in09)

I want to create a variable firstyear which indicates the first year in which the country was present in the database. Right now I do the following:

df$firstyear <- ifelse(df$in05 == 1, 2005,
    ifelse(df$in06 == 1, 2006,
        ifelse(df$in07 == 1, 2007,
            ifelse(df$in08 == 1, 2008,
                ifelse(df$in09 == 1, 2009,
                    0)))))

The above code is already not very nice, and my dataset contains many more years. Is there an alternative, using *apply functions, loops or something else, to create this firstyear variable?

Answer

David Arenburg · May 28, 2015

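You can vectorize this with max.col, which returns, for each row, the index of the column holding the row maximum. With ties.method = "first", that is the first column containing a 1: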

indx <- names(df)[max.col(df[-1], ties.method = "first") + 1L]
df$firstyear <- as.numeric(sub("in", "20", indx))
df
#   id in05 in06 in07 in08 in09 firstyear
# 1  a    1    0    1    0    0      2005
# 2  b    0    0    1    1    0      2007
# 3  c    0    0    0    1    0      2008
# 4  d    1    1    1    1    1      2005
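
One caveat: if a row contains no 1 at all, max.col still returns a column index (the first column under ties.method = "first"), so the approach above would report 2005 where the nested ifelse() returns 0. The following is a minimal sketch of one way to guard against that, not part of the original answer. It assumes the indicator columns are all named like in05 and refer to years 20xx; the extra row "e" and the names year_cols, m, first_idx and years are hypothetical, introduced only for illustration.

```r
# Example data from the question, plus a hypothetical row "e"
# that is never present in the database (all zeros).
df <- data.frame(
  id   = c("a", "b", "c", "d", "e"),
  in05 = c(1, 0, 0, 1, 0),
  in06 = c(0, 0, 0, 1, 0),
  in07 = c(1, 1, 0, 1, 0),
  in08 = c(0, 1, 1, 1, 0),
  in09 = c(0, 0, 0, 1, 0)
)

# Pick the indicator columns by name pattern instead of by position.
year_cols <- grep("^in[0-9]{2}$", names(df), value = TRUE)
m <- as.matrix(df[year_cols])

# Index of the first column holding the row maximum (the first 1, if any).
first_idx <- max.col(m, ties.method = "first")

# Turn "in05" into 2005, and so on (assumes every year is 20xx).
years <- as.numeric(sub("^in", "20", year_cols))

# Rows without any 1 get 0, matching the fallback of the nested ifelse().
df$firstyear <- ifelse(rowSums(m) > 0, years[first_idx], 0)
df
```

For this data, firstyear comes out as 2005, 2007, 2008, 2005 and 0 for rows a to e. Because the columns are selected by pattern, adding more years should only require adding more inNN columns.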