Suppose we have the following data. The rows represent a country and the columns (in05:in09
) indicate whether that country was present in a database of interest in the given year (2005:2009
).
id <- c("a", "b", "c", "d")
in05 <- c(1, 0, 0, 1)
in06 <- c(0, 0, 0, 1)
in07 <- c(1, 1, 0, 1)
in08 <- c(0, 1, 1, 1)
in09 <- c(0, 0, 0, 1)
df <- data.frame(id, in05, in06, in07, in08, in09)
I want to create a variable firstyear
which indicates the first year in which the country was present in the database. Right now I do the following:
df$firstyear <- ifelse(df$in05==1,2005,
ifelse(df$in06==1,2006,
ifelse(df$in07==1, 2007,
ifelse(df$in08==1, 2008,
ifelse(df$in09==1, 2009,
0)))))
The above code is already not very nice, and my dataset contains many more years. Is there an alternative, using *apply
functions, loops or something else, to create this firstyear
variable?
You can vectorize using max.col
indx <- names(df)[max.col(df[-1], ties.method = "first") + 1L]
df$firstyear <- as.numeric(sub("in", "20", indx))
df
# id in05 in06 in07 in08 in09 firstyear
# 1 a 1 0 1 0 0 2005
# 2 b 0 0 1 1 0 2007
# 3 c 0 0 0 1 0 2008
# 4 d 1 1 1 1 1 2005