Is there an elegant way to balance an unbalanced panel data set? I would like to start with an unbalanced panel (ie, some individuals are missing some data) and end up with a balanced panel (ie, all individuals are missing no data). Below is some sample code. The correct end result is for all observations on 'Frank' and 'Edward' to remain and for all observations on 'Tony' to be removed since he has some missing data. Thank you.
unbal <- data.frame(PERSON=c(rep('Frank',5),rep('Tony',5),rep('Edward',5)), YEAR=c(2001,2002,2003,2004,2005,2001,2002,2003,2004,2005,2001,2002,2003,2004,2005), Y=c(21,22,23,24,25,5,6,NA,7,8,31,32,33,34,35), X=c(1:15))
unbal
One way to balance a panel is to remove individuals with incomplete data, another way is to fill in a value, such as NA
or 0
for the missing observations. For the first approach, you can use complete.cases
to find rows that have no NA
in them. Then you can find all the PERSON
with at least one missing case.
missing.at.least.one <- unique(unbal$PERSON[!complete.cases(unbal)])
unbal[!(unbal$PERSON %in% missing.at.least.one),]
# PERSON YEAR Y X
# 1 Frank 2001 21 1
# 2 Frank 2002 22 2
# 3 Frank 2003 23 3
# 4 Frank 2004 24 4
# 5 Frank 2005 25 5
# 11 Edward 2001 31 11
# 12 Edward 2002 32 12
# 13 Edward 2003 33 13
# 14 Edward 2004 34 14
# 15 Edward 2005 35 15