I have the following data frame (simplified) with the country variable as a factor and the value variable has missing values:
country value
AUT NA
AUT 5
AUT NA
AUT NA
GER NA
GER NA
GER 7
GER NA
GER NA
The following generates the above data frame:
data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))
Now, I would like to replace the NA values in each country subset using the method last observation carried forward (LOCF). I know the command na.locf
in the zoo package. data <- na.locf(data)
would give me the following data frame:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER 5
GER 5
GER 7
GER 7
GER 7
However, the function should only be used on the individual subsets split by the country. The following is the output I would need:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER NA
GER NA
GER 7
GER 7
GER 7
I can't think of an easy way to implement it. Before starting with for-loops, I was wondering if anyone has any idea as to how to solve this.
Many thanks!!
A modern version of the ddply
solution is to use the package dplyr
:
library(dplyr)
DF %>%
group_by(county) %>%
mutate(value = na.locf(value, na.rm = F))