How to fill NAs with LOCF by factors in data frame, split by country

Question 1

How to fill NAs with LOCF by factors in data frame, split by country

r dataframe missing-data r-factor

rp1 · Nov 29, 2012 · Viewed 13.9k times · Source

Answer

Answer

A modern version of the ddply solution is to use the package dplyr:

library(dplyr)
DF %>%
  group_by(county) %>% 
  mutate(value = na.locf(value, na.rm = F))

Question 2

I have the following data frame (simplified) with the country variable as a factor and the value variable has missing values:

country value
AUT     NA
AUT     5
AUT     NA
AUT     NA
GER     NA
GER     NA
GER     7
GER     NA
GER     NA

The following generates the above data frame:

data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

Now, I would like to replace the NA values in each country subset using the method last observation carried forward (LOCF). I know the command na.locf in the zoo package. data <- na.locf(data) would give me the following data frame:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     5
GER     5
GER     7
GER     7
GER     7

However, the function should only be used on the individual subsets split by the country. The following is the output I would need:

country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     NA
GER     NA
GER     7
GER     7
GER     7

I can't think of an easy way to implement it. Before starting with for-loops, I was wondering if anyone has any idea as to how to solve this.

Many thanks!!

How to fill NAs with LOCF by factors in data frame, split by country

Answer

Related questions