I am trying to replace some missing values in my data with the average values from a similar group.
My data looks like this:
X Y
1 x y
2 x y
3 NA y
4 x y
And I want it to look like this:
X Y
1 x y
2 x y
3 y y
4 x y
I wrote this, and it worked:
for (i in 1:nrow(data.frame)) {
  # copy Y into X wherever X is missing
  if (is.na(data.frame$X[i])) {
    data.frame$X[i] <- data.frame$Y[i]
  }
}
But my data.frame is almost half a million rows long, and the for/if loop is pretty slow. What I want is something like:
is.na(data.frame$X) <- data.frame$Y
But this gets a mismatched size error. It seems like there should be a command that does this, but I cannot find it here on SO or on the R help list. Any ideas?
ifelse is your friend.
Using Dirk's dataset:
df <- within(df, X <- ifelse(is.na(X), Y, X))
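For comparison, here is a minimal sketch of the same replacement done with logical indexing instead of ifelse. The small data frame is a made-up stand-in (Dirk's dataset is not reproduced here), and the names filled_ifelse, filled_index, and missing are only for illustration.

```r
# Hypothetical stand-in data: X has one missing value, Y is complete
df <- data.frame(X = c(1, 2, NA, 4), Y = c(10, 20, 30, 40))

# The ifelse approach from the answer above
filled_ifelse <- within(df, X <- ifelse(is.na(X), Y, X))

# Alternative: build a logical mask of the missing rows,
# then assign only into those positions
filled_index <- df
missing <- is.na(filled_index$X)
filled_index$X[missing] <- filled_index$Y[missing]

# Both approaches give the same column: 1 2 30 4
stopifnot(identical(filled_ifelse$X, filled_index$X))
```

Both versions are vectorized, so either should be far faster than the row-by-row loop. One reason to consider the indexing form: ifelse drops attributes such as factor levels or the Date class, whereas assigning into a subset keeps the column's original type.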