Can I define a "fill" value for NA in dplyr join? For example in the join define that all NA values should be 1?
require(dplyr)
lookup <- data.frame(cbind(c("USD","MYR"),c(0.9,1.1)))
names(lookup) <- c("rate","value")
fx <- data.frame(c("USD","MYR","USD","MYR","XXX","YYY"))
names(fx)[1] <- "rate"
left_join(x=fx,y=lookup,by=c("rate"))
Above code will create NA for values "XXX" and "YYY". In my case I am joining a large number of columns and there will be a lot of non-matches. All non-matches should have the same value. I know I can do it in several steps but the question is can all be done in one? Thanks!
First off, I would like to recommend not to use the combination data.frame(cbind(...))
. Here's why: cbind
creates a matrix
by default if you only pass atomic vectors to it. And matrices in R can only have one type of data (think of matrices as a vector with dimension attribute, i.e. number of rows and columns). Therefore, your code
cbind(c("USD","MYR"),c(0.9,1.1))
creates a character matrix:
str(cbind(c("USD","MYR"),c(0.9,1.1)))
# chr [1:2, 1:2] "USD" "MYR" "0.9" "1.1"
although you probably expected a final data frame with a character or factor column (rate) and a numeric column (value). But what you get is:
str(data.frame(cbind(c("USD","MYR"),c(0.9,1.1))))
#'data.frame': 2 obs. of 2 variables:
# $ X1: Factor w/ 2 levels "MYR","USD": 2 1
# $ X2: Factor w/ 2 levels "0.9","1.1": 1 2
because strings (characters) are converted to factors when using data.frame
by default (You can circumvent this by specifying stringsAsFactors = FALSE
in the data.frame()
call).
I suggest the following alternative approach to create the sample data (also note that you can easily specify the column names in the same call):
lookup <- data.frame(rate = c("USD","MYR"),
value = c(0.9,1.1))
fx <- data.frame(rate = c("USD","MYR","USD","MYR","XXX","YYY"))
Now, for you actual question, if I understand correctly, you want to replace all NA
s with a 1
in the joined data. If that's correct, here's a custom function using left_join
and mutate_each
to do that:
library(dplyr)
left_join_NA <- function(x, y, ...) {
left_join(x = x, y = y, by = ...) %>%
mutate_each(funs(replace(., which(is.na(.)), 1)))
}
Now you can apply it to your data like this:
> left_join_NA(x = fx, y = lookup, by = "rate")
# rate value
#1 USD 0.9
#2 MYR 1.1
#3 USD 0.9
#4 MYR 1.1
#5 XXX 1.0
#6 YYY 1.0
#Warning message:
#joining factors with different levels, coercing to character vector
Note that you end up with a character column (rate) and a numeric column (value) and all NAs are replaced by 1.
str(left_join_NA(x = fx, y = lookup, by = "rate"))
#'data.frame': 6 obs. of 2 variables:
# $ rate : chr "USD" "MYR" "USD" "MYR" ...
# $ value: num 0.9 1.1 0.9 1.1 1 1