Replace values in a dataframe based on lookup table

jbunk picture jbunk · Feb 25, 2016 · Viewed 53.1k times · Source

I am having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to do.

I have a table where every row is a customer and every column is an animal they purchased. Lets call this dataframe table.

> table
#       P1     P2     P3
# 1    cat lizard parrot
# 2 lizard parrot    cat
# 3 parrot    cat lizard

I also have a table that I will reference called lookUp.

> lookUp
#      pet   class
# 1    cat  mammal
# 2 lizard reptile
# 3 parrot    bird

What I want to do is create a new table called new with a function replaces all values in table with the class column in lookUp. I tried this myself using an lapply function, but I got the following warnings.

new <- as.data.frame(lapply(table, function(x) {
  gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE)

Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used

Any ideas on how to make this work?

Answer

talat picture talat · Feb 25, 2016

You posted an approach in your question which was not bad. Here's a smiliar approach:

new <- df  # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

An alternative approach which will be faster is:

new <- df
new[] <- look$class[match(unlist(df), look$pet)]

Note that I use empty brackets ([]) in both cases to keep the structure of new as it was (a data.frame).

(I'm using df instead of table and look instead of lookup in my answer)