I have a data frame:
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14)
I'd like to search the name column and, if any of the words "cat", "lion", "tiger", or "panther" appears, assign the character string feline to the corresponding row of a new column, species.
If any of the words "bird", "eagle", or "sparrow" appears, I want to assign the character string avian to the corresponding row of species.
If any of the words "dog", "yorkie", or "bulldog" appears, I want to assign the character string canine to the corresponding row of species.
Ideally, I'd store the keywords in a list or something similar that I can keep at the beginning of the script, because as new variants of each species show up in the name column, it would be nice to have easy access to update what qualifies as a feline, avian, or canine.
This question is almost answered here (How to create new column in dataframe based on partial string matching other column in R), but it doesn't address the multiple name twist that is present in this problem.
There may be a more elegant solution than this, but you could use grep with | to specify alternative matches.
d[grep("cat|lion|tiger|panther", d$name), "species"] <- "feline"
d[grep("bird|eagle|sparrow", d$name), "species"] <- "avian"
d[grep("dog|yorkie", d$name), "species"] <- "canine"
I've assumed you meant "avian", and left out "bulldog" since it contains "dog".
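To illustrate why "bulldog" is redundant in the pattern, here is a small sketch. The word-boundary variant (\\b with perl = TRUE) is not part of the approach above; it is shown only as an assumption about what stricter, whole-word matching might look like if substring matches ever become a problem.

```r
# grepl() does substring matching, so "dog" already matches "bulldog"
grepl("dog", "lab short bulldog")                     # TRUE

# Optional variation: with word boundaries, "dog" no longer matches "bulldog"
grepl("\\bdog\\b", "lab short bulldog", perl = TRUE)  # FALSE
grepl("\\bdog\\b", "head dog", perl = TRUE)           # TRUE
```

With whole-word matching, "bulldog" would need to be listed explicitly again.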
You might want to add ignore.case = TRUE to each grep call so that capitalized names are matched too.
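As a minimal sketch of that option, assuming the names may arrive with mixed capitalization (the data frame d2 below is made up for illustration and is not from the question):

```r
# Hypothetical data with mixed capitalization, for illustration only
d2 <- data.frame(name = c("Brown Cat", "big LION", "red bird"),
                 stringsAsFactors = FALSE)

# Without ignore.case = TRUE, "Cat" and "LION" would not match the lowercase pattern
d2[grep("cat|lion|tiger|panther", d2$name, ignore.case = TRUE), "species"] <- "feline"
d2[grep("bird|eagle|sparrow", d2$name, ignore.case = TRUE), "species"] <- "avian"
```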
Output:
#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
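The question also asks for the keywords to live in one place at the top of the script. One possible way to do that, sketched here rather than offered as a definitive solution, is a named list that is turned into the same kind of | pattern in a loop. The names species_keywords, sp, and pattern are made up for this example.

```r
# Keyword lists kept at the top of the script so they are easy to update
species_keywords <- list(
  feline = c("cat", "lion", "tiger", "panther"),
  avian  = c("bird", "eagle", "sparrow"),
  canine = c("dog", "yorkie", "bulldog")
)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# Start with NA so names that match no keyword stay unlabelled
d$species <- NA_character_

for (sp in names(species_keywords)) {
  # Collapse the keywords into one alternation, e.g. "cat|lion|tiger|panther"
  pattern <- paste(species_keywords[[sp]], collapse = "|")
  d$species[grepl(pattern, d$name, ignore.case = TRUE)] <- sp
}
```

For this data the species column matches the output shown above. If a name matched keywords from more than one species, the species listed later in species_keywords would overwrite the earlier one, so the order of the list matters.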