Create new column in dataframe based on partial string matching other column

Question 1

Create new column in dataframe based on partial string matching other column

regex string r match partial

user2948714 · Nov 2, 2013 · Viewed 23.8k times · Source

Answer

Answer

Since you have only two conditions, you can use a nested ifelse:

#random data; it wasn't easy to copy-paste yours  
DF <- data.frame(GL = sample(10), GLDESC = paste(sample(letters, 10), 
  c("gas", "payroll12", "GaSer", "asdf", "qweaa", "PayROll-12", 
     "asdfg", "GAS--2", "fghfgh", "qweee"), sample(letters, 10), sep = " "))

DF$KIND <- ifelse(grepl("gas", DF$GLDESC, ignore.case = T), "Materials", 
         ifelse(grepl("payroll", DF$GLDESC, ignore.case = T), "Payroll", "Other"))

DF
#   GL         GLDESC      KIND
#1   8        e gas l Materials
#2   1  c payroll12 y   Payroll
#3  10      m GaSer v Materials
#4   6       t asdf n     Other
#5   2      w qweaa t     Other
#6   4 r PayROll-12 q   Payroll
#7   9      n asdfg a     Other
#8   5     d GAS--2 w Materials
#9   7     s fghfgh e     Other
#10  3      g qweee k     Other

EDIT 10/3/2016 (..after receiving more attention than expected)

A possible solution to deal with more patterns could be to iterate over all patterns and, whenever there is match, progressively reduce the amount of comparisons:

ff = function(x, patterns, replacements = patterns, fill = NA, ...)
{
    stopifnot(length(patterns) == length(replacements))

    ans = rep_len(as.character(fill), length(x))    
    empty = seq_along(x)

    for(i in seq_along(patterns)) {
        greps = grepl(patterns[[i]], x[empty], ...)
        ans[empty[greps]] = replacements[[i]]  
        empty = empty[!greps]
    }

    return(ans)
}

ff(DF$GLDESC, c("gas", "payroll"), c("Materials", "Payroll"), "Other", ignore.case = TRUE)
# [1] "Materials" "Payroll"   "Materials" "Other"     "Other"     "Payroll"   "Other"     "Materials" "Other"     "Other"

ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
   c("pat1a|pat1b", "pat2", "pat3"), 
   c("1", "2", "3"), fill = "empty")
#[1] "1"     "1"     "3"     "empty"

ff(c("pat1a pat2", "pat1a pat1b", "pat3", "pat4"), 
   c("pat2", "pat1a|pat1b", "pat3"), 
   c("2", "1", "3"), fill = "empty")
#[1] "2"     "1"     "3"     "empty"

Question 2

I have a dataframe with 2 columns GL and GLDESC and want to add a 3rd column called KIND based on some data that is inside of column GLDESC.

The dataframe is as follows:

      GL                             GLDESC
1 515100         Payroll-Indir Salary Labor
2 515900 Payroll-Indir Compensated Absences
3 532300                           Bulk Gas
4 539991                     Area Charge In
5 551000        Repairs & Maint-Spare Parts
6 551100                 Supplies-Operating
7 551300                        Consumables

For each row of the data table:

If GLDESC contains the word Payroll anywhere in the string then I want KIND to be Payroll
If GLDESC contains the word Gas anywhere in the string then I want KIND to be Materials
In all other cases I want KIND to be Other

I looked for similar examples on stackoverflow but could not find any, also looked in R for dummies on switch, grep, apply and regular expressions to try and match only part of the GLDESC column and then fill the KIND column with the kind of account but was unable to make it work.

Create new column in dataframe based on partial string matching other column

Answer

Related questions