If else statements to check if a string contains a substring in R

Question 1

r regex if-statement stringr

Carolin · Jun 12, 2018 · Viewed 14.9k times

Answer

Assuming you have a character vector, you can use stringr::str_extract for this purpose. It is vectorised, so no loop or if/else chain is needed:

s <- c('A, C, D', 'P, O, E', 'W, E, W', 'S, B, W')
s
# [1] "A, C, D" "P, O, E" "W, E, W" "S, B, W"
stringr::str_extract(s, 'A|B')
# [1] "A" NA  NA  "B"
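
If stringr is not available, roughly the same result can be obtained in base R. This is only a sketch of an alternative, not part of the approach above; the variable names m and type are made up for illustration.

```r
s <- c('A, C, D', 'P, O, E', 'W, E, W', 'S, B, W')

# regexpr() reports the position of the first match per element (-1 if none)
m <- regexpr('A|B', s)

# start with all NA, then fill in only the elements that matched;
# regmatches() returns the matched text for the matching elements only
type <- rep(NA_character_, length(s))
type[m != -1] <- regmatches(s, m)
type
# [1] "A" NA  NA  "B"
```

Like str_extract, this handles the whole vector at once, so it should scale to well over 1000 observations without an explicit loop.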

If A or B should only match as a whole word, wrap the pattern in word boundaries (\\b):

stringr::str_extract(s, '\\b(A|B)\\b')
# [1] "A" NA  NA  "B"
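
On the data above both patterns give the same result, so here is a small sketch of where they differ. The input x is a hypothetical example, and base R's grepl with perl = TRUE is used instead of stringr only to keep the snippet dependency-free.

```r
# hypothetical input: 'AB' is a single word that contains both letters
x <- c('AB, C', 'A, C')

# plain alternation also matches inside the longer word 'AB'
plain <- grepl('A|B', x, perl = TRUE)

# with \\b on both sides, only a standalone 'A' or 'B' counts
whole <- grepl('\\b(A|B)\\b', x, perl = TRUE)

plain
# [1] TRUE TRUE
whole
# [1] FALSE  TRUE
```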

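If the substrings are delimited by ", ", you can use lookarounds so that A or B only matches when it is a complete item, i.e. preceded by the start of the string or ", " and followed by "," or the end of the string. The regex is (?<=^|, )(A|B)(?=,|$):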

# use the test case from G.Grothendieck
stringr::str_extract(c("A.A, C", "D, B"), '(?<=^|, )(A|B)(?=,|$)')
# [1] NA  "B"
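
If the lookaround regex feels hard to read, the same idea (treat each ", "-separated item as a unit) can be sketched without regex by splitting first and comparing whole items. This is an alternative under the assumption that the delimiter is always exactly ", "; the names tokens and type are illustrative.

```r
x <- c("A.A, C", "D, B")

# split each string into its items on the literal delimiter ", "
tokens <- strsplit(x, ", ", fixed = TRUE)

# per element: return the first item that is exactly "A" or "B", else NA
type <- vapply(tokens, function(tok) {
  hit <- tok[tok %in% c("A", "B")]
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1))

type
# [1] NA  "B"
```

Here "A.A" is not reported because the whole item is compared, not a part of it.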

Question 2

I have a list that contains multiple strings for each observation (see below).

  [1] A, C, D 
  [2] P, O, E
  [3] W, E, W
  [4] S, B, W

I want to test whether the strings contain certain substrings and, if so, return the respective substring. In this example that would be either "A" or "B" (see the desired outcome below). Each observation will contain only one of the two substrings (A or B).

  [1] A 
  [2] NA
  [3] NA
  [4] B

Now I have made this attempt at solving it, but it seems very inefficient and I also cannot get it to work. How could I solve it?

  if (i == "A") {
    type <- "A"
  } else if { (i == "B") 
    type <- "B" 
  } else { type <- "NA"
  }

Note: I will need to run this over more than 1000 observations.
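
For comparison, an if/else chain of this kind can be written in vectorised form with nested ifelse() and grepl(), which tests whether a string contains a substring rather than whether it equals it. This is a rough sketch, not the asker's code; obs is a made-up name for the data, and a real NA is returned instead of the string "NA".

```r
obs <- c('A, C, D', 'P, O, E', 'W, E, W', 'S, B, W')

# grepl(..., fixed = TRUE) checks for a literal substring in every element;
# ifelse() then picks the label element by element, so no loop is needed
type <- ifelse(grepl('A', obs, fixed = TRUE), 'A',
        ifelse(grepl('B', obs, fixed = TRUE), 'B', NA_character_))

type
# [1] "A" NA  NA  "B"
```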
