I want to match a regular expression special character, \^$.?*|+()[{
. I tried:
x <- "a[b"
grepl("[", x)
## Error: invalid regular expression '[', reason 'Missing ']''
(Equivalently stringr::str_detect(x, "[")
or stringi::stri_detect_regex(x, "[")
.)
Doubling the value to escape it doesn't work:
grepl("[[", x)
## Error: invalid regular expression '[[', reason 'Missing ']''
Neither does using a backslash:
grepl("\[", x)
## Error: '\[' is an unrecognized escape in character string starting ""\["
How do I match special characters?
Some special cases of this in questions that are old and well written enough for it to be cheeky to close as duplicates of this:
Escaped Periods In R Regular Expressions
How to escape a question mark in R?
escaping pipe ("|") in a regex
R treats backslashes as escape values for character constants. (... and so do regular expressions. Hence the need for two backslashes when supplying a character argument for a pattern. The first one isn't actually a character, but rather it makes the second one into a character.) You can see how they are processed using cat
.
y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
print(y)
## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
cat(y)
## double quote: ", tab: , newline:
## , unicode point: €
Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1
To use special characters in a regular expression the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.
grepl("\\[", "a[b")
## [1] TRUE
To match backslashes, you need to double escape, resulting in four backslashes.
grepl("\\\\", c("a\\b", "a\nb"))
## [1] TRUE FALSE
The rebus
package contains constants for each of the special characters to save you mistyping slashes.
library(rebus)
OPEN_BRACKET
## [1] "\\["
BACKSLASH
## [1] "\\\\"
For more examples see:
?SpecialCharacters
Your problem can be solved this way:
library(rebus)
grepl(OPEN_BRACKET, "a[b")
You can also wrap the special characters in square brackets to form a character class.
grepl("[?]", "a?b")
## [1] TRUE
Two of the special characters have special meaning inside character classes: \
and ^
.
Backslash still needs to be escaped even if it is inside a character class.
grepl("[\\\\]", c("a\\b", "a\nb"))
## [1] TRUE FALSE
Caret only needs to be escaped if it is directly after the opening square bracket.
grepl("[ ^]", "a^b") # matches spaces as well.
## [1] TRUE
grepl("[\\^]", "a^b")
## [1] TRUE
rebus
also lets you form a character class.
char_class("?")
## <regex> [?]
If you want to match all punctuation, you can use the [:punct:]
character class.
grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
stringi
maps this to the Unicode General Category for punctuation, so its behaviour is slightly different.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
You can also use the cross-platform syntax for accessing a UGC.
stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE
Placing characters between \\Q
and \\E
makes the regular expression engine treat them literally rather than as regular expressions.
grepl("\\Q.\\E", "a.b")
## [1] TRUE
rebus
lets you write literal blocks of regular expressions.
literal(".")
## <regex> \Q.\E
Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:
grepl("[", "a[b", fixed = TRUE)
stringr::str_detect("a[b", fixed("["))
stringi::stri_detect_fixed("a[b", "[")