Is dash a special character in R regex?

RockScience · Jun 11, 2014

Despite reading this passage in the help page for R regex:

Finally, to include a literal -, place it first or last (or, for perl = TRUE only, precede it by a backslash).

I can't understand the difference between

grepl(pattern=paste("^thing1\\-",sep=""),x="thing1-thing2")

and

grepl(pattern=paste("^thing1-",sep=""),x="thing1-thing2")

Both return TRUE. Should I escape or not here? What is the best practice?

Answer

hwnd · Jun 11, 2014

The hyphen is mostly a normal character in regular expressions.

You do not need to escape the hyphen outside of a character class; it has no special meaning.
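As a quick check of this point, the sketch below extracts what each of the two patterns from the question actually matches. Using perl = TRUE for the escaped form is an assumption made here, chosen because PCRE treats a backslash before any non-alphanumeric character as that literal character; the variable names are illustrative only.

```r
x <- "thing1-thing2"

# Unescaped hyphen outside a character class: matched literally.
plain <- regmatches(x, regexpr("^thing1-", x))

# Escaped hyphen, run with perl = TRUE (an assumption of this sketch),
# where "\\-" is guaranteed to mean a literal hyphen.
escaped <- regmatches(x, regexpr("^thing1\\-", x, perl = TRUE))

print(plain)                      # "thing1-"
print(identical(plain, escaped))  # TRUE
```

Since both forms match the same text, the unescaped version is the simpler one to read.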

Within a character class [ ], a hyphen placed first or last is taken literally. Anywhere else it denotes a range (as in a-z), so it must be escaped to be added to the class; note that the help page quoted in the question documents the backslash escape for perl = TRUE only.

Examples:

grepl('^thing1-', x='thing1-thing2')
[1] TRUE
grepl('[-a-z]+', 'foo-bar')
[1] TRUE
grepl('[a-z-]+', 'foo-bar')
[1] TRUE
grepl('[a-z\\-\\d]+', 'foo-bar', perl = TRUE)
[1] TRUE

Note: Placing the hyphen first or last within a character class is the more common convention, and it does not depend on perl = TRUE.
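The character-class examples above would return TRUE even if the hyphen were not in the class, because [a-z]+ already matches "foo". A minimal sketch that anchors the pattern to the whole string makes the hyphen's role visible; the test strings and variable names are made up for illustration, and the escaped form is run with perl = TRUE because that is the only engine for which the help page documents it.

```r
x <- c("foo-bar", "foobar", "foo_bar")

# Hyphen first or last in the class: literal with the default engine.
first <- grepl("^[-a-z]+$", x)
last  <- grepl("^[a-z-]+$", x)

# Hyphen escaped in the middle of the class: perl = TRUE only.
middle <- grepl("^[a-z\\-0-9]+$", x, perl = TRUE)

print(first)   # TRUE TRUE FALSE
print(last)    # TRUE TRUE FALSE
print(middle)  # TRUE TRUE FALSE
```

"foo_bar" is rejected in every case because the underscore is not in the class, while "foo-bar" is accepted only because the hyphen is.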