I just wrote a regex for use with the PHP function preg_match that contains the following part:
[\w-.]
It should match any word character, as well as a minus sign and a dot. It seems to work in preg_match, but when I tried it in a utility called Reggy, it complained about "Empty range in char class". Through trial and error I found that escaping the minus sign solves the issue, turning the regex into
[\w\-.]
Since the original appears to work in PHP, I am wondering why I should or should not escape the minus sign. And since the dot also has a special meaning in regular expressions, why don't I need to escape the dot? Is the utility I am using just being silly, is it using another regex dialect, or is my regex really incorrect and I am just lucky that preg_match lets me get away with it?
In many regex implementations, the following rules apply:
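Meta characters inside a character class are:

^  (negation)
-  (range)
]  (end of the class)
\  (escape char)

So these should all be escaped. The dot is not on this list: inside a character class it has no special meaning, so [.] matches only a literal dot and needs no escaping. There are some corner cases though: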
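The minus sign - needs no escaping if placed at the very start or end of the class ([-abc] or [abc-]). In quite a few regex implementations, it also needs no escaping when placed directly after a range ([a-c-abc]) or a shorthand character class ([\w-abc]). This is what you observed. It is not portable, though: PHP 7.3 and later use PCRE2, which rejects [\w-.] with an "invalid range in character class" error. Escaping the hyphen, or moving it to the end of the class, is the safer choice.

The caret ^ needs no escaping when it's not at the start of the class: [^a] means any char except a, and [a^] matches either a or ^, which is equivalent to [\^a].

The closing bracket ] needs no escaping if it is the first character in the class: []] matches the char ].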
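Because dialects disagree on exactly these corner cases, the most reliable approach is to probe a concrete engine. The sketch below uses Python's re module as a stand-in. It is not PHP's PCRE, and the helper names compiles and members and the SAMPLE string are made up for this illustration. For each class, it prints which sample characters the class matches, or whether the pattern compiles at all. Note that Python behaves like Reggy here and rejects [\w-.] with a "bad character range" error.

```python
import re

# Characters to probe each class with (an arbitrary sample chosen for this demo).
SAMPLE = "abcxyz_-.^]"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def members(pattern: str, candidates: str = SAMPLE) -> str:
    """Return the characters of `candidates` that the pattern matches on its own."""
    rx = re.compile(pattern)
    return "".join(c for c in candidates if rx.fullmatch(c))


# Escaped metacharacters are literal; the dot is literal even without escaping.
print(members(r"[\^\-\]\\.]"))   # -.^]
print(members(r"[.]"))           # .

# Hyphen: literal at the start, at the end, and right after a range.
print(members(r"[-abc]"))        # abc-
print(members(r"[abc-]"))        # abc-
print(members(r"[a-c-z]"))       # abcz-  (x and y are absent, so c-z is not a range)

# Hyphen right after a shorthand class: Python rejects it, like Reggy does.
print(compiles(r"[\w-.]"))       # False
print(compiles(r"[\w\-.]"))      # True
print(compiles(r"[\w.-]"))       # True

# Caret: it only negates in first position.
print(members(r"[^a]", "ab^"))   # b^
print(members(r"[a^]", "ab^"))   # a^
print(members(r"[\^a]", "ab^"))  # a^

# Closing bracket: literal when it comes first in the class.
print(members(r"[]]", "a]"))     # ]
```

Running the same probes against the engine you actually deploy on, for example preg_match on your PHP version, shows which exemptions it allows. Writing [\w\-.] or [\w.-] avoids the question entirely.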