std::regex, to match begin/end of string

c-smile · Sep 22, 2016 · Viewed 10.6k times

In JavaScript regular expressions, the symbols ^ and $ designate the start and end of the string. Only with the /m modifier (multiline mode) do they also match the start and end of a line, that is, the positions just after and just before a CR/LF.

But in std::regex with the ECMAScript grammar, ^ and $ always match the start and end of a line.

Is there any way in std::regex to make ^ and $ match only the start and end of the string? In other words: to support JavaScript multiline mode ...

Answer

ildjarn · Sep 22, 2016

By default, ECMAScript mode already treats ^ as both beginning-of-input and beginning-of-line, and $ as both end-of-input and end-of-line. There is no way to make them match only beginning-of-input or end-of-input, but it is possible to make them match only beginning-of-line or end-of-line:

std::regex_match, std::regex_search, and std::regex_replace each take an argument of type std::regex_constants::match_flag_type, which defaults to std::regex_constants::match_default.

  • To make ^ match only beginning-of-line (so that it no longer matches at beginning-of-input), specify std::regex_constants::match_not_bol
  • To make $ match only end-of-line (so that it no longer matches at end-of-input), specify std::regex_constants::match_not_eol
  • These values are bit flags, so to specify both, bitwise-or them together (std::regex_constants::match_not_bol | std::regex_constants::match_not_eol)
  • Note that beginning-of-input can be implied without using ^, and regardless of whether std::regex_constants::match_not_bol is present, by specifying std::regex_constants::match_continuous, which only accepts a match that starts at the first character of the input
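
A minimal sketch of how the flags above are passed, assuming a small helper named found (hypothetical, not part of the standard library) that wraps std::regex_search. The inputs are single-line on purpose, so the results should not depend on whether a given implementation treats ^ and $ as line anchors:

```cpp
#include <regex>
#include <string>

namespace rc = std::regex_constants;

// Hypothetical helper for illustration: returns true if `pattern` is found
// somewhere in `text` when searching with the given match flags.
bool found(const std::string& text, const std::string& pattern,
           rc::match_flag_type flags = rc::match_default) {
    const std::regex re(pattern);  // ECMAScript grammar is the default
    return std::regex_search(text, re, flags);
}
```

With this helper, found("abc", "^abc") is true, while found("abc", "^abc", rc::match_not_bol) and found("abc", "abc$", rc::match_not_eol) are both false, because the flags stop ^ and $ from matching at the ends of the input. found("xxabc", "abc", rc::match_continuous) is false and found("abcxx", "abc", rc::match_continuous) is true, since the match has to begin at the first character.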

This is explained well in the ECMAScript grammar documentation on cppreference.com, which I highly recommend over cplusplus.com in general.

Caveat: I've tested with MSVC, Clang + libc++, and Clang + libstdc++, and only MSVC has the correct behavior at present.
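
Since the behavior differs between standard libraries, it may help to check what your own toolchain does. The following is a rough probe; caret_matches_in and report_caret_behavior are hypothetical names chosen for this sketch. No expected result is given for the multi-line input, because that is exactly the part that varies:

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical probe: true if the pattern "^b" is found in `text` using the
// default ECMAScript grammar and default match flags.
bool caret_matches_in(const std::string& text) {
    const std::regex re("^b");
    return std::regex_search(text, re);
}

// Prints how ^ behaves at the start of the input and right after a newline.
void report_caret_behavior() {
    std::cout << std::boolalpha
              << "^ at start of input: " << caret_matches_in("b") << '\n'
              << "^ after a newline:   " << caret_matches_in("a\nb") << '\n';
}
```

Calling report_caret_behavior() from main prints true on the first line. The second line is true only if the library treats ^ as a line anchor in the default ECMAScript mode.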