Match linebreaks - \n or \r\n?

KeyNone · Nov 18, 2013 · Viewed 379.1k times

While writing this answer, I had to match exclusively on linebreaks instead of using the s-flag (dotall - dot matches linebreaks).

The sites usually used to test regular expressions behave differently when trying to match on \n or \r\n.

I noticed

  • Regex101 matches linebreaks only on \n
    (example - delete \r and it matches)

  • RegExr matches linebreaks neither on \n nor on \r\n
    and I can't find anything that makes it match a linebreak, except for the m-flag and \s
    (example)

  • Debuggex behaves even more differently:
    in this example it matches only on \r\n, while
    here it only matches on \n, with the same flags and engine specified

I'm fully aware of the m-flag (multiline - makes ^ match the start and $ the end of a line), but sometimes this is not an option. Same with \s, as it matches tabs and spaces, too.

My idea of using the Unicode next-line character (\u0085) didn't work, so:

  1. Is there a failsafe way to integrate the match on a linebreak (preferably regardless of the language used) into a regular expression?
  2. Why do the above-mentioned sites behave differently (especially Debuggex, which matches only on \n in one case and only on \r\n in the other)?

Answer

Peter van der Wal · Nov 18, 2013

I'm going to answer in reverse order.

2) For a full explanation of \r and \n, I have to refer you to this question, which is far more complete than what I will post here: Difference between \n and \r?

Long story short: Linux uses \n for a newline, Windows uses \r\n, and old Macs use \r. So there are multiple ways to write a newline. Your second tool (RegExr), for example, does match on a single \r.
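
To make the three conventions concrete, here is a small sketch. Python is only my choice for illustration (the question is language-agnostic), and the helper name split_on_lf and the sample strings are made up. It shows what happens when a pattern only knows about \n and is fed text with each kind of line ending:

```python
import re

def split_on_lf(text):
    """Split on \\n only, the way a pattern that ignores \\r would."""
    return re.split(r"\n", text)

# Hypothetical samples, one per line-ending convention.
unix_text = "one\ntwo"       # Linux / modern macOS
windows_text = "one\r\ntwo"  # Windows
old_mac_text = "one\rtwo"    # classic Mac OS

for label, sample in [("unix", unix_text),
                      ("windows", windows_text),
                      ("old mac", old_mac_text)]:
    # repr() makes a leftover \r visible in the output.
    print(label, [repr(part) for part in split_on_lf(sample)])
```

With the Windows sample a stray \r stays glued to the end of the first piece, and the old Mac sample is not split at all. That is probably the kind of difference you see between the testers, depending on which line ending each one puts into its test string.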

1) [\r\n]+, as Ilya suggested, will work, but it will also match a run of multiple consecutive newlines as one match. (\r\n|\r|\n) is more correct, because it matches exactly one line break at a time.
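
A minimal sketch of the difference between those two patterns, again in Python as an assumption (other regex engines should behave the same way for these simple patterns, but I have not checked every one). The names run_of_breaks and single_break and the sample text are made up for the example:

```python
import re

# Sample text mixing \n, \r\n and \r, plus one blank line (\n\n).
text = "a\nb\r\nc\rd\n\ne"

# Character class with +: a whole run of line breaks is one match.
run_of_breaks = re.compile(r"[\r\n]+")

# Alternation: exactly one line break per match. \r\n is listed first
# so a Windows line ending is not split into a separate \r and \n.
single_break = re.compile(r"\r\n|\r|\n")

print(run_of_breaks.split(text))  # the blank line disappears
print(single_break.split(text))   # the blank line survives as ''
```

Splitting with [\r\n]+ gives five pieces, while the alternation gives six, with an empty string where the blank line was. If you need to count lines or preserve empty ones, the alternation is the safer choice.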