question related to this
I have a string
a\;b\\;c;d
which in Java looks like
String s = "a\\;b\\\\;c;d";
I need to split it on semicolons with the following rules:
If a semicolon is preceded by a backslash, it should not be treated as a separator (between a and b).
If the backslash itself is escaped, and therefore does not escape the semicolon, that semicolon should be a separator (between b and c).
So a semicolon should be treated as a separator if it is preceded by zero or an even number of backslashes.
For the example above, I want to get the following strings (shown as raw text; each backslash must be doubled in a Java string literal):
a\;b\\
c
d
You can use the regex
(?:\\.|[^;\\]++)*
to match all text between unescaped semicolons:
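import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // The pattern also matches the empty string at each separator, so skip
    // empty matches (note that this drops genuinely empty fields as well).
    if (!regexMatcher.group().isEmpty()) {
        matchList.add(regexMatcher.group());
    }
}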
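As a rough, self-contained sketch of how the pattern above behaves on the string from the question, the following wraps it in a small class. The class name RegexSplitDemo and the method name fields are made up for illustration, and dropping empty matches is an assumption about the desired result, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class RegexSplitDemo {
    // Same pattern as above: escaped pairs or runs of ordinary characters.
    private static final Pattern FIELD = Pattern.compile("(?:\\\\.|[^;\\\\]++)*");

    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            // The pattern can match the empty string (at each separator and at
            // the end of input), so empty matches are dropped here.
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw text of the input: a\;b\\;c;d
        for (String field : fields("a\\;b\\\\;c;d")) {
            System.out.println(field);
        }
    }
}
```

Run on the question's input, this should print the three raw strings a\;b\\, c and d, one per line.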
Explanation:
(?:          # Match either...
  \\.        # a backslash followed by any character (an escaped character)
|            # or...
  [^;\\]++   # any character(s) except semicolon or backslash; possessive match
)*           # Repeat any number of times.
The possessive match (++) is important to avoid catastrophic backtracking because of the nested quantifiers.
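If empty fields have to be preserved, or a regex is not wanted at all, the rule from the question (a semicolon separates only when it is preceded by zero or an even number of backslashes) can also be sketched as a single left-to-right scan. This is an alternative to the regex approach, not part of it; the class name EscapeAwareSplitter is hypothetical, and keeping a lone trailing backslash as-is is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not taken from any library.
final class EscapeAwareSplitter {
    static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // A backslash and the character it escapes stay in the field
                // verbatim, so an escaped semicolon never acts as a separator.
                current.append(c).append(input.charAt(++i));
            } else if (c == ';') {
                // Unescaped semicolon: close the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (or a lone trailing backslash, by assumption).
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Because backslashes are consumed in pairs with the character that follows them, a semicolon after an even number of backslashes is always seen unescaped, which matches the rule above. Unlike the regex loop that skips empty matches, this version keeps empty fields, so "a;;b" yields three entries.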