This is my regex:
((?:(?:'[^']*')|[^;])*)[;]
It tokenizes a string on semicolons. For example,
Hello world; I am having a problem; using regex;
The result is three strings:
Hello world
I am having a problem
using regex
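The question doesn't show the Java code that applies the pattern. Here is a minimal sketch of how it might be driven, assuming Matcher.find() is called in a loop and each captured group 1 is trimmed. The class name RegexTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical driver class; the question does not show its own Java code.
public class RegexTokenizer {
    // The pattern from the question: repeat (quoted section | any non-';' char), then ';'.
    private static final Pattern TOKEN = Pattern.compile("((?:(?:'[^']*')|[^;])*)[;]");

    // Collects group 1 of every match, trimmed of surrounding whitespace (an assumption).
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1).trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Hello world, I am having a problem, using regex]
        System.out.println(tokenize("Hello world; I am having a problem; using regex;"));
    }
}
```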
But when I use a large input string, I get this error:
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
How is this caused and how can I solve it?
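Unfortunately, Java's built-in regex engine has problems with patterns that repeat an alternation (that is, (A|B)*). Such a construct is compiled into nodes that call each other recursively, roughly once per repetition (the repeating Loop, GroupHead, Branch and GroupTail frames in the stack trace), so matching it against a very large string can throw a StackOverflowError.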
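Since the recursion comes from repeating an alternation, one way around it is to restructure the pattern so that no alternation is repeated. The sketch below uses the "unrolled loop" form: plain text, then any number of (quoted section followed by more plain text), then the semicolon. It is not a drop-in replacement. It assumes every single quote in the input belongs to a balanced '...' pair; the original pattern can also treat a lone quote as an ordinary character, and this one cannot. As far as I know, java.util.regex matches a simple character class such as [^';]* in a loop rather than recursing per character, so the remaining recursion should grow with the number of quoted sections instead of the length of the string. The class name UnrolledTokenizer and the 100,000-character test input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name. Assumes quotes in the input are balanced.
public class UnrolledTokenizer {
    // "Unrolled loop" form of the question's pattern:
    //   [^';]*              plain text up to a quote or semicolon
    //   (?:'[^']*'[^';]*)*  each quoted section, followed by more plain text
    //   ;                   the terminating semicolon
    // The repeated group iterates once per quoted section, not once per character.
    private static final Pattern TOKEN =
            Pattern.compile("([^';]*(?:'[^']*'[^';]*)*);");

    // Collects group 1 of every match, trimmed of surrounding whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1).trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Semicolons inside quotes stay in their token, as with the original pattern.
        System.out.println(tokenize("x='a;b'; y;"));

        // Hypothetical stress input: one 100,000-character token without quotes.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            big.append('a');
        }
        big.append(';');
        System.out.println(tokenize(big.toString()).get(0).length());
    }
}
```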
A possible solution is to rewrite your regex so that it doesn't repeat an alternation. But if your goal is simply to tokenize a string on semicolons, you don't really need a complex regex at all: just use String.split() with a plain ";" as the argument.
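A minimal sketch of the split approach, with a hypothetical class name SplitTokenizer. Note that it is not equivalent to the original pattern: a semicolon inside '...' also splits, so it only fits if quoted semicolons never occur in your input. Trimming removes the space left after each semicolon, and String.split drops trailing empty strings, so the final ";" does not produce an empty last token. Splitting involves no repeated group, so there is nothing to recurse on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name. Unlike the original regex, this also splits inside '...'.
public class SplitTokenizer {
    // Splits on every ';' and trims each piece. String.split removes trailing
    // empty strings, so a terminating ';' does not yield an empty token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split(";")) {
            tokens.add(part.trim());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Hello world, I am having a problem, using regex]
        System.out.println(tokenize("Hello world; I am having a problem; using regex;"));
    }
}
```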