How to determine if a number is a prime with regex?

kitlite  picture kitlite · May 8, 2010 · Viewed 29.3k times · Source

I found the following code example for Java on RosettaCode:

public static boolean prime(int n) {
  return !new String(new char[n]).matches(".?|(..+?)\\1+");
}
  • I don't know Java in particular but understand all aspects of this snippet except for the regex itself
  • I have basic to basic-advanced knowledge of Regex as you find it in the built-in PHP functions

How does .?|(..+?)\\1+ match prime numbers?

Answer

Platinum Azure picture Platinum Azure · May 8, 2010

You said you understand this part, but just to emphasize, the String generated has a length equal to the number supplied. So the string has three characters if and only if n == 3.

.?

The first part of the regex says, "any character, zero or one times". So basically, is there zero or one character-- or, per what I mentioned above, n == 0 || n == 1. If we have the match, then return the negation of that. This corresponds with the fact that zero and one are NOT prime.

(..+?)\\1+

The second part of the regex is a little trickier, relying on groups and backreferences. A group is anything in parentheses, which will then be captured and stored by the regex engine for later use. A backreference is a matched group that is used later on in the same regex.

The group captures 1 character, then 1 or more of any character. (The + character means one or more, but ONLY of the previous character or group. So this is not "two or four or six etc. characters", but rather "two or three etc." The +? is like +, but it tries to match as few characters as possible. + normally tries to gobble the whole string if it can, which is bad in this case because it prevents the backreference part from working.)

The next part is the backreference: That same set of characters (two or more), appearing again. Said backreference appears one or more times.

So. The captured group corresponds to a natural number of characters (from 2 onward) captured. Said group then appears some natural number of times (also from 2 onward). If there IS a match, this implies that it's possible to find a product of two numbers greater than or equal to 2 that match the n-length string... meaning you have a composite n. So again, return the negation of the successful match: n is NOT prime.

If no match can be found, then you can't come up with a your product of two natural numbers greater than or equal to 2... and you have both a non-match and a prime, hence again the returning of the negation of the match result.

Do you see it now? It's unbelievably tricky (and computationally expensive!) but it's also kind of simple at the same time, once you get it. :-)

I can elaborate if you have further questions, like on how regex parsing actually works. But I'm trying to keep this answer simple for now (or as simple as it can ever be).