Split a String at every 3rd comma in Java

knurb picture knurb · Jul 27, 2013 · Viewed 7.8k times · Source

I have a string that looks like this:

0,0,1,2,4,5,3,4,6

What I want returned is a String[] that was split after every 3rd comma, so the result would look like this:

[ "0,0,1", "2,4,5", "3,4,6" ]

I have found similar functions but they don't split at n-th amount of commas.

Answer

Pshemo picture Pshemo · Jul 27, 2013

You can try to use split method with (?<=\\G\\d+,\\d+,\\d+), regex

Demo

String data = "0,0,1,2,4,5,3,4,6";
String[] array = data.split("(?<=\\G\\d+,\\d+,\\d+),"); //Magic :) 
// to reveal magic see explanation below answer
for(String s : array){
    System.out.println(s);
}

output:

0,0,1
2,4,5
3,4,6

Explanation

  • \\d means one digit, same as [0-9], like 0 or 3
  • \\d+ means one or more digits like 1 or 23
  • \\d+, means one or more digits with comma after it, like 1, or 234,
  • \\d+,\\d+,\\d+ will accept three numbers with commas between them like 12,3,456
  • \\G means last match, or if there is none (in case of first usage) start of the string
  • (?<=...), is positive look-behind which will match comma , that has also some string described in (?<=...) before it
  • (?<=\\G\\d+,\\d+,\\d+), so will try to find comma that has three numbers before it, and these numbers have aether start of the string before it (like ^0,0,1 in your example) or previously matched comma, like 2,4,5 and 3,4,6.

Also in case you want to use other characters then digits you can also use other set of characters like

  • \\w which will match alphabetic characters, digits and _
  • \\S everything that is not white space
  • [^,] everything that is not comma
  • ... and so on. More info in Pattern documentation

By the way, this form will work with split on every 3rd, 5th, 7th, (and other odd numbers) comma, like split("(?<=\\G\\w+,\\w+,\\w+,\\w+,\\w+),") will split on every 5th comma.

To split on every 2nd, 4th, 6th, 8th (and rest of even numbers) comma you will need to replace + with {1,maxLengthOfNumber} like split("(?<=\\G\\w{1,3},\\w{1,3},\\w{1,3},\\w{1,3}),") to split on every 4th comma when numbers can have max 3 digits (0, 00, 12, 000, 123, 412, 999).

To split on every 2nd comma you can also use this regex split("(?<!\\G\\d+),") based on my previous answer