Java - Split String by Number and Letters

Azazel · Apr 5, 2016 · Viewed 18.2k times

So I have, for example, a string such as this: C3H20IO

What I wanna do is split this string so I get the following:

Array1 = {C,H,I,O}
Array2 = {3,20,1,1}

The 1 as the third element of Array2 indicates that there is a single I atom, since no number follows it. The same goes for O. That is actually the part I am struggling with.

This is a chemical formula, so I need to separate the element symbols from the number of atoms of each.

Answer

Maljam · Apr 5, 2016

You could try this approach:

String formula = "C3H20IO";

// insert "1" at each atom-atom boundary and after a trailing atom that has no count
formula = formula.replaceAll("(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");

// split at each letter-digit or digit-letter boundary
String regex = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";
String[] atoms = formula.split(regex);

Output:

atoms: [C, 3, H, 20, I, 1, O, 1]
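
To see what each step produces, the two regex calls above can be wrapped in a small runnable class. This is only a sketch: the class name `FormulaTokens` and the method names `insertImplicitOnes` and `tokenize` are made up for illustration and are not part of the original snippet.

```java
import java.util.Arrays;

class FormulaTokens {
    // Step 1: add an explicit "1" wherever an element symbol has no count.
    static String insertImplicitOnes(String formula) {
        return formula.replaceAll(
            "(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
    }

    // Step 2: split at every letter-digit or digit-letter boundary.
    static String[] tokenize(String formula) {
        return insertImplicitOnes(formula)
            .split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(insertImplicitOnes("C3H20IO"));        // C3H20I1O1
        System.out.println(Arrays.toString(tokenize("C3H20IO"))); // [C, 3, H, 20, I, 1, O, 1]
        System.out.println(Arrays.toString(tokenize("NaCl")));    // [Na, 1, Cl, 1]
    }
}
```

The last call shows that two-letter symbols such as Na and Cl stay in one piece, because the split only happens between a letter and a digit.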

Now all even indices (0, 2, 4...) hold the atoms and the odd ones hold the associated counts:

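// each symbol/count pair occupies two consecutive slots in atoms
String[] a = new String[atoms.length / 2];
int[] n = new int[atoms.length / 2];

for (int i = 0; i < a.length; i++) {
    a[i] = atoms[i * 2];                        // element symbol
    n[i] = Integer.parseInt(atoms[i * 2 + 1]);  // its count
}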

Output:

a: [C, H, I, O]
n: [3, 20, 1, 1]
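
As an alternative to inserting the 1s and then splitting, the formula can also be scanned with `java.util.regex.Matcher`, capturing each symbol and its optional count in one pass. This is a different technique from the one above, shown as a minimal sketch. It assumes every element symbol is one uppercase letter optionally followed by one lowercase letter; the names `FormulaParser`, `ELEMENT` and `parse` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormulaParser {
    // Group 1: element symbol (assumed form: uppercase + optional lowercase).
    // Group 2: the digits that follow it, possibly empty.
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    // Appends each symbol and its count to the two parallel lists.
    static void parse(String formula, List<String> symbols, List<Integer> counts) {
        Matcher m = ELEMENT.matcher(formula);
        while (m.find()) {
            symbols.add(m.group(1));
            String digits = m.group(2);
            // A missing count means a single atom.
            counts.add(digits.isEmpty() ? 1 : Integer.parseInt(digits));
        }
    }

    public static void main(String[] args) {
        List<String> symbols = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        parse("C3H20IO", symbols, counts);
        System.out.println(symbols); // [C, H, I, O]
        System.out.println(counts);  // [3, 20, 1, 1]
    }
}
```

Note that `find()` skips anything the pattern does not match, so parentheses or other unexpected characters in the input are silently ignored rather than reported.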