RegEx to split camelCase or TitleCase (advanced)

Question 1

java regex camelcasing title-case

Jmini · Sep 29, 2011 · Viewed 51k times · Source

Answer

The following regex works for all of the examples in the question:

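public class SplitCamelCase {
    public static void main(String[] args) {
        // Split before a capital not preceded by the start of the string or another capital,
        // or before a capital followed by a lowercase letter (except at the start).
        for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
            System.out.println(w);
        }
    }
}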

It works by forcing the negative lookbehind to not only ignore matches at the start of the string, but to also ignore matches where a capital letter is preceded by another capital letter. This handles cases like "VALUE".

The first part of the regex on its own fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". This is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.
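To see the effect of each clause, here is a small sketch that runs the first clause alone and then the full pattern over the examples from the question. The class name CamelCaseSplitDemo, the constants and the split helper are made up for this illustration; only the two regex clauses come from the answer.

```java
import java.util.Arrays;

class CamelCaseSplitDemo {
    // First clause only: split before a capital that is not preceded by
    // the start of the string or by another capital.
    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";

    // Full pattern from the answer: the first clause, or a capital followed
    // by a lowercase letter anywhere except at the start of the string.
    static final String FULL = FIRST_CLAUSE + "|(?<!^)(?=[A-Z][a-z])";

    // Hypothetical helper, just a thin wrapper around String.split.
    static String[] split(String input, String regex) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String[] inputs = {"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"};
        for (String input : inputs) {
            System.out.println(input
                    + " -> first clause: " + Arrays.toString(split(input, FIRST_CLAUSE))
                    + ", full: " + Arrays.toString(split(input, FULL)));
        }
    }
}
```

For "eclipseRCPExt" the first clause alone should give [eclipse, RCPExt], while the full pattern gives [eclipse, RCP, Ext]. The other inputs come out the same with both patterns.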

Question 2

I found a brilliant RegEx to extract the part of a camelCase or TitleCase expression.

 (?<!^)(?=[A-Z])

It works as expected:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value

For example with Java:

String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words equals new String[]{"lorem", "Ipsum"}

My problem is that it does not work in some cases:

Case 1: VALUE -> V / A / L / U / E
Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext

To my mind, the result should be:

Case 1: VALUE
Case 2: eclipse / RCP / Ext

In other words, given n uppercase chars:

If the n chars are followed by lowercase chars, the groups should be: (n-1 chars) / (n-th char + lowercase chars).
If the n chars are at the end, the group should be: (n chars).
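The rule above can also be written out as a plain character scan, which may make the intended grouping easier to check by hand. This is only a sketch of the rule as stated, not a regex: the class CamelCaseRuleSketch and its methods are hypothetical names, and it assumes ASCII letters only (A-Z and a-z).

```java
import java.util.ArrayList;
import java.util.List;

class CamelCaseRuleSketch {
    // Assumption: only ASCII letters count as upper or lower case.
    static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    static List<String> splitWords(String s) {
        List<String> words = new ArrayList<>();
        int start = 0;
        // Start at 1 so that a word is never cut at the very beginning.
        for (int i = 1; i < s.length(); i++) {
            if (!isUpper(s.charAt(i))) {
                continue;
            }
            // A new group begins at an uppercase char that either starts an
            // uppercase run, or is the last char of a run followed by lowercase.
            boolean previousIsUpper = isUpper(s.charAt(i - 1));
            boolean nextIsLower = i + 1 < s.length() && isLower(s.charAt(i + 1));
            if (!previousIsUpper || nextIsLower) {
                words.add(s.substring(start, i));
                start = i;
            }
        }
        words.add(s.substring(start));
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("VALUE"));
        System.out.println(splitWords("eclipseRCPExt"));
    }
}
```

With these assumptions, "VALUE" stays as one group and "eclipseRCPExt" becomes eclipse / RCP / Ext, matching the expected results listed above.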

Any idea on how to improve this regex?
