How to find the longest substring containing two unique repeating characters

40mikemike picture 40mikemike · Feb 21, 2013 · Viewed 12.1k times · Source

The task is to find the longest substring in a given string that is composed of any two unique repeating characters
Ex. in an input string "aabadefghaabbaagad", the longest such string is "aabbaa"

I came up with the following solution but wanted to see if there is a more efficient way to do the same

import java.util.*; 

public class SubString {
    public static void main(String[] args) { 
    //String inStr="defghgadaaaaabaababbbbbbd";
    String inStr="aabadefghaabbaagad";
    //String inStr="aaaaaaaaaaaaaaaaaaaa";
    System.out.println("Input string is         "+inStr);

    StringBuilder sb = new StringBuilder(inStr.length());
    String subStr="";
    String interStr="";
    String maxStr="";
    int start=0,length=0, maxStart=0, maxlength=0, temp=0;

    while(start+2<inStr.length())   
    {    int i=0;
         temp=start;
         char x = inStr.charAt(start);
         char y = inStr.charAt(start+1);
         sb.append(x);
         sb.append(y);

         while( (x==y) && (start+2<inStr.length()) )
         {    start++;
              y = inStr.charAt(start+1);
              sb.append(y);
         }

         subStr=inStr.substring(start+2);
         while(i<subStr.length())
         {    if(subStr.charAt(i)==x || subStr.charAt(i)==y )
              {    sb.append(subStr.charAt(i));
                   i++;
              }
              else
                   break;
         }

         interStr= sb.toString();
         System.out.println("Intermediate string "+ interStr);
         length=interStr.length();
         if(maxlength<length)
         {    maxlength=length;
              length=0;
              maxStr = new String(interStr);
              maxStart=temp;
         }

         start++;
         sb.setLength(0);
   }

   System.out.println("");
   System.out.println("Longest string is "+maxStr.length()+" chars long "+maxStr);  
}
}

Answer

Aasmund Eldhuset picture Aasmund Eldhuset · Feb 21, 2013

Here's a hint that might guide you towards a linear-time algorithm (I assume that this is homework, so I won't give the entire solution): At the point where you have found a character that is neither equal to x nor to y, it is not necessary to go all the way back to start + 1 and restart the search. Let's take the string aabaaddaa. At the point where you have seen aabaa and the next character is d, there is no point in restarting the search at index 1 or 2, because in those cases, you'll only get abaa or baa before hitting d again. As a matter of fact, you can move start directly to index 3 (the first index of the last group of as), and since you already know that there is a contiguous sequene of as up to d, you can move i to index 5 and continue.

Edit: Pseudocode below.

// Find the first letter that is not equal to the first one, 
// or return the entire string if it consists of one type of characters
int start = 0;
int i = 1;
while (i < str.length() && str[i] == str[start])
    i++;
if (i == str.length())
    return str;

// The main algorithm
char[2] chars = {str[start], str[i]};
int lastGroupStart = 0;
while (i < str.length()) {
    if (str[i] == chars[0] || str[i] == chars[1]) {
        if (str[i] != str[i - 1])
            lastGroupStart = i;
    }
    else {
        //TODO: str.substring(start, i) is a locally maximal string; 
        //      compare it to the longest one so far
        start = lastGroupStart;
        lastGroupStart = i;
        chars[0] = str[start];
        chars[1] = str[lastGroupStart];
    }
    i++;
}
//TODO: After the loop, str.substring(start, str.length()) 
//      is also a potential solution.