Detecting if a string has unique characters: comparing my solution to "Cracking the Coding Interview?"

Seephor picture Seephor · Oct 21, 2013 · Viewed 63.1k times · Source

I am working through the book "Cracking the Coding Interview" and I have come across questions here asking for answers, but I need help comparing my answer to the solution. My algorithm works, but I am having difficulty understand the solution in the book. Mainly because I don't understand what some of the operators are really doing.

The task is: "Implement an algorithm to determine if a string has all unique characters. What if you cannot use additional data structures?"

This is my solution:

public static boolean checkForUnique(String str){
    boolean containsUnique = false;

    for(char c : str.toCharArray()){
        if(str.indexOf(c) == str.lastIndexOf(c)){
            containsUnique = true;
        } else {
            containsUnique = false;
        }
    }

    return containsUnique;
}

It works, but how efficient is this? I saw that the complexity of the index functions for String in Java are O(n*m)

Here is the solution from the book:

public static boolean isUniqueChars(String str) {
    if (str.length() > 256) {
        return false;
    }
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

A couple things I am not quite understanding with the solution. First, what does the "|=" operator do? Why is 'a' subtracted from the current character in the string for the value of "val"? I know "<<" is a bitwise left shift, but what does (checker & (1<<val)) do? I know it is bitwise and, but I am not understanding it since I am not understanding the line where checker gets a value.

I am just not familiar with these operations and unfortunately the book does not give an explanation of the solutions, probably because it assumes you already understand these operations.

Answer

templatetypedef picture templatetypedef · Oct 21, 2013

There are two separate questions here: what's the efficiency of your solution, and what is the reference solution doing? Let's treat each independently.

First, your solution:

public static boolean checkForUnique(String str){
    boolean containsUnique = false;

    for(char c : str.toCharArray()){
        if(str.indexOf(c) == str.lastIndexOf(c)){
            containsUnique = true;
        } else {
            containsUnique = false;
        }
    }

    return containsUnique;
}

Your solution essentially consists of a loop over all characters in the string (let's say there are n of them), checking on each iteration whether the first and last index of the characters are the same. The indexOf and lastIndexOf methods each take time O(n), because they have to scan across all the characters of the string to determine if any of them match the one you're looking for. Therefore, since your loop runs O(n) times and does O(n) work per iteration, its runtime is O(n2).

However, there's something iffy about your code. Try running it on the string aab. Does it work correctly on this input? As a hint, as soon as you determine that there are two or more duplicated characters, you're guaranteed that there are duplicates and you can return that not all characters are unique.

Now, let's look at the reference:

public static boolean isUniqueChars(String str) {
    if (str.length() > 256) { // NOTE: Are you sure this isn't 26?
        return false;
    }
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

This solution is cute. The basic idea is the following: imagine that you have an array of 26 booleans, each one tracking whether a particular character has appeared in the string already. You start with all of them false. You then iterate across the characters of the string, and each time you see a character you look into the array slot for that character. If it's false, this is the first time you've seen the character and you can set the slot to true. If it's true, you've already seen this character and you can immediately report that there's a duplicate.

Notice that this method doesn't allocate an array of booleans. Instead, it opts for a clever trick. Since there are only 26 different characters possible and there are 32 bits in an int, the solution creates an int variable where each bit of the variable corresponds to one of the characters in the string. Instead of reading and writing an array, the solution reads and writes the bits of the number.

For example, look at this line:

if ((checker & (1 << val)) > 0) return false;

What does checker & (1 << val) do? Well, 1 << val creates an int value that has all bits zero except for the valth bit. It then uses bitwise AND to AND this value with checker. If the bit at position val in checker is already set, then this evaluates to a nonzero value (meaning we've already seen the number) and we can return false. Otherwise, it evaluates to 0, and we haven't seen the number.

The next line is this:

checker |= (1 << val);

This uses the "bitwise OR with assignment" operator, which is equivalent to

checker = checker | (1 << val);

This ORs checker with a value that has a 1 bit set only at position val, which turns the bit on. It's equivalent to setting the valth bit of the number to 1.

This approach is much faster than yours. First, since the function starts off by checking if the string has length greater than 26 (I'm assuming the 256 is a typo), the function never has to test any string of length 27 or greater. Therefore, the inner loop runs at most 26 times. Each iteration does O(1) work in bitwise operations, so the overall work done is O(1) (O(1) iterations times O(1) work per iteration), which is significantly faster than your implementation.

If you haven't seen bitwise operations used this way, I'd recommend searching for "bitwise operators" on Google to learn more.

Hope this helps!