How is it possible to do binary search on a doubly-linked list in O(n) time?

templatetypedef picture templatetypedef · Oct 24, 2013 · Viewed 10.4k times · Source

I've heard that it's possible to implement binary search over a doubly-linked list in O(n) time. Accessing a random element of a doubly-linked list takes O(n) time, and binary search accesses O(log n) different elements, so shouldn't the runtime be O(n log n) instead?

Answer

templatetypedef picture templatetypedef · Oct 24, 2013

It's technically correct to say that the runtime of binary search on a doubly-linked list is O(n log n), but that's not a tight upper bound. Using a slightly better implementation of binary search and a more clever analysis, it's possible to get binary search to run in time O(n).

The basic idea behind binary search is the following:

  • If the list is empty, the element being searched for doesn't exist.
  • Otherwise:
    • Look at the middle element.
    • If it matches the element in question, return it.
    • If it's bigger than the element in question, discard the back half of the list.
    • If it's smaller than the element in question, discard the front half of the list.

A naive implementation of binary search on a doubly-linked list would work by computing the indices to look up on each iteration (just like in the array case), then access each one by starting at the front of the list and scanning forward the appropriate number of steps. This is indeed very slow. If the element being searched for is at the very end of the array, the indices looked up would be n/2, 3n/4, 7n/8, etc. Summing up the work done in the worst case, we get

n / 2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≥ n / 2 + n / 2 + ... + n / 2 (Θ(log n) terms)

= Θ(n log n)

and

n / 2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≤ n + n + ... + n (Θ(log n) terms)

= Θ(n log n)

Therefore, the worst-case time complexity for this algorithm is Θ(n log n).

However, we can speed this up by a factor of Θ(log n) by being more clever with our approach. The reason the previous algorithm is slow is that every time we need to look up an element, we start the search from the beginning of the array. However, we don't need to do this. After looking up the middle element the first time, we're already in the middle of the array, and we know that the next lookup we're going to make will be either at position n / 4 or 3n / 4, which is only distance n / 4 from where we left off (compared to n / 4 or 3n / 4 if we start from the beginning of the array). What if we just trekked out from our stopping position (n / 2) to the next position, rather than restarting at the front of the list?

Here's our new algorithm. Begin by scanning to the middle of the array, which requires n / 2 steps. Then, determine whether to visit the element at the middle of the first half of the array or the middle of the second half of the array. Reaching there from position n / 2 only requires n / 4 total steps. From there, going to the midpoint of the quarter of the array containing the element takes only n / 8 steps, and going from there to the midpoint of the eighth of the array containing the element takes only n / 16 steps, etc. This means that the total number of steps made is given by

n / 2 + n / 4 + n / 8 + n / 16 + ...

= n (1/2 + 1/4 + 1/8 + ...)

≤ n

This follows from the fact that the sum of the infinite geometric series 1/2 + 1/4 + 1/8 + ... is 1. Therefore, the total work done in the worst case only Θ(n), which is much better than the Θ(n log n) worst case from before.

One last detail: why would you ever do this? After all, it already takes O(n) time to search a doubly-linked list for an element. One major advantage of this approach is that even though the runtime is O(n), we only end up doing O(log n) total comparisons (one per step of the binary search). This means that if comparisons are expensive, we might end up doing less work using a binary search than doing a normal linear search, since the O(n) comes from the work done walking the list rather than the work done making comparisons.