Finding "maximum" overlapping interval pair in O(nlog(n))

user1751434 picture user1751434 · Sep 4, 2016 · Viewed 9.7k times · Source

Problem Statement

Input set of n intervals; {[s_1,t_1], [s_2,t_2], ... ,[s_n,t_n]}.

Output pair of intervals; {[s_i,t_i],[s_j,t_j]}, with the maximum overlap among all the interval pairs.

Example

input intervals : {[1, 10], [2, 6], [3,15], [5, 9]}

-> There are possible 6 interval pairs. Among those pairs, [1,10] & [3,15] has the largest possible overlap of 7.

output : {[1,10],[3,15]}

A naive algorithm will be a brute force method where all n intervals get compared to each other, while the current maximum overlap value being tracked. The time complexity would be O(n^2) for this case.

I was able to find many procedures regarding interval trees, maximum number of overlapping intervals and maximum set of non-overlapping intervals, but nothing on this problem. Maybe I would be able to use the ideas given in the above algorithms, but I wasn't able to come up with one.

I spent many hours trying to figure out a nice solution, but I think I need some help at this point.

Any suggestions will help!

Answer

ruakh picture ruakh · Sep 4, 2016

First, sort the intervals: first by left endpoint in increasing order, then — as a secondary criterion — by right endpoint in decreasing order. For the rest of this answer, I'll assume that the intervals are already in sorted order.

Now, there are two possibilities for what the maximum possible overlap might be:

  • it may be between an interval and a later interval that it completely covers.
  • it may be between an interval and the very next interval that it doesn't completely cover.

We can cover both cases in O(n) time by iterating over the intervals, keeping track of the following:

  • the greatest overlap we've seen so far, and the relevant pair of intervals.
  • the latest interval we've seen, call it L, that wasn't completely covered by any of its predecessors. (For this, the key insight is that thanks to the ordering of the intervals, we can easily tell if an interval is completely covered by any of its predecessors — and therefore if we need to update L — by simply checking if it's completely covered by the current L. So we can keep L up-to-date in O(1) time.)

and computing each interval's overlap with L.

So:

result := []
max_overlap := 0
L := sorted_intervals[1]
for interval I in sorted_intervals[2..n]:
    overlap := MIN(L.right, I.right) - I.left
    if overlap >= max_overlap:
        result := [L, I]
        max_overlap := overlap
    if I.right > L.right:
        L := I

So the total cost is the cost of sorting the intervals, which is likely to be O(n log n) time but may be O(n) if you can use bucket-sort or radix-sort or similar.