How to find the kth smallest element in the union of two sorted arrays?

Michael · Jan 5, 2011 · Viewed 81.2k times

This is a homework question. The claimed running time is O(log N + log M), where N and M are the lengths of the arrays.

Let's name the arrays a and b. Obviously we can ignore all a[i] and b[i] where i > k.
First, compare a[k/2] and b[k/2]. Suppose b[k/2] > a[k/2]. Then we can also discard all b[i] where i > k/2.

That leaves all a[i] with i <= k and all b[i] with i <= k/2 as candidates for the answer.

What is the next step?

Answer

lambdapilgrim · Jan 20, 2012

I hope I am not answering your homework, as it has been over a year since this question was asked. Here is a tail-recursive solution that takes O(log(len(a) + len(b))) time.

Assumption: The inputs are correct, i.e., both arrays are sorted in ascending order and k is a zero-based index in the range [0, len(a)+len(b)-1].
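To make the indexing convention concrete: k is zero-based here, so k = 0 asks for the smallest element and the largest valid k is len(a)+len(b)-1. A brute-force reference is handy for checking any faster solution against. The helper name `kth_smallest_reference` is hypothetical and used only for illustration; it runs in O((N+M) log(N+M)) time, so it is a checking aid rather than an answer to the question.

```python
def kth_smallest_reference(a, b, k):
    """Brute-force reference: k-th smallest (zero-based) of two sorted lists.

    Hypothetical helper for testing only; it merges and sorts everything.
    """
    merged = sorted(a + b)
    if not 0 <= k < len(merged):
        raise IndexError("k must satisfy 0 <= k < len(a) + len(b)")
    return merged[k]
```

For example, `kth_smallest_reference([1, 3, 5], [2, 4, 6], 0)` returns 1 and `kth_smallest_reference([1, 3, 5], [2, 4, 6], 5)` returns 6.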

Base cases:

  • If the length of one of the arrays is 0, the answer is the kth element of the other array.

Reduction steps:

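  • If the sum of the mid indices of a and b is less than k:
    • If the mid element of a is greater than the mid element of b, we can ignore the first half of b (up to and including its mid element) and reduce k by the number of elements dropped.
    • Otherwise, ignore the first half of a (up to and including its mid element) and reduce k the same way.
  • Otherwise (k is less than or equal to the sum of the mid indices of a and b):
    • If the mid element of a is greater than the mid element of b, we can safely ignore the second half of a, starting at its mid element.
    • Otherwise, we can ignore the second half of b, starting at its mid element.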

Code:

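def kthlargest(arr1, arr2, k):
    # Despite the name, this returns the k-th smallest element (k is zero-based)
    # of the union of the two sorted lists.
    if len(arr1) == 0:
        return arr2[k]
    elif len(arr2) == 0:
        return arr1[k]

    mida1 = len(arr1) // 2  # integer division
    mida2 = len(arr2) // 2
    if mida1 + mida2 < k:
        # The answer lies beyond the smaller mid element: drop that element
        # and everything before it, then shift k by the number dropped.
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1, arr2[mida2+1:], k - mida2 - 1)
        else:
            return kthlargest(arr1[mida1+1:], arr2, k - mida1 - 1)
    else:
        # The answer lies before the larger mid element: drop that element
        # and everything after it; k is unchanged.
        if arr1[mida1] > arr2[mida2]:
            return kthlargest(arr1[:mida1], arr2, k)
        else:
            return kthlargest(arr1, arr2[:mida2], k)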
Please note that my solution creates new copies of the smaller arrays in every call. This can easily be eliminated by passing only start and end indices into the original arrays.
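A minimal sketch of that index-based variant follows. It applies the same reduction steps as the code above, but tracks a window on each array instead of slicing, so no copies are made. The function name `kth_smallest_indexed`, the explicit range check, and the use of a loop in place of tail recursion are illustrative choices, not part of the code above.

```python
def kth_smallest_indexed(a, b, k):
    """k-th smallest (zero-based) of two sorted sequences, without slicing."""
    if not 0 <= k < len(a) + len(b):
        raise IndexError("k must satisfy 0 <= k < len(a) + len(b)")

    lo_a, hi_a = 0, len(a)  # active window is a[lo_a:hi_a]
    lo_b, hi_b = 0, len(b)  # active window is b[lo_b:hi_b]

    while True:
        # Base cases: one window is empty, so index into the other.
        if lo_a == hi_a:
            return b[lo_b + k]
        if lo_b == hi_b:
            return a[lo_a + k]

        mid_a = (hi_a - lo_a) // 2  # mid offsets relative to each window
        mid_b = (hi_b - lo_b) // 2

        if mid_a + mid_b < k:
            # Drop the smaller mid element and everything before it.
            if a[lo_a + mid_a] > b[lo_b + mid_b]:
                lo_b += mid_b + 1
                k -= mid_b + 1
            else:
                lo_a += mid_a + 1
                k -= mid_a + 1
        else:
            # Drop the larger mid element and everything after it.
            if a[lo_a + mid_a] > b[lo_b + mid_b]:
                hi_a = lo_a + mid_a
            else:
                hi_b = lo_b + mid_b
```

Each iteration shrinks one of the two windows by about half, so the loop runs O(log(len(a)) + log(len(b))) times, and without the slice copies each iteration does constant work.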