average case running time of linear search algorithm

Brahadeesh picture Brahadeesh · Feb 26, 2011 · Viewed 6.9k times · Source

I am trying to derive the average case running time for deterministic linear search algorithm. The algorithm searches an element x in an unsorted array A in the order A[1], A[2], A[3]...A[n]. It stops when it finds the element x or proceeds until it reaches the end of the array. I searched on wikipedia and the answer given was (n+1)/(k+1) where k is the number of times x is present in the array. I approached in another way and am getting a different answer. Can anyone please give me the correct proof and also let me know whats wrong with my method?

E(T)= 1*P(1) + 2*P(2) + 3*P(3) ....+ n*P(n) where P(i) is the probability that 
                   the algorithm runs for 'i' time (i.e. compares 'i' elements).
P(i)= (n-i)C(k-1) * (n-k)! / n! 
Here, (n-i)C(k-1) is (n-i) Choose (k-1). As the algorithm has reached the ith 
step, the rest of k-1 x's must be in the last n-i elements. Hence (n-i)C(k-i).
(n-k)! is the total number of ways of arranging the rest non x numbers, and n! 
is the total number of ways of arranging the n elements in the array.

I am not getting (n+1)/(k+1) on simplifying.

Answer

user635541 picture user635541 · Feb 26, 2011

You've forgotten to account for the permutations of the k copies of x. The correct definition of P(i) is

P(i) = (n-i)C(k-1) * k! * (n-k)! / n! = (n-i)C(k-1) / nCk.
                     ^^

I'll turn things over to Mathematica:

In[1]:= FullSimplify[Sum[i Binomial[n-i, k-1]/Binomial[n, k], {i, 1, n}], 0 <= k <= n]

        1 + n
Out[1]= -----
        1 + k

To elaborate on my comment below: assume that all elements are distinct, let X be the set of matches, and let Y be the set of non-matches. By assumption, |X| = k and |Y| = n-k. The expected number of reads is equal to the sum over elements e of the probability that e is read.

Given a set of elements S, each element in S comes before all of the others with probability 1/|S|.

An element x in X is read if and only if it comes before every other element of X, which is probability 1/k. The total contribution of elements in X is |X| (1/k) = 1.

An element y in Y is read if and only if it comes before every element of X, which is probability 1/(k+1). The total contribution of elements in Y is |Y| (1/(k+1)) = (n-k)/(k+1).

We have 1 + (n-k)/(k+1) = (n+1)/(k+1).