Algorithm for Shuffling a Linked List in n log n time

5StringRyan picture 5StringRyan · Aug 28, 2012 · Viewed 16.3k times · Source

I'm trying to shuffle a linked list using a divide-and-conquer algorithm that randomly shuffles a linked list in linearithmic (n log n) time and logarithmic (log n) extra space.

I'm aware that I can do a Knuth shuffle similar to that could be used in a simple array of values, but I'm not sure how I would do this with divide-and-conquer. What I mean is, what am I actually dividing? Do I just divide to each individual node in the list and then randomly assemble the list back together using some random value?

Or do I give each node a random number and then do a mergesort on the nodes based on the random numbers?

Answer

foxcub picture foxcub · Aug 29, 2012

What about the following? Perform the same procedure as merge sort. When merging, instead of selecting an element (one-by-one) from the two lists in sorted order, flip a coin. Choose whether to pick an element from the first or from the second list based on the result of the coin flip.

Algorithm.

shuffle(list):
    if list contains a single element
        return list

    list1,list2 = [],[]
    while list not empty:
        move front element from list to list1
        if list not empty: move front element from list to list2

    shuffle(list1)
    shuffle(list2)

    if length(list2) < length(list1):
        i = pick a number uniformly at random in [0..length(list2)]             
        insert a dummy node into list2 at location i 

    # merge
    while list1 and list2 are not empty:
        if coin flip is Heads:
            move front element from list1 to list
        else:
            move front element from list2 to list

    if list1 not empty: append list1 to list
    if list2 not empty: append list2 to list

    remove the dummy node from list

The key point for space is that splitting the list into two does not require any extra space. The only extra space we need is to maintain log n elements on the stack during recursion.

The point with the dummy node is to realize that inserting and removing a dummy element keeps the distribution of the elements uniform.

Analysis. Why is the distribution uniform? After the final merge, the probability P_i(n) of any given number ending up in the position i is as follows. Either it was:

  • in the i-th place in its own list, and the list won the coin toss the first i times, the probability of this is 1/2^i;
  • in the i-1-st place in its own list, and the list won the coin toss i-1 times including the last one and lost once, the probability of this is (i-1) choose 1 times 1/2^i;
  • in the i-2-nd place in its own list, and the list won the coin toss i-2 times including the last one and lost twice, the probability of this is (i-1) choose 2 times 1/2^i;
  • and so on.

So the probability

P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * P_j(n/2).

Inductively, you can show that P_i(n) = 1/n. I let you verify the base case and assume that P_j(n/2) = 2/n. The term \sum_{j=0}^{i-1} (i-1 choose j) is exactly the number of i-1-bit binary numbers, i.e. 2^{i-1}. So we get

P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * 2/n
       = 2/n * 1/2^i * \sum_{j=0}^{i-1} (i-1 choose j)
       = 1/n * 1/2^{i-1} * 2^{i-1}
       = 1/n

I hope this makes sense. The only assumption we need is that n is even, and that the two lists are shuffled uniformly. This is achieved by adding (and then removing) the dummy node.

P.S. My original intuition was nowhere near rigorous, but I list it just in case. Imagine we assign numbers between 1 and n at random to the elements of the list. And now we run a merge sort with respect to these numbers. At any given step of the merge, it needs to decide which of the heads of the two lists is smaller. But the probability of one being greater than the other should be exactly 1/2, so we can simulate this by flipping a coin.

P.P.S. Is there a way to embed LaTeX here?