How many ways can you insert a series of values into a BST to form a specific tree?

templatetypedef picture templatetypedef · Jun 15, 2013 · Viewed 10.9k times · Source

This earlier question asked how many ways there were to insert the values 1 - 7 into a binary search tree that would result in the following tree:

       4
     /   \
    2     6
   / \   / \
  1   3 5   7

(The answer is 80, by the way).

Suppose more generally that you're given an arbitrary BST holding some set of values and want to know how many possible ways there are to insert those values into a BST that would end up producing the resulting tree. Is there an efficient algorithm for determining this?

Thanks!

Answer

templatetypedef picture templatetypedef · Jun 15, 2013

We can solve this using a clever recursive algorithm.

As our base case, if you are given an empty tree, there is exactly one way to build that tree - don't insert any values.

For the recursive case, let's suppose that you have a BST that looks like this:

              v
             / \
            L   R

Here, v is the root, and L and R are the right subtrees, respectively.

If we want to build up this binary search tree, we would have to start off by inserting v first - if we didn't, v wouldn't be the root. After we insert v, we need to insert the elements in an ordering that will cause the subtrees L and R to be rebuilt. The tricky part here is that we don't need to build up all of L before building up R or vice-versa; we could insert some elements from L, then some elements from R, then more elements from L, then more elements from R, etc.

Fortunately, though, there is a useful observation we can make. Suppose that SL is a sequence of elements that, if inserted into a BST, forms the BST L. Similarly, let SR be a sequence of elements that if inserted into a BST form the BST R. If we consider any possible interleaving of the sequences SL and SR, we will end up with a sequence of elements that, if inserted into a BST containing just v, will build up the tree

   v
  / \
 L   R

As an example, consider this tree:

       4
     /   \
    2     6
   / \   / \
  1   3 5   7

One possible sequence that builds the left subtree (holding 1, 2, 3) is 2, 1, 3. One possible sequence that builds the right subtree is 6, 5, 7. Any possible interleaving of those sequences, when inserted into a BST containing just the root node 4, will end up building out the above BST. For example, any of these sequences will work:

 2, 1, 3, 6, 5, 7
 2, 6, 1, 5, 3, 7
 6, 5, 2, 1, 3, 7
 ...

Since given any sequences SL and SR that build up L and R we can interleave them in any order, we can write out a nice formula for the number of sequences that will build out a tree with v as its root, L as its left subtree, and R as its right subtree:

# ways = (# interleaves of SL and SR) × (# possible SLs) × (# possible SRs)

If you think about it, the last two terms in this product can be found by recursively finding the number of insertion sequences that work for the left and right subtrees. This means that if we can figure out how many possible interleavings there are for the two sequences, then we can determine how many possible insertion sequences will build up a given tree by evaluating the above expression recursively!

So how many interleavings are there? If we're given two sequences of length m and n with no duplicates in them, then we can come up with the number of interleavings of those sequences with the following observation. Consider the sequence

L L L ... L R R R ... R
  m times     n times

Any permutation of this sequence will give back a series of Ls and Rs where the number of Ls is equal to the number of elements in the sequence of length m and the number of Rs is equal to the number of elements in the sequence of length n. We can interpret this sequence as a way of describing how to build up the interleaving - any time we see L, we insert an element from the left sequence, and any time we see an R, we insert an element from the right sequence. For example, consider the sequences 4, 1, 0, 3 and 2, 7. Then the permutation LLRLRL gives the sequence

 4, 1, 2, 0, 3, 7
 L  L  R  L  R  L

If we start off with a sequence of m L's and n R's, the number of distinct permutations comes back as

(m + n)!
-------- = (m + n) choose m
 m!  n!

This holds because there are (m + n)! possible ways of reordering the Ls and the Rs if they were all distinct. Since they aren't, every ordering is counted m! n! too many times because we can permute the L's indistinguishably and the R's indistinguishably. Another way to think about this is to consider the set {1, 2, 3, ..., m + n} of indices in the interleaving, then to choose m of them to fill with elements from the first sequence, implicitly filling the rest of them with elements from the right sequence.

Okay... we now have a way of determining the number of ways of interleaving two sequences of length m and n. Therefore, we have the following:

# ways = (# interleaves of SL and SR) × (# possible SLs) × (# possible SRs)

= ((m + n) choose n) × (# possible SLs) × (# possible SRs)

Where m is the number of elements in the left subtree and n is the number of elements in the right subtree. Yay!

We can therefore write out pseudocode for this algorithm:

function countInsertionOrderings(Tree t) {
    if t is null, return 1;
    otherwise:
       let m = numElementsIn(t.left);
       let n = numElementsIn(t.right);
       return choose(m + n, n) * 
              countInsertionOrderings(t.left) *
              countInsertionOrderings(t.right);
}

This algorithm performs a total of O(n) multiplications, where n is the number of nodes in the tree, and visits every node exactly once (assuming the number of elements in the BST are cached at each node). This does not mean the algorithm runs in time O(n), though, because the work required to multiply these numbers together will grow rapidly as the numbers get larger and larger.

Hope this helps!