Determine if two binary trees are equal

Rachel picture Rachel · Sep 27, 2009 · Viewed 37.8k times · Source

What would be the efficient algorithm to find if two given binary trees are equal - in structure and content?

Answer

Steve314 picture Steve314 · Sep 28, 2009

It's a minor issue, but I'd adapt the earlier solution as follows...

eq(t1, t2) =
  t1.data=t2.data && eq(t1.left, t2.left) && eq(t1.right, t2.right)

The reason is that mismatches are likely to be common, and it is better to detect (and stop comparing) early - before recursing further. Of course, I'm assuming a short-circuit && operator here.

I'll also point out that this is glossing over some issues with handling structurally different trees correctly, and with ending the recursion. Basically, there need to be some null checks for t1.left etc. If one tree has a null .left but the other doesn't, you have found a structural difference. If both have null .left, there's no difference, but you have reached a leaf - don't recurse further. Only if both .left values are non-null do you recurse to check the subtree. The same applies, of course, for .right.

You could include checks for e.g. (t1.left == t2.left), but this only makes sense if subtrees can be physically shared (same data structure nodes) for the two trees. This check would be another way to avoid recursing where it is unnecessary - if t1.left and t2.left are the same physical node, you already know that those whole subtrees are identical.

A C implementation might be...

bool tree_compare (const node* t1, const node* t2)
{
  // Same node check - also handles both NULL case
  if (t1 == t2)  return true;

  // Gone past leaf on one side check
  if ((t1 == NULL) || (t2 == NULL))  return false;

  // Do data checks and recursion of tree
  return ((t1->data == t2->data) && tree_compare (t1->left,  t2->left )
                                 && tree_compare (t1->right, t2->right));
}

EDIT In response to a comment...

The running time for a full tree comparison using this is most simply stated as O(n) where n is kinda the size of a tree. If you're willing to accept a more complex bound you can get a smaller one such as O(minimum(n1, n2)) where n1 and n2 are the sizes of the trees.

The explanation is basically that the recursive call is only made (at most) once for each node in the left tree, and only made (at most) once for each node in the right tree. As the function itself (excluding recursions) only specifies at most a constant amount of work (there are no loops), the work including all recursive calls can only be as much as the size of the smaller tree times that constant.

You could analyse further to get a more complex but smaller bound using the idea of the intersection of the trees, but big O just gives an upper bound - not necessarily the lowest possible upper bound. It's probably not worthwhile doing that analysis unless you're trying to build a bigger algorithm/data structure with this as a component, and as a result you know that some property will always apply to those trees which may allow you a tighter bound for the larger algorithm.

One way to form a tigher bound is to consider the sets of paths to nodes in both trees. Each step is either an L (left subtree) or an R (right subtree). So the root is specified with an empty path. The right child of the left child of the root is "LR". Define a function "paths (T)" (mathematically - not part of the program) to represent the set of valid paths into a tree - one path for every node.

So we might have...

paths(t1) = { "", "L", "LR", "R", "RL" }
paths(t2) = { "", "L", "LL", "R", "RR" }

The same path specifications apply to both trees. And each recursion always follows the same left/right link for both trees. So the recursion visits the paths in the itersection of these sets, and the tightest bound we can specify using this is the cardinality of that intersection (still with the constant bound on work per recursive call).

For the tree structures above, we do recursions for the following paths...

paths(t1) intersection paths(t2) = { "", "L", "R" }

So our work in this case is bounded to at most three times the maximum cost of non-recursive work in the tree_compare function.

This is normally an unnecessary amount of detail, but clearly the intersection of the path-sets is at most as large as the number of nodes in the smallest original tree. And whether the n in O(n) refers to the number of nodes in one original tree or to the sum of the nodes in both, this is clearly no smaller than either the minimum or our intersection. Therefore O(n) isn't such a tight bound, but it's still a valid upper bound, even if we're a bit vague which size we're talking about.