Implementing an iterator over a binary search tree

templatetypedef picture templatetypedef · Jan 3, 2011 · Viewed 55.3k times · Source

I've been coding up a bunch of different binary search tree implementations recently (AVL, splay, treap) and am curious if there's a particularly "good" way to write an iterator to traverse these structures. The solution I've used right now is to have each node in the BST store pointers to the next and previous elements in the tree, which reduces iteration to a standard linked-list iteration. However, I'm not really satisfied with this answer. It increases the space usage of each node by two pointers (next and previous), and in some sense it's just cheating.

I know of a way of building a binary search tree iterator that uses O(h) auxiliary storage space (where h is the height of the tree) by using a stack to keep track of the frontier nodes to explore later on, but I've resisted coding this up because of the memory usage. I was hoping there is some way to build an iterator that uses only constant space.

My question is this - is there a way to design an iterator over a binary search tree with the following properties?

  1. Elements are visited in ascending order (i.e. an inorder traversal)
  2. next() and hasNext() queries run in O(1) time.
  3. Memory usage is O(1)

To make it easier, it's fine if you assume that the tree structure isn't changing shape during the iteration (i.e. no insertions, deletions, or rotations), but it would be really cool if there was a solution that could indeed handle this.

Answer

SingleNegationElimination picture SingleNegationElimination · Jan 3, 2011

The simplest possible iterator stores the last seen key, and then on the next iteration, searches the tree for the least upper bound for that key. Iteration is O(log n). This has the advantage of being very simple. If keys are small then the iterators are also small. of course it has the disadvantage of being a relatively slow way of iterating through the tree. It also won't work for non-unique sequences.

Some trees use exactly the implementation you already use, because it's important for their specific use that scanning is very fast. If the number of keys in each node is large, then the penalty of storing sibling pointers isn't too onerous. Most B-Trees use this method.

many search tree implementations keep a parent pointer on each node to simplify other operations. If you have that, then you can use a simple pointer to the last seen node as your iterator's state. at each iteration, you look for the next child in the last seen node's parent. if there are no more siblings, then you go up one more level.

If none of these techniques suit you, you can use a stack of nodes, stored in the iterator. This serves a the same function as the function call stack when iterating through the search tree as normal, but instead of looping through siblings and recursing on children, you push children onto the stack and return each successive sibling.