Efficient method for finding KNN of all nodes in a KD-Tree

St. John Johnson · Mar 26, 2010 · Viewed 11k times

I'm currently attempting to find the K nearest neighbors of every node in a balanced KD-Tree (with K=2).

My implementation is a variation of the code from the Wikipedia article, and it's decently fast: finding the KNN of any single node takes O(log N).

The problem is that I need to find the KNN of every node. If I iterate over each node and perform the search, that comes to roughly O(N log N) overall.
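
For reference, here's a minimal sketch of that "query every node" approach. It uses SciPy's cKDTree purely for illustration rather than my own Wikipedia-based implementation, and the random points are just placeholders:

```python
# Sketch of the O(N log N) "query every node" approach, shown with
# scipy.spatial.cKDTree instead of a hand-rolled KD-tree.
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 2)   # N placeholder points in 2-D
tree = cKDTree(points)             # build the tree once

K = 2
# Ask for K+1 neighbours per point, because each point is returned
# as its own nearest neighbour at distance 0.
dists, idx = tree.query(points, k=K + 1)
knn = idx[:, 1:]                   # drop the self-match in column 0
print(knn[:5])                     # the 2 nearest neighbours of the first 5 points
```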

Is there a more efficient way to do this?

Answer

Ryan Cox · Mar 26, 2010

Depending on your needs, you may want to experiment with approximate techniques. For details, check out Arya and Mount's work on the subject. A key paper is here; the Big-O complexity details are in their '98 paper.

A graphical illustration of the work is shown below:

[Image: kd-tree visualization produced by ANN — source: http://www.cs.umd.edu/~mount/ANN/Images/annspeckle.gif]

I have used their library on very high-dimensional datasets with hundreds of thousands of elements, and it's faster than anything else I've found. The library handles both exact and approximate searches. The package also contains some CLI utilities you can use to experiment with your dataset easily, and even to visualize the kd-tree (see above).

FWIW: I used the R Bindings.

From ANN's manual:

...it has been shown by Arya and Mount [AM93b] and Arya, et al. [AMN+98] that if the user is willing to tolerate a small amount of error in the search (returning a point that may not be the nearest neighbor, but is not significantly further away from the query point than the true nearest neighbor) then it is possible to achieve significant improvements in running time. ANN is a system for answering nearest neighbor queries both exactly and approximately.
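
To get a feel for that error/speed trade-off without installing ANN, here is a small sketch of the same (1+eps)-approximate idea using SciPy's cKDTree, whose eps query parameter plays an analogous role. This is only an illustration of the concept, not ANN's API, and the data is made up:

```python
# (1+eps)-approximate KNN in the spirit of the ANN manual's description,
# illustrated with scipy.spatial.cKDTree (eps is SciPy's parameter here).
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(100000, 16)   # made-up high-dimensional data
tree = cKDTree(points)

query = np.random.rand(16)
# eps=0.0 returns exact neighbours; eps>0 allows the k-th reported
# neighbour to be up to (1+eps) times farther than the true k-th one,
# which typically lets the search prune far more of the tree.
d_exact, i_exact = tree.query(query, k=2, eps=0.0)
d_approx, i_approx = tree.query(query, k=2, eps=0.5)
print(i_exact, i_approx)
```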