How does the KD-tree nearest neighbor search work?

davea picture davea · Dec 11, 2010 · Viewed 13.7k times · Source

I am looking at the Wikipedia page for KD trees. As an example, I implemented, in python, the algorithm for building a kd tree listed.

The algorithm for doing KNN search with a KD tree, however, switches languages and isn't totally clear. The English explanation starts making sense, but parts of it (such as the area where they "unwind recursion" to check other leaf nodes) don't really make any sense to me.

How does this work, and how can one do a KNN search with a KD tree in python? This isn't meant to be a "send me the code!" type question, and I don't expect that. Just a brief explanation please :)

Answer

Steve Tjoa picture Steve Tjoa · Dec 12, 2010

This book introduction, page 3:

Given a set of n points in a d-dimensional space, the kd-tree is constructed recursively as follows. First, one finds a median of the values of the ith coordinates of the points (initially, i = 1). That is, a value M is computed, so that at least 50% of the points have their ith coordinate greater-or-equal to M, while at least 50% of the points have their ith coordinate smaller than or equal to M. The value of x is stored, and the set P is partitioned into PL and PR , where PL contains only the points with their ith coordinate smaller than or equal to M, and |PR | = |PL |±1. The process is then repeated recursively on both PL and PR , with i replaced by i + 1 (or 1, if i = d). When the set of points at a node has size 1, the recursion stops.

The following paragraphs discuss its use in solving nearest neighbor.

Or, here is the original 1975 paper by Jon Bentley.

EDIT: I should add that SciPy has a kdtree implementation: