K-d trees: nearest neighbor search algorithm

algorithm nearest-neighbor kdtree

Kaiser Octavius · Nov 22, 2012 · Viewed 10.8k times · Source

This is my understanding of it: 1. Recurse down the tree, taking the left or right subtree according as whether ELEMENT would lie in the left or the right subtree, if it existed. 2. Set CURRENT_BEST as the first leaf node that you reach. 3. As you recurse back up, check to see whether ELEMENT lies closer to the splitting hyperplane than it does to CURRENT_BEST. If so, set CURRENT_BEST as the current node.

This is the part I got from Wikipedia and my class, and the part I don't understand: 4. Check to see whether any node in the other subtree of the splitting point singled out in 3. is closer to ELEMENT than the splitting point.

I don't see why we need to do 4., since any point that might lie in the one subtree of the splitting node must necessarily be closer to the splitting node than to any point in the other subtree.

It's obviously my understanding of the algorithm that is flawed, so help will be greatly appreciated.

Answer

Step 4 is the 'else' in step 3, what you do if the plane is closer than the point. Just because the point you found would be in the same rectangle as the point you are finding the neighbour for doesn't mean that it is the closest.

Imagine the following scenario: you have two points in your kD-Tree, A and B. A is in the middle of its rectangle, while B is just over the edge, in the partitioned area next to that of A. If you now search for the nearest neighbour to point C, which is right next to B but happens to be the other side of the edge and in the partition area of A, your first point you choose will be A due to the initial Depth First Search that chooses whatever would be in the same partition as your search point. However, B is actually closer, so even though you chose A, you need to check whether B is closer otherwise your kD-Tree won't actually give you correct results.

A good way of visualising this is to draw it out:

A-------------C--|--B

A is the first point we found in the DFS, C is our point we want the nearest neighbour of, B is the actual nearest neighbour, | is our split plane.

Another way to think of it is to draw a circle with radius dist(A,C) around point C. If any other rectangles have any portion of themselves fall within this circle, then there is a chance that they hold a point which might be closer to C than A is, so they must be checked. If you now find B, you can reduce the radius of your circle (because B is closer) so that less rectangles have a chance of intersecting, and once you have checked all the rectangles which intersect with your circle (reducing your circle radius as your find closer neighbours) you can definitively say that there are no closer points.

K-d trees: nearest neighbor search algorithm

Answer

Related questions