I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:
Given a set of points A and a set of points B in euclidean space, find all pairs (a,b) such that b is the closest point in B to a and a is the closest point in A to b.
The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.
The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?
A data structure I found very useful when I had to do nearest neighbour searches was a kd-tree. Wikipedia has a nice overview and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already - you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neghbour searches to be performed in O(log N) time.
In that way, you could produce two lists of - the members of A and their nearest neighbour in B and the members of B and their nearest neighbour in A - in O(N log N) time. Then, you could compare the lists to see which pairs match. Done naively, that's O(N^2), though you might be able to think of a way to do it faster.
[edit] You've got me thinking; here's my second thought:
for(a in A)
b := nearest(B, a)
if a = nearest(A, b)
add (a, b) to results
end if
end for
function nearest(X, y)
return nearest member of set X to point y
end function
By my reckoning, that's O(N log N).