Identifying points with the smallest Euclidean distance

Ηλίας · Feb 25, 2011

I have a collection of n-dimensional points and I want to find which two are closest to each other. The best I could come up with for 2 dimensions is:

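import numpy as np

myArr = np.array([[1, 2],
                  [3, 4],
                  [5, 6],
                  [7, 8]])

n = myArr.shape[0]
# Squared Euclidean distance for every ordered pair (i, j), stored as [dist, i, j]
cross = [[int(np.sum((myArr[i] - myArr[j]) ** 2)), i, j]
         for i in range(n)
         for j in range(n)
         if i != j]

# Lists compare element by element, so min() picks the smallest distance first
print(min(cross))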

which gives

[8, 0, 1]

But this is too slow for large arrays. What kind of optimisation can I apply to it?

Answer

tkerwin · Feb 25, 2011

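Try scipy.spatial.distance.pdist(myArr). It returns a condensed distance matrix: a 1-D array holding the Euclidean distance for each pair i < j, n(n-1)/2 values in total. Call argmin on it to get the position of the smallest distance, then map that position back to the pair (i, j), for example by indexing the arrays returned by numpy.triu_indices(n, k=1), which list the pairs in the same order.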
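A minimal sketch of that approach, assuming SciPy is installed. The function name closest_pair is made up for illustration, and mapping the argmin position back to a pair through numpy.triu_indices is one possible choice, not something the answer prescribes. Note that pdist returns true Euclidean distances by default, not the squared values used in the question.

```python
import numpy as np
from scipy.spatial.distance import pdist


def closest_pair(points):
    """Return (i, j, distance) for the two closest rows of `points`.

    Hypothetical helper for illustration; works for any number of dimensions.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    # Condensed distances, ordered (0,1), (0,2), ..., (0,n-1), (1,2), ...
    d = pdist(points)
    k = int(np.argmin(d))
    # Upper-triangle indices (excluding the diagonal) come in the same order
    rows, cols = np.triu_indices(n, k=1)
    return int(rows[k]), int(cols[k]), float(d[k])


myArr = np.array([[1, 2],
                  [3, 4],
                  [5, 6],
                  [7, 8]])

i, j, dist = closest_pair(myArr)
print(i, j, dist ** 2)  # indices of the closest pair and their squared distance
```

For the sample array this picks the pair (0, 1), matching the result in the question. The loop over pairs now runs in compiled code, but pdist still stores n(n-1)/2 distances, so memory grows quadratically with the number of points.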