Minimum Euclidean distance between points in two different Numpy arrays, not within

fideli picture fideli · Dec 9, 2009 · Viewed 44.7k times · Source

I have two arrays of x-y coordinates, and I would like to find the minimum Euclidean distance between each point in one array with all the points in the other array. The arrays are not necessarily the same size. For example:

xy1=numpy.array(
[[  243,  3173],
[  525,  2997]])

xy2=numpy.array(
[[ 682, 2644],
[ 277, 2651],
[ 396, 2640]])

My current method loops through each coordinate xy in xy1 and calculates the distances between that coordinate and the other coordinates.

mindist=numpy.zeros(len(xy1))
minid=numpy.zeros(len(xy1))

for i,xy in enumerate(xy1):
    dists=numpy.sqrt(numpy.sum((xy-xy2)**2,axis=1))
    mindist[i],minid[i]=dists.min(),dists.argmin()

Is there a way to eliminate the for loop and somehow do element-by-element calculations between the two arrays? I envision generating a distance matrix for which I could find the minimum element in each row or column.

Another way to look at the problem. Say I concatenate xy1 (length m) and xy2 (length p) into xy (length n), and I store the lengths of the original arrays. Theoretically, I should then be able to generate a n x n distance matrix from those coordinates from which I can grab an m x p submatrix. Is there a way to efficiently generate this submatrix?

Answer

denis picture denis · Jun 27, 2010

(Months later) scipy.spatial.distance.cdist( X, Y ) gives all pairs of distances, for X and Y 2 dim, 3 dim ...
It also does 22 different norms, detailed here .

# cdist example: (nx,dim) (ny,dim) -> (nx,ny)

from __future__ import division
import sys
import numpy as np
from scipy.spatial.distance import cdist

#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1

    # change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
    exec( arg )
np.random.seed(seed)
np.set_printoptions( 2, threshold=100, edgeitems=10, suppress=True )

title = "%s  dim %d  nx %d  ny %d  metric %s" % (
        __file__, dim, nx, ny, metric )
print "\n", title

#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric )  # -> (nx, ny) distances
#...............................................................................

print "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
        X.shape, Y.shape, dist.shape )
print "dist average %.3g +- %.2g" % (dist.mean(), dist.std())
print "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
        dist[0,3], cdist( [X[0]], [Y[3]] ))


# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim      ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2      ~ .4 +- .2/sqrt dim