Find Closest Vector from a List of Vectors | Python

ajl123 picture ajl123 · Sep 8, 2015 · Viewed 13.6k times · Source

If you are given say a list of 10 vectors, called A that represent different groups. Then you have a time series of vectors v1,v2,...,vn, each being a vector as well. I was wondering if there was a way to find the "closest" vector in A for each v1,v2,...,vn if you define some distance metric?

Is there a quick way to do this besides for looping through and just comparing all entries?

Edit: No I am not asking how to do k-means or something like that.

Answer

haraldkl picture haraldkl · Sep 8, 2015

You can use the spatial KDtree in scipy. It uses a fast tree algorithm to identify close by points for vectors of arbitrary dimension.

Edit: sorry, if you are looking for arbitrary distance metrics, a Tree like structure might still be an option.

Here is an example:

>>> from scipy import spatial
>>> A = [[0,1,2,3,4], [4,3,2,1,0], [2,5,3,7,1], [1,0,1,0,1]]
>>> tree = spatial.KDTree(A)

This sets up the KDTree with all the points in A, allowing you to perform fast spatial searches within it. Such a query takes a vector and returns the closest neighbor in A for it:

>>> tree.query([0.5,0.5,0.5,0.5,0.5])
(1.1180339887498949, 3)

The first return value is the distance of the closest neighbor and the second its position in A, such that you can obtain it for example like this:

>>> A[ tree.query([0.5,0.5,0.5,0.5,0.5])[1] ]
[1, 0, 1, 0, 1]