Optimize performance for calculation of euclidean distance between two images

wodzu picture wodzu · Oct 17, 2015 · Viewed 7.6k times · Source

I implemented the k-nearest-neighbours algorithm in python to classify some randomly picked images from the mnist database. However I found my distance function to be quite slow: An analisys of 10 test images against the training set of 10k images takes about 2mins. The images have a resolution of 28x28 pixels. Since I'm new to python I got the feeling this could be faster. The function is supposed to calculate the euclidean distance between two same-sized grayscale images.

def calculateDistance(image1, image2):
    distance = 0
    for i in range(len(image1)):
        for j in range(len(image1)):
            distance += math.pow((image1[i][j]-image2[i][j]),2)
    distance = numpy.sqrt(distance)
    return distance

Answer

Niklas B. picture Niklas B. · Oct 17, 2015

If you're using numpy arrays to represent the images, you could use the following instead:

def calculateDistance(i1, i2):
    return numpy.sum((i1-i2)**2)

This should be much faster because it uses a fast C implementation for the heavy lifting. Also consider using caching to not compute the difference of two images twice.