I implemented the k-nearest-neighbours algorithm in python to classify some randomly picked images from the mnist database. However I found my distance function to be quite slow: An analisys of 10 test images against the training set of 10k images takes about 2mins. The images have a resolution of 28x28 pixels. Since I'm new to python I got the feeling this could be faster. The function is supposed to calculate the euclidean distance between two same-sized grayscale images.
def calculateDistance(image1, image2):
distance = 0
for i in range(len(image1)):
for j in range(len(image1)):
distance += math.pow((image1[i][j]-image2[i][j]),2)
distance = numpy.sqrt(distance)
return distance
If you're using numpy arrays to represent the images, you could use the following instead:
def calculateDistance(i1, i2):
return numpy.sum((i1-i2)**2)
This should be much faster because it uses a fast C implementation for the heavy lifting. Also consider using caching to not compute the difference of two images twice.