K Nearest-Neighbor Algorithm

Gwaihir picture Gwaihir · Feb 3, 2011 · Viewed 8.1k times · Source

Maybe I'm rather stupid but I just can't find a satisfying answer: Using the KNN-algorithm, say k=5. Now I try to classify an unknown object by getting its 5 nearest neighbours. What to do, if after determining the 4 nearest neighbors, the next 2 (or more) nearest objects have the same distance? Which object of these 2 or more should be chosen as the 5th nearest neighbor?

Thanks in advance :)

Answer

Reed Copsey picture Reed Copsey · Feb 3, 2011

Which object of these 2 or more should be chosen as the 5th nearest neighbor?

It really depends on how you want to implement it.

Most algorithms will do one of three things:

  1. Include all equal distance points, so for this estimation, they'll use 6 points, not 5.
  2. Use the "first" found point of the two equal distant.
  3. Pick a random (usually with a consistent seed, so results are reproducable) point from the 2 points found.

That being said, most algorithms based on radial searching have an inherent assumption of stationarity, in which case, it really shouldn't matter which of the options above you choose. In general, any of them should, theoretically, provide reasonable defaults (especially since they're the furthest points in the approximation, and should have the lowest effective weightings).