In an extract from my textbook it says that reducing the value of K
when running this algorithm actually increases the complexity as it has to run more “smoothing”.
Can anyone explain this to me?
My understanding is that in 1NN
, you feed it your training set. You test on your testing set. Assume your testing set has one point in it. It finds the one point closest to it in the training set and returns the value of this.
Surely this is less complex than finding the 3 closest points in 3NN
, adding their values and dividing by three?
What have I misunderstood or overlooked?
I had the same moment of disbelief when reading that axiom ; a parameter of higher value that decreases complexity seems a bit counterintuitive at first.
To put an intuition on this, let's compare a 1-nearest-neighbour trained model, and a N>>1-nearest-neighbours one. Let's use a simplified 2D-plot (two-features dataset) with a binary classification (each "point" has a class, or label, of either A or B).
With the 1-nearest-neighbour model, each example of the training set is potentially the center of an area predicting class A or B, with most of its neighbors the center of an area predicting the other class. Your plot might look like one of those maps of ethnicity, language or religion in the regions of the world where they are deeply intertwined (Balkans or the Middle East comes to mind) : small patches of complex shapes and alternating colors, with no discernible logic, and thus "high complexity".
If you increase k, the areas predicting each class will be more "smoothed", since it's the majority of the k-nearest neighbours which decide the class of any point. Thus the areas will be of lesser number, larger sizes and probably simpler shapes, like the political maps of country borders in the same areas of the world. Thus "less complexity".
(Intuition and source from this course.)