PCA and KNN algorithm

Test Test · Apr 17, 2012 · Viewed 8.9k times

I am using KNN to classify handwritten digits. I have now also implemented PCA to reduce the dimensionality, going from 256 dimensions to 200. But I only notice a ~0.10% loss of information, even though I removed 56 dimensions. Shouldn't the loss be bigger? Only when I drop to 5 dimensions do I get a ~20% loss. Is this normal?

Answer

B. Decoster · Apr 18, 2012

You're saying that after removing 56 dimensions you lost almost no information? Of course, that's the whole point of PCA! Principal Component Analysis, as the name suggests, helps you determine which dimensions (the principal components) carry the information. You can then remove the rest, which is often the majority of them.

If you want an example: in gene analysis, I have read papers where the dimensionality is reduced from 40,000 to 100 with PCA, then they do some magical stuff and end up with an excellent classifier using 19 dimensions. This implicitly tells you that they lost virtually no information when they removed 39,900 dimensions!
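To see the same effect yourself, here is a minimal sketch, assuming that "information" means the fraction of total variance retained by the first k principal components (the question does not say how the loss was measured). The data is synthetic, not the asker's digits: 256 features driven by about 20 hidden factors plus a little noise, which is only a rough stand-in for 16x16 digit images. The names `explained_variance_ratio` and `retained_variance` are made up for this illustration.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component, largest first."""
    Xc = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Singular values of the centered data; their squares are proportional
    # to the variance along each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def retained_variance(X, k):
    """Fraction of total variance kept when only the first k components are used."""
    return float(explained_variance_ratio(X)[:k].sum())


# Synthetic data (an assumption for illustration): 256 correlated features
# generated from 20 latent factors, plus small independent noise.
rng = np.random.default_rng(0)
n_samples, n_features, n_latent = 1000, 256, 20
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

for k in (200, 50, 20, 5):
    print(f"k={k:3d}  retained variance = {retained_variance(X, k):.4f}")
```

On data like this, dropping from 256 to 200 components should cost almost nothing, and the loss only becomes noticeable once k falls below the number of underlying factors. That matches the pattern described in the question: the discarded dimensions were mostly redundant.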