What does dimensionality reduction mean?

Yasmeen · Jan 3, 2010 · Viewed 12.6k times

What does dimensionality reduction mean exactly?

I searched for its meaning and only found that it means transforming raw data into a more useful form. So what is the benefit of having data in a more useful form? I mean, how can I use it in practice (in a real application)?

Answer

Aditya Mukherji · Jan 3, 2010

Dimensionality reduction is about converting data of very high dimensionality into data of much lower dimensionality, such that each of the lower dimensions conveys much more information.

This is typically done while solving machine learning problems to get better features for a classification or regression task.

Here's a contrived example. Suppose you have a list of 100 movies and 1000 people, and for each person you know whether they like or dislike each of the 100 movies. So for each instance (which in this case means each person) you have a binary vector of length 100 [position i is 0 if that person dislikes the i-th movie, 1 otherwise].
You can perform your machine learning task on these vectors directly, but instead you could decide on 5 genres of movies and, using the data you already have, figure out whether the person likes or dislikes each genre as a whole. In this way you reduce your data from a vector of size 100 to a vector of size 5 [position i is 1 if the person likes genre i, 0 otherwise].
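To make the movie example concrete, here is a minimal sketch in plain Python. Several things in it are assumptions made purely for illustration and are not part of the example above: the ratings are random made-up data, movie i is arbitrarily assigned to genre i % 5, and a person is said to like a genre when they like more than half of its movies. The names genre_of and reduce_to_genres are hypothetical.

```python
import random

N_PEOPLE, N_MOVIES, N_GENRES = 1000, 100, 5

rng = random.Random(0)  # fixed seed so the made-up data is reproducible

# Assumption: movie i belongs to genre i % 5, giving 20 movies per genre.
genre_of = [i % N_GENRES for i in range(N_MOVIES)]

# Made-up ratings: ratings[p][i] is 1 if person p likes movie i, 0 otherwise.
ratings = [[rng.randint(0, 1) for _ in range(N_MOVIES)]
           for _ in range(N_PEOPLE)]


def reduce_to_genres(person):
    """Map a length-100 like/dislike vector to a length-5 genre vector."""
    liked = [0] * N_GENRES
    total = [0] * N_GENRES
    for movie, likes in enumerate(person):
        liked[genre_of[movie]] += likes
        total[genre_of[movie]] += 1
    # Assumed rule: a genre is "liked" if more than half of its movies are liked.
    return [1 if 2 * liked[g] > total[g] else 0 for g in range(N_GENRES)]


reduced = [reduce_to_genres(person) for person in ratings]
print(len(ratings[0]), "->", len(reduced[0]))  # 100 -> 5
```

In a real application the genres would usually not be picked by hand. Techniques such as principal component analysis derive the lower dimensions from the data itself, but the idea of replacing 100 numbers per person with 5 is the same.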

The vector of length 5 can be thought of as a good representative of the vector of length 100, because most people probably like movies only in their preferred genres.

However, it's not going to be an exact representative, because there might be cases where a person hates all movies of a genre except one.
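The loss of information can be shown with a small self-contained sketch. It assumes, for illustration only, that movie i belongs to genre i % 5 and that a genre counts as liked when more than half of its movies are liked. The helper names reduce_to_genres and expand are hypothetical, and expand simply guesses that a person's opinion of each movie matches their opinion of its genre.

```python
N_MOVIES, N_GENRES = 100, 5

# Assumption: movie i belongs to genre i % 5, giving 20 movies per genre.
genre_of = [i % N_GENRES for i in range(N_MOVIES)]


def reduce_to_genres(person):
    """Length-100 movie vector -> length-5 genre vector (majority rule)."""
    liked = [0] * N_GENRES
    for movie, likes in enumerate(person):
        liked[genre_of[movie]] += likes
    per_genre = N_MOVIES // N_GENRES
    return [1 if 2 * liked[g] > per_genre else 0 for g in range(N_GENRES)]


def expand(genre_vector):
    """Guess the movie vector back: every movie gets its genre's value."""
    return [genre_vector[genre_of[m]] for m in range(N_MOVIES)]


# A person who likes every movie in genres 0 and 1, dislikes genres 2 to 4,
# but makes one exception: movie 2, which belongs to genre 2.
person = [1 if genre_of[m] in (0, 1) else 0 for m in range(N_MOVIES)]
person[2] = 1

reduced = reduce_to_genres(person)
guess = expand(reduced)
mismatches = [m for m in range(N_MOVIES) if guess[m] != person[m]]

print(reduced)     # [1, 1, 0, 0, 0]
print(mismatches)  # [2]
```

The genre vector gets 99 of the 100 movies right for this person, and the single exception is exactly the detail that the reduced representation cannot store.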

The point is that the reduced vector conveys most of the information in the larger one while consuming a lot less space and being faster to compute with.