What is the difference between supervised learning and unsupervised learning?

TIMEX picture TIMEX · Dec 2, 2009 · Viewed 170k times · Source

In terms of artificial intelligence and machine learning, what is the difference between supervised and unsupervised learning? Can you provide a basic, easy explanation with an example?

Answer

Davide picture Davide · Dec 6, 2009

Since you ask this very basic question, it looks like it's worth specifying what Machine Learning itself is.

Machine Learning is a class of algorithms which is data-driven, i.e. unlike "normal" algorithms it is the data that "tells" what the "good answer" is. Example: a hypothetical non-machine learning algorithm for face detection in images would try to define what a face is (round skin-like-colored disk, with dark area where you expect the eyes etc). A machine learning algorithm would not have such coded definition, but would "learn-by-examples": you'll show several images of faces and not-faces and a good algorithm will eventually learn and be able to predict whether or not an unseen image is a face.

This particular example of face detection is supervised, which means that your examples must be labeled, or explicitly say which ones are faces and which ones aren't.

In an unsupervised algorithm your examples are not labeled, i.e. you don't say anything. Of course, in such a case the algorithm itself cannot "invent" what a face is, but it can try to cluster the data into different groups, e.g. it can distinguish that faces are very different from landscapes, which are very different from horses.

Since another answer mentions it (though, in an incorrect way): there are "intermediate" forms of supervision, i.e. semi-supervised and active learning. Technically, these are supervised methods in which there is some "smart" way to avoid a large number of labeled examples. In active learning, the algorithm itself decides which thing you should label (e.g. it can be pretty sure about a landscape and a horse, but it might ask you to confirm if a gorilla is indeed the picture of a face). In semi-supervised learning, there are two different algorithms which start with the labeled examples, and then "tell" each other the way they think about some large number of unlabeled data. From this "discussion" they learn.