Is a train/test split in unsupervised learning necessary/useful?

Christoph S · Jul 28, 2015 · Viewed 8.7k times

In supervised learning I have the typical train/test split to train and evaluate a model, e.g. for regression or classification. Regarding unsupervised learning, my question is: is a train/test split necessary and useful? If yes, why?

Answer

Mangesh Divate · Dec 6, 2017

Well, this depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve it.

Roughly: dimensionality reduction techniques are usually evaluated by their reconstruction error, so a k-fold cross-validation procedure can be used there.
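To make the reconstruction-error idea concrete, here is a minimal sketch, assuming PCA as the dimensionality reduction method and scikit-learn as the library (neither is specified in the answer). The helper name cv_reconstruction_error and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold


def cv_reconstruction_error(X, n_components, n_splits=5, seed=0):
    """Mean held-out reconstruction MSE of PCA over k folds (illustrative helper)."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in kf.split(X):
        # Fit the projection on the training fold only.
        pca = PCA(n_components=n_components).fit(X[train_idx])
        X_test = X[test_idx]
        # Project the held-out fold down and back up, then measure the error.
        X_rec = pca.inverse_transform(pca.transform(X_test))
        errors.append(np.mean((X_test - X_rec) ** 2))
    return float(np.mean(errors))


# Assumed synthetic data: 3 latent factors embedded in 10 dimensions plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(300, 10))

errors_by_k = {k: cv_reconstruction_error(X, k) for k in (1, 2, 3, 5)}
for k, err in errors_by_k.items():
    print(f"n_components={k}: held-out MSE={err:.4f}")
```

With PCA, this held-out error can only go down as components are added, because the fitted subspaces are nested. It is therefore better suited to checking that the fit generalizes to unseen data than to picking the number of components on its own.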

For clustering algorithms, however, I would suggest statistical testing to assess performance. There is also a somewhat time-consuming trick: split the dataset, hand-label the test set with meaningful classes, and cross-validate against those labels.
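The hand-labeling trick could look roughly like the sketch below. It assumes k-means as the clustering algorithm, the adjusted Rand index as the agreement score, and scikit-learn as the library; none of these are named in the answer. The labels returned by make_blobs stand in for labels a person would assign by hand to the test split.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Assumed toy data: three well-separated groups. In practice y_test would
# come from hand-labeling the held-out points, not from a generator.
X, y = make_blobs(
    n_samples=400,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=1.0,
    random_state=0,
)

# Hold out a test split; the training labels are deliberately discarded.
X_train, X_test, _, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the clustering on unlabeled training data only.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Assign held-out points to the learned clusters and compare with the labels.
# The adjusted Rand index ignores how cluster ids are numbered.
test_clusters = km.predict(X_test)
ari = adjusted_rand_score(y_test, test_clusters)
print(f"Adjusted Rand index on the labeled test split: {ari:.3f}")
```

A score near 1 means the clusters found on the training data line up with the hand-assigned classes on unseen points; a score near 0 means the agreement is about what chance would give.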

In any case, if an unsupervised algorithm is applied to labeled data, it is always good to cross-validate.

Overall: it is not necessary to split the data into train and test sets, but if you can do it, it is always better.

Here is an article that explains how cross-validation is a good tool for unsupervised learning: http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ (the full text is available at http://arxiv.org/pdf/0909.3052.pdf).

https://www.researchgate.net/post/Which_are_the_methods_to_validate_an_unsupervised_machine_learning_algorithm