Which algorithms to use for one class classification?

scikit-learn text-classification

Adam Wayland · Oct 23, 2013 · Viewed 10.2k times · Source

I have over 15,000 text documents on a specific topic. I would like to build a language model from them so that I can present new, random text documents on various topics to this model and have the algorithm tell me whether each new document is on the same topic.

I tried out sklearn.naive_bayes.MultinomialNB, sklearn.svm.classes.LinearSVC and others. However, I have the following problem:

These algorithms require training data with more than one label or category, and I only have web pages covering a specific topic. The other docs are unlabeled and cover many different topics.

I would appreciate any guidance on how to train a model with only one label or how to proceed in general. What I have so far is:

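from sklearn.naive_bayes import MultinomialNB

# X_train: vectorized documents; y_train: their labels (only one class available here)
c = MultinomialNB()
c.fit(X_train, y_train)
c.predict(X_test)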

Thank you very much.

Answer

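What you're looking for is OneClassSVM (sklearn.svm.OneClassSVM), which is trained on examples of a single class and then flags new samples as either similar to that class or not. For more information, you might want to check out the corresponding scikit-learn documentation.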
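As a rough illustration of that suggestion, here is a minimal sketch that fits OneClassSVM on TF-IDF features of on-topic documents only. The tiny document lists, the linear kernel, and nu=0.1 are assumptions made for the example, not values from the question; with real data you would tune them and check the results against a held-out sample.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

# Hypothetical stand-in for the ~15,000 on-topic documents.
on_topic_docs = [
    "the striker scored a goal in the football match",
    "the football team won the league match",
    "the goalkeeper saved a penalty in the match",
    "the coach picked a new striker for the team",
    "fans cheered when the team scored a late goal",
]

# Vectorize the text, then fit a one-class model on the single known topic.
# nu is an upper bound on the fraction of training documents treated as
# outliers; 0.1 is an arbitrary starting point (assumption).
model = make_pipeline(
    TfidfVectorizer(),
    OneClassSVM(kernel="linear", nu=0.1),
)
model.fit(on_topic_docs)  # no labels are needed

# Hypothetical new documents of unknown topic.
new_docs = [
    "the team scored a goal in the second half of the match",
    "the central bank raised interest rates again",
]

# predict returns +1 for documents judged similar to the training topic
# and -1 for documents judged to be outliers.
predictions = model.predict(new_docs)
print(predictions)
```

Because no negative examples are used, the boundary depends heavily on nu and the kernel, so it is worth labeling a small sample of off-topic documents by hand just to evaluate the model.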
