Which algorithms to use for one-class classification?

Adam Wayland · Oct 23, 2013 · Viewed 10.2k times

I have over 15,000 text docs on a specific topic. I would like to build a language model from them so that I can present new, arbitrary text documents on various topics to the model and have the algorithm tell me whether each new doc is on the same topic.

I tried out sklearn.naive_bayes.MultinomialNB, sklearn.svm.classes.LinearSVC and others, however I have the following problem:

These algorithms require training data with more than one label or category, but I only have web pages covering one specific topic. The other docs are unlabeled and cover many different topics.

I would appreciate any guidance on how to train a model with only one label or how to proceed in general. What I have so far is:

from sklearn.naive_bayes import MultinomialNB

c = MultinomialNB()
c.fit(X_train, y_train)  # y_train holds only a single label in my case
c.predict(X_test)

Thank you very much.

Answer

Matt · Oct 24, 2013

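What you're looking for is OneClassSVM (sklearn.svm.OneClassSVM), which is trained on examples of a single class and then flags new samples as either similar to the training data or not. For more information, check out the scikit-learn documentation for that class.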
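
As a rough illustration of how this could look for text, here is a minimal sketch that feeds TF-IDF features into OneClassSVM. The toy documents, the variable names, and the choices of kernel="linear" and nu=0.1 are assumptions made for the example, not something taken from the question, so treat them as starting points to tune rather than recommended settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import OneClassSVM

# Hypothetical toy corpus standing in for the ~15,000 on-topic documents.
on_topic_docs = [
    "the striker scored a late goal in the football match",
    "the goalkeeper saved a penalty during the match",
    "the football club signed a new striker this season",
    "fans celebrated the goal at the football stadium",
    "the coach changed tactics after the first half of the match",
]

# Vectorize the text, then fit a one-class model on the single known class.
# nu and kernel are assumed values for illustration only.
model = make_pipeline(
    TfidfVectorizer(),
    OneClassSVM(kernel="linear", nu=0.1),
)
model.fit(on_topic_docs)  # no labels are needed

# Hypothetical unseen documents of unknown topic.
new_docs = [
    "the football match ended with a goal from the striker",
    "the central bank raised interest rates again",
]

# predict() returns +1 for samples judged similar to the training data
# and -1 for samples judged to be outliers.
predictions = model.predict(new_docs)

# decision_function() gives a signed score; higher means more "on topic".
scores = model.decision_function(new_docs)

for doc, label, score in zip(new_docs, predictions, scores):
    print(label, round(float(score), 4), doc)
```

The nu parameter is an upper bound on the fraction of training documents treated as outliers, so it is worth trying several values and checking the results against a small hand-labeled set of on-topic and off-topic documents.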