File format for classification using SVM light

ritesh · Aug 20, 2013 · Viewed 14.4k times

I am trying to build a classifier with SVM light that assigns a document to one of two classes. I have already trained and tested the classifier, and the model file is saved to disk. Now I want to use this model file to classify completely new documents. What should the input file format be? Can it be a plain text file (I don't think that would work), or a plain listing of the features present in the document, without a class label or feature weights (in which case I would have to keep track of the feature indices used in the feature vector during training), or is it some other format?

Answer

Marc Claesen · Aug 20, 2013

Training and testing files must have the same format. Each instance corresponds to one line of the following form:

<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

For example (copied from the SVM^light website):

-1 1:0.43 3:0.12 9284:0.2 # abcdef
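
To produce such lines for new, unlabeled documents, something along these lines might work. This is only a rough sketch in Python, not part of SVM^light itself: `doc_to_features` and `to_svmlight_line` are hypothetical helper names, `vocab` is assumed to be the term-to-index mapping you saved during training, and raw term counts are used as values purely for illustration (use whatever weighting you trained with). As far as I recall from the SVM^light documentation, a target of 0 can stand in for an unknown label, and feature indices must appear in increasing order.

```python
from collections import Counter


def doc_to_features(tokens, vocab):
    """Map a tokenized document to {feature_index: value}.

    vocab is assumed to be the term -> index mapping saved at training time.
    Terms that were not seen during training are dropped.
    Values here are raw term counts (an assumption for illustration).
    """
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}


def to_svmlight_line(features, target=0, info=None):
    """Format one instance as '<target> <feature>:<value> ... # <info>'.

    target=0 is used as a placeholder label for unlabeled documents.
    """
    # Sort by feature index (increasing order) and skip zero-valued features.
    pairs = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    line = " ".join([f"{target:g}"] + pairs)
    if info:
        line += f" # {info}"
    return line


# Hypothetical vocabulary and document, for illustration only.
vocab = {"svm": 1, "light": 2, "model": 3}
tokens = ["model", "svm", "svm", "unseen_word"]
new_doc_line = to_svmlight_line(doc_to_features(tokens, vocab))
```

Write one such line per document to a file and pass that file to svm_classify together with the saved model file.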

You can consult the SVM^light website for more information.