I currently use the MATLAB version of the LIBSVM support vector machine to classify my data. The LIBSVM documentation stresses that scaling before applying SVM is very important, and that the same method must be used to scale both the training and the testing data.
The "same method of scaling" is explained as:
For example, suppose that we scaled the first attribute of training data from [-10, +10] to [-1, +1]. If the first attribute of testing data lies in the range [-11, +8], we must scale the testing data to [-1.1, +0.8].
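As a quick sanity check of that example, the mapping to [-1, +1] can be written as 2*(x - min)/(max - min) - 1, where min and max come from the training data only. The variable names in this sketch are just illustrative:

% Training range of the first attribute, from the documentation's example
train_min = -10;
train_max =  10;

% Map a value to [-1, +1] using the TRAINING min/max
scale = @(x) 2*(x - train_min)./(train_max - train_min) - 1;

scale(-11)   % returns -1.1
scale(8)     % returns  0.8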
Scaling the training data to the range [0, 1] can be done using the following MATLAB code:

% Subtract the per-column minimum, then divide each column by its range
(data - repmat(min(data,[],1),size(data,1),1))*spdiags(1./(max(data,[],1)-min(data,[],1))',0,size(data,2),size(data,2))
But I don't know how to scale the testing data correctly.
Thank you very much for your help.
The code you give subtracts the per-feature minimum and then divides by the per-feature range. To scale the testing data the same way, store the minimums and ranges computed from the training data and reuse them:
% Per-feature minimum and range, computed from the TRAINING data only
minimums = min(data, [], 1);
ranges = max(data, [], 1) - minimums;

% Scale training data to [0, 1]
data = (data - repmat(minimums, size(data, 1), 1)) ./ repmat(ranges, size(data, 1), 1);

% Apply the SAME transform (training minimums/ranges) to the test data
test_data = (test_data - repmat(minimums, size(test_data, 1), 1)) ./ repmat(ranges, size(test_data, 1), 1);
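To put it all together, a minimal end-to-end sketch using LIBSVM's MATLAB interface might look like the following; train_labels and test_labels are placeholder names for your label vectors, and the RBF kernel parameters are just example options:

% Scale training data to [0, 1] and apply the same transform to the test data
minimums = min(train_data, [], 1);
ranges = max(train_data, [], 1) - minimums;
train_scaled = (train_data - repmat(minimums, size(train_data, 1), 1)) ./ repmat(ranges, size(train_data, 1), 1);
test_scaled = (test_data - repmat(minimums, size(test_data, 1), 1)) ./ repmat(ranges, size(test_data, 1), 1);

% Train and predict with LIBSVM (-t 2 = RBF kernel; -c and -g are example values)
model = svmtrain(train_labels, train_scaled, '-t 2 -c 1 -g 0.1');
[predicted, accuracy, ~] = svmpredict(test_labels, test_scaled, model);

One caveat: if a feature is constant in the training set, its range is zero and the division produces NaN or Inf, so such features should be dropped or handled before scaling.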