I want to apply the scaling that the sklearn.preprocessing.scale function in scikit-learn offers for centering a dataset that I will use to train an SVM classifier.
How can I then store the standardization parameters so that I can also apply them to the data that I want to classify?
I know I can use StandardScaler, but can I somehow serialize it to a file so that I won't have to fit it to my data every time I want to run the classifier?
I think the best way is to pickle it post fit, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and a scaler. By pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this.
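As a rough sketch of what that might look like (the file name scaler.pkl and the toy data are made up for illustration; joblib.dump is another common choice for sklearn objects):

import pickle

import numpy as np
from sklearn import preprocessing

# Fit the scaler once on the training data (toy example data).
X_train = np.array([[1., 2, 3, 4]]).T
scaler = preprocessing.StandardScaler().fit(X_train)

# Serialize the fitted scaler to disk; "scaler.pkl" is an arbitrary name.
with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)

# Later, in the classification script, restore it and reuse the
# same standardization parameters on new data.
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)

X_new = np.array([[5., 6]]).T
print(scaler.transform(X_new))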
Having said that, you can query sklearn.preprocessing.StandardScaler
for the fit parameters:
scale_ : ndarray, shape (n_features,)
    Per-feature relative scaling of the data.
    New in version 0.17: scale_ is recommended instead of the deprecated std_.

mean_ : array of floats with shape [n_features]
    The mean value for each feature in the training set.
The following short snippet illustrates this:
from sklearn import preprocessing
import numpy as np

# Fit on a single feature given as a column vector (shape (4, 1)).
s = preprocessing.StandardScaler()
s.fit(np.array([[1., 2, 3, 4]]).T)

print((s.mean_, s.scale_))
# (array([ 2.5]), array([ 1.11803399]))
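If you only store those two arrays, you can reproduce the standardization by hand; a minimal sketch continuing the snippet above, where X_new is made-up data and the expression mirrors what transform computes:

# Reapply the stored parameters to new data;
# equivalent to s.transform(X_new).
X_new = np.array([[5., 6]]).T
X_scaled = (X_new - s.mean_) / s.scale_
print(X_scaled)
# [[ 2.23606798]
#  [ 3.13049517]]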