I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others?
For instance, say my data is:
data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]})
Age Name Weight
0 18 3 68
1 92 4 59
2 98 6 49
col_names = ['Name', 'Age', 'Weight']
features = data[col_names]
I fit and transform the data:
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
scaled_features = pd.DataFrame(features, columns = col_names)
Name Age Weight
0 -1.069045 -1.411004 1.202703
1 -0.267261 0.623041 0.042954
2 1.336306 0.787964 -1.245657
But of course the names are not really integers but strings, and I don't want to standardize them. How can I apply the fit and transform methods only on the columns Age and Weight?
Currently, the best way to handle this is to use ColumnTransformer from sklearn.compose. Alternatively, the columns can be scaled manually as follows.
First create a copy of your dataframe:
scaled_features = data.copy()
Don't include the Name column in the transformation:
col_names = ['Age', 'Weight']
features = scaled_features[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)
Now, don't create a new dataframe but assign the result to those two columns:
scaled_features[col_names] = features
print(scaled_features)
Age Name Weight
0 -1.411004 3 1.202703
1 0.623041 4 0.042954
2 0.787964 6 -1.245657
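For comparison, the ColumnTransformer route mentioned at the top of this answer could look roughly like the sketch below. It assumes a scikit-learn version that ships sklearn.compose (0.20 or later). The names num_cols, ct and out_cols are made up for illustration. With remainder='passthrough', the untouched columns are appended after the scaled ones, so the column order changes to Age, Weight, Name.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

data = pd.DataFrame({'Name': [3, 4, 6], 'Age': [18, 92, 98], 'Weight': [68, 59, 49]})

# Columns to standardize (illustrative name); everything else is left alone.
num_cols = ['Age', 'Weight']

# Scale only num_cols; remainder='passthrough' keeps the other columns unchanged.
ct = ColumnTransformer(
    transformers=[('scale', StandardScaler(), num_cols)],
    remainder='passthrough',
)

# fit_transform returns a NumPy array: scaled columns first, passthrough columns after.
scaled = ct.fit_transform(data)

# Rebuild a dataframe with column names matching that output order.
out_cols = num_cols + [c for c in data.columns if c not in num_cols]
scaled_features = pd.DataFrame(scaled, columns=out_cols, index=data.index)
print(scaled_features)
```

Because the result comes back as a single array, the passthrough column shares its dtype with the scaled ones (floats here, object if Name held strings), so you may want to cast it back afterwards.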