I need help with sklearn's PolynomialFeatures. It works well with one feature, but when I add multiple features, the output array also contains values other than the inputs raised to the powers up to the degree. For example, for this array,
X=np.array([[230.1,37.8,69.2]])
when I try to
X_poly=poly.fit_transform(X)
It outputs
[[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
2.61576000e+03 4.78864000e+03]]
Here, what are 8.69778000e+03, 1.59229200e+04, and 2.61576000e+03?
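If you have features [a, b, c], the default polynomial features (in sklearn the default degree is 2) are [1, a, b, c, a^2, ab, ac, b^2, bc, c^2]: a constant term, the original features, and every product of two features, including the products of two different features.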
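The values you are asking about are the pairwise products: 8.69778000e+03 is 230.1 x 37.8 = 8697.78, 1.59229200e+04 is 230.1 x 69.2 = 15922.92, and 2.61576000e+03 is 37.8 x 69.2 = 2615.76 (2615.76 = 2.61576000 x 10^3).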
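To check which column is which, the arithmetic can be reproduced by hand. The sketch below uses only the standard library and is not sklearn's implementation; the names a, b, c and the helper degree_two_terms are made up for illustration. It lists the terms in the same order as the output shown in the question.

```python
from itertools import combinations_with_replacement

# Feature values from the question, labelled a, b, c (labels are illustrative).
features = {"a": 230.1, "b": 37.8, "c": 69.2}


def degree_two_terms(values):
    """Return (label, value) pairs: constant, linear terms, then degree-2 products."""
    names = list(values)
    terms = [("1", 1.0)]
    terms += [(name, values[name]) for name in names]
    # Pairs with replacement yield aa, ab, ac, bb, bc, cc, in that order.
    for p, q in combinations_with_replacement(names, 2):
        label = f"{p}^2" if p == q else f"{p}{q}"
        terms.append((label, values[p] * values[q]))
    return terms


terms = degree_two_terms(features)
for label, value in terms:
    print(f"{label:>4} = {value:.2f}")
```

The lines printed for ab, ac and bc are 8697.78, 15922.92 and 2615.76, which are the three values asked about.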
PolynomialFeatures gives you a simple way to create new features. There is a good reference here. Of course, there are also disadvantages (such as overfitting) to using PolynomialFeatures (see here).
Edit:
We have to be careful when using polynomial features. The number of polynomial features is N(n,d)=C(n+d,d), where n is the number of input features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2)=5!/((5-2)!2!)=10, but when the number of features or the degree is high, the number of polynomial features becomes very large. For example:
N(100,2)=5151
N(100,5)=96560646
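These counts can be checked with a few lines of Python. This is a minimal sketch of the formula above, assuming Python 3.8 or later for math.comb; the function name n_poly_features is made up for illustration.

```python
from math import comb


def n_poly_features(n, d):
    """Number of polynomial terms of degree <= d in n features, constant included."""
    return comb(n + d, d)


print(n_poly_features(3, 2))    # 10
print(n_poly_features(100, 2))  # 5151
print(n_poly_features(100, 5))  # 96560646
```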
So in this case you may need to apply regularization to penalize some of the weights. With that many features, it is quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
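As a rough illustration of combining the expansion with regularization, here is a sketch that chains PolynomialFeatures with a Ridge (L2-penalized) regression. The data is synthetic and alpha=1.0 is an arbitrary choice, both assumptions made only for this example. Ridge is just one option: Lasso (L1) would be an alternative if you want some weights pushed to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data, made up for illustration: 50 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=50)

model = make_pipeline(
    # include_bias=False drops the constant column; Ridge fits its own intercept.
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),   # put the expanded features on a comparable scale
    Ridge(alpha=1.0),   # alpha is an arbitrary value here; tune it in practice
)
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
n_expanded = model.named_steps["polynomialfeatures"].n_output_features_
print(n_expanded)  # 9 (the 10 terms above minus the constant)
```

In practice alpha would be chosen by cross-validation (for example with RidgeCV) rather than fixed by hand.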