Cannot understand sklearn's PolynomialFeatures

TechieBoy101 · Aug 18, 2018 · Viewed 24.3k times

I need help with sklearn's PolynomialFeatures. It works quite well with one feature, but whenever I add multiple features, the output array contains some extra values besides the inputs raised to the powers up to the degree. For example, for this array,

import numpy as np
X = np.array([[230.1, 37.8, 69.2]])

when I try to

poly = PolynomialFeatures(degree=2)  # from sklearn.preprocessing
X_poly = poly.fit_transform(X)

It outputs

[[ 1.00000000e+00 2.30100000e+02 3.78000000e+01 6.92000000e+01
5.29460100e+04 8.69778000e+03 1.59229200e+04 1.42884000e+03
2.61576000e+03 4.78864000e+03]]

Here, what are 8.69778000e+03, 1.59229200e+04, and 2.61576000e+03?

Answer

dim · Aug 18, 2018

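If you have features [a, b, c], the degree-2 polynomial features (degree 2 is the default in sklearn) are [1, a, b, c, a^2, ab, ac, b^2, bc, c^2], in that order.

So 8.69778000e+03 is ab = 230.1 x 37.8 = 8697.78, 1.59229200e+04 is ac = 230.1 x 69.2 = 15922.92, and 2.61576000e+03 is bc = 37.8 x 69.2 = 2615.76 (2615.76 = 2.61576000 x 10^3).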

Put simply, PolynomialFeatures lets you create new features from existing ones. There is a good reference here. Of course, there are also disadvantages, such as overfitting, to using PolynomialFeatures (see here).

Edit:
We have to be careful when using polynomial features. The number of polynomial features is N(n,d) = C(n+d,d), where n is the number of input features, d is the degree of the polynomial, and C is the binomial coefficient (combination). In our case the number is C(3+2,2) = 5!/((5-2)! 2!) = 10, but when the number of features or the degree is high, the number of polynomial features becomes very large. For example:

N(100,2)=5151
N(100,5)=96560646
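These counts can be reproduced with the standard library. A minimal sketch, assuming Python 3.8 or newer for `math.comb`; the helper name `n_poly_features` is made up for illustration:

```python
from math import comb


def n_poly_features(n_features: int, degree: int) -> int:
    """Number of output columns, including the constant (bias) term."""
    return comb(n_features + degree, degree)


print(n_poly_features(3, 2))    # 10
print(n_poly_features(100, 2))  # 5151
print(n_poly_features(100, 5))  # 96560646
```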

So in such cases you may need to apply regularization to penalize some of the weights. It is quite possible that the algorithm will start to suffer from the curse of dimensionality (here is also a very nice discussion).
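As a rough illustration of combining polynomial features with regularization, the sketch below chains PolynomialFeatures with a ridge (L2-penalized) regression. The data is synthetic, and the choice of Ridge, the scaling step, and `alpha=1.0` are assumptions made for the example rather than recommendations from the answer.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data purely for illustration: 50 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=50)

# alpha is the L2 penalty strength; 1.0 is an arbitrary starting value.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
model.fit(X, y)

# With include_bias=False the constant column is dropped: 10 - 1 = 9.
n_expanded = model.named_steps["polynomialfeatures"].n_output_features_
print(n_expanded)
```

In practice the penalty strength would normally be tuned, for example by cross-validation, rather than fixed in advance.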