What is a target in Python's sklearn coef_ output?

django_noob · Feb 13, 2016

When I do ridge regression using sklearn in Python, the coef_ attribute gives me a 2D array. According to the documentation, its shape is (n_targets, n_features).

I understand that the features correspond to my coefficients. However, I am not sure what the targets are. What are they?

Answer

Martin Pilát · Feb 14, 2016

The targets are the values you want to predict. Ridge regression can in fact predict several values for each instance, not only one. coef_ contains the coefficients for the prediction of each target, one row per target. The result is the same as if you had trained a separate model for each target.
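To make the "separate models" claim concrete, here is a minimal sketch. The seeded generator, the alpha value and the names joint and separate are my own choices for illustration, not part of the question.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data: 50 instances, 2 features, 3 targets per instance.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
Y = X @ np.array([[1, 2, 3], [3, 4, 5]])

# One model fitted on all three targets at once.
joint = Ridge(alpha=1.0).fit(X, Y)

# Three models, each fitted on a single target column, stacked row by row.
separate = np.vstack(
    [Ridge(alpha=1.0).fit(X, Y[:, k]).coef_ for k in range(Y.shape[1])]
)

print(joint.coef_.shape)                    # (3, 2): (n_targets, n_features)
print(np.allclose(joint.coef_, separate))   # compare joint vs. per-target fits
```

Row k of joint.coef_ should agree with the coefficients of the model fitted on target k alone, as long as the same alpha is used for every target.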

Let's have a look at a simple example. I will use LinearRegression instead of Ridge, because Ridge shrinks the coefficients, which makes them harder to compare with the values used to generate the data.
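If you want to see the shrinkage for yourself, the following sketch fits both estimators on the same noise-free data. The seed and alpha=1.0 are arbitrary assumptions; a different alpha shrinks more or less.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Illustrative noise-free data with 2 features and 3 targets.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
Y = X @ np.array([[1, 2, 3], [3, 4, 5]])

ols = LinearRegression().fit(X, Y)
ridge = Ridge(alpha=1.0).fit(X, Y)

print(ols.coef_)    # recovers the generating coefficients
print(ridge.coef_)  # same layout, but pulled towards zero by the penalty
```

Both coef_ arrays have the same (n_targets, n_features) layout; only the values differ.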

First, we create some random data:

import numpy as np

# 50 instances with 2 features each; every instance gets 3 targets
X = np.random.uniform(size=100).reshape(50, 2)
y = np.dot(X, [[1, 2, 3], [3, 4, 5]])

The first three instances in X are:

[[ 0.70335619  0.42612165]
 [ 0.2959883   0.10571314]
 [ 0.33868804  0.07351525]]

The targets y for these instances are:

[[ 1.98172114  3.11119897  4.24067681]
 [ 0.61312771  1.01482915  1.41653058]
 [ 0.55923378  0.97143708  1.38364037]]

Notice that, for each instance x with targets y, y[0] = x[0] + 3*x[1], y[1] = 2*x[0] + 4*x[1] and y[2] = 3*x[0] + 5*x[1] (that is how the matrix multiplication created the data).
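As a quick sanity check of these three equations, the sketch below plugs in the first instance printed above. The variable names are mine, and the tolerance only allows for the rounding in the printed digits.

```python
import numpy as np

# First instance and its targets, copied from the printed output above.
x = np.array([0.70335619, 0.42612165])
y_printed = np.array([1.98172114, 3.11119897, 4.24067681])

# The three equations, one per target.
y_manual = np.array([
    1 * x[0] + 3 * x[1],
    2 * x[0] + 4 * x[1],
    3 * x[0] + 5 * x[1],
])

print(y_manual)
print(np.allclose(y_manual, y_printed, atol=1e-6))
```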

If we now fit the linear regression model

from sklearn import linear_model

clf = linear_model.LinearRegression()
clf.fit(X, y)

the fitted coef_ is:

[[ 1.  3.]
 [ 2.  4.]
 [ 3.  5.]]

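This exactly matches the equations we used to create the data: each row of coef_ holds the coefficients for one target, so coef_ is the transpose of the matrix passed to np.dot.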
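Putting it all together, here is a self-contained sketch of the example. It also shows how the shape of coef_ depends on the shape of y. The seeded generator and the names W, model and manual are my own additions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
W = np.array([[1, 2, 3], [3, 4, 5]])   # shape (n_features, n_targets)
Y = X @ W                              # shape (n_samples, n_targets)

model = LinearRegression().fit(X, Y)
print(model.coef_.shape)               # (3, 2): (n_targets, n_features)
print(np.allclose(model.coef_, W.T))   # one row of coefficients per target

# Predictions are X times the transposed coefficients, plus the intercepts.
manual = X @ model.coef_.T + model.intercept_
print(np.allclose(manual, model.predict(X)))

# The shape of y decides whether coef_ is 1D or 2D.
print(LinearRegression().fit(X, Y[:, 0]).coef_.shape)   # 1D y
print(LinearRegression().fit(X, Y[:, :1]).coef_.shape)  # 2D y with one column
```

So if you have only one target but still get a 2D coef_, a likely cause is that y was passed as a column of shape (n_samples, 1) rather than a flat array of shape (n_samples,).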