Constrained Linear Regression in Python

ulmangt · Apr 14, 2012

I have a classic linear regression problem of the form:

y = X b

where y is a response vector, X is a matrix of input variables, and b is the vector of fit parameters I am searching for.

NumPy provides numpy.linalg.lstsq(X, y) for solving problems of this form (the fitted coefficients are the first element of the tuple it returns).
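For reference, a minimal unconstrained fit looks something like this sketch (the data is a random stand-in for the real problem; rcond=None assumes a reasonably recent NumPy):

  import numpy as np

  # Illustrative stand-in data; the real X is on the order of 3375 x 1500.
  rng = np.random.RandomState(0)
  X = rng.rand(50, 5)
  y = rng.rand(50)

  # lstsq returns (solution, residuals, rank, singular_values);
  # the fitted coefficients are the first element.
  b = np.linalg.lstsq(X, y, rcond=None)[0]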

However, when I use this, I tend to get either extremely large or extremely small values for the components of b.

I'd like to perform the same fit, but constrain the values of b between 0 and 255.

It looks like scipy.optimize.fmin_slsqp() is an option, but I found it extremely slow for problems of the size I'm interested in (X is something like 3375 by 1500, and hopefully even larger).
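For concreteness, the SLSQP approach amounts to something like this sketch, minimizing the squared residual norm under box constraints (again with small stand-in data):

  import numpy as np
  from scipy.optimize import fmin_slsqp

  # Small illustrative problem; the real one is far larger.
  rng = np.random.RandomState(0)
  X = rng.rand(50, 5)
  y = rng.rand(50)

  def objective(b):
      # Squared residual norm ||X b - y||^2
      r = X.dot(b) - y
      return r.dot(r)

  b0 = np.zeros(X.shape[1])             # starting guess
  bounds = [(0.0, 255.0)] * X.shape[1]  # box constraint on every coefficient
  b = fmin_slsqp(objective, b0, bounds=bounds)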

  1. Are there any other Python options for performing constrained least squares fits?
  2. Or are there Python routines for performing Lasso Regression or Ridge Regression or some other regression method that penalizes large values of the b coefficients?

Answer

conradlee · May 30, 2012

You mention you would find Lasso Regression or Ridge Regression acceptable. These and many other constrained linear models are available in the scikit-learn package. Check out the section on generalized linear models.
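For example, a minimal Ridge and Lasso fit with scikit-learn might look like the following sketch (the data and alpha values are illustrative placeholders, not tuned choices):

  import numpy as np
  from sklearn.linear_model import Ridge, Lasso

  rng = np.random.RandomState(0)
  X = rng.rand(50, 5)
  y = rng.rand(50)

  # alpha controls how strongly large coefficients are penalized;
  # the values here are placeholders, not tuned choices.
  ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty shrinks coefficients
  lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty drives some to exactly zero

  print(ridge.coef_)
  print(lasso.coef_)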

Usually, constraining the coefficients involves some kind of regularization parameter (C or alpha); some of the models (the ones ending in CV) can use cross-validation to set these parameters automatically. You can also further constrain models to use only positive coefficients; for example, there is an option for this on the Lasso model.
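A sketch combining both ideas, using LassoCV so that alpha is chosen by cross-validation and positive=True so coefficients stay non-negative (again with placeholder data; note that positive=True enforces only the lower bound of 0, not the 0-255 box from the question):

  import numpy as np
  from sklearn.linear_model import LassoCV

  rng = np.random.RandomState(0)
  X = rng.rand(50, 5)
  y = rng.rand(50)

  # cv=5 runs 5-fold cross-validation to pick alpha automatically;
  # positive=True restricts the fitted coefficients to be >= 0.
  model = LassoCV(cv=5, positive=True).fit(X, y)
  print(model.alpha_)  # the alpha chosen by cross-validation
  print(model.coef_)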