Linear regression slope error in numpy

Mixel · Dec 4, 2014 · Viewed 8.5k times

I use numpy.polyfit to get a linear regression: coeffs = np.polyfit(x, y, 1).

What is the best way to calculate the error of the fit's slope using numpy?

Answer

jkalden · Dec 4, 2014

As already mentioned by @ebarr in the comments, you can use np.polyfit to return the residuals by using the keyword argument full=True.

Example:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 1, full=True)

residuals is then the residual sum of squares of the fit, i.e. the sum of the squared differences between the data and the fitted polynomial.
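As a sanity check (a sketch, not part of the original answer), you can verify that the returned residuals value matches the residual sum of squares computed by hand with np.polyval:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

# full=True returns the residual sum of squares alongside the coefficients
coeffs, residuals, rank, sv, rcond = np.polyfit(x, y, 1, full=True)

# Recompute the sum of squared residuals manually from the fitted line
rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
print(residuals[0], rss)  # the two values agree
```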

Alternatively, you can use the keyword argument cov=True to get the covariance matrix.

Example:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
z, cov = np.polyfit(x, y, 1, cov=True)

The diagonal elements of cov are the variances of the coefficients in z, so np.sqrt(np.diag(cov)) gives you their standard deviations; for a degree-1 fit, the first entry is the standard deviation of the slope. You can use these standard deviations to estimate the probability that the absolute error exceeds a certain value, e.g. by inserting them into an uncertainty-propagation calculation. If you propagate 3*standard deviation, you obtain an error bound that is not exceeded in 99.7% of cases (assuming normally distributed errors).

One last hint: you have to choose between full=True and cov=True. cov=True only takes effect when full=False (the default), and vice versa.