I use numpy.polyfit to get a linear regression: coeffs = np.polyfit(x, y, 1).
What is the best way to calculate the error of the fit's slope using numpy?
As already mentioned by @ebarr in the comments, you can use np.polyfit with the keyword argument full=True to additionally return the residuals.
Example:
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
# full=True also returns the residuals, rank, singular values and rcond
z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 3, full=True)
residuals then holds the sum of squared residuals of the fit.
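As a quick sanity check (a minimal sketch reusing the variables from the example above; y_fit is just an illustrative name), you can reproduce that value by summing the squared differences between the data and the evaluated fit:

# evaluate the fitted polynomial at the data points
y_fit = np.polyval(z, x)
# should print the same value twice
print(residuals, np.sum((y - y_fit) ** 2))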
Alternatively, you can use the keyword argument cov=True to get the covariance matrix of the estimated coefficients.
Example:
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
# cov=True also returns the covariance matrix of the coefficients
z, cov = np.polyfit(x, y, 3, cov=True)
The diagonal elements of cov are the variances of the coefficients in z, i.e. np.sqrt(np.diag(cov)) gives you the standard deviations of the coefficients. You can use these standard deviations to estimate the probability that the absolute error exceeds a certain value, e.g. by inserting them into an uncertainty propagation calculation. If you use e.g. 3 * standard deviation in the propagation, you get an error bound that is not exceeded in 99.7% of cases (assuming normally distributed errors).
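Applied to the linear fit from the question, a minimal sketch of the slope's standard error could look like this (variable names such as slope_err are just illustrative):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

coeffs, cov = np.polyfit(x, y, 1, cov=True)       # degree-1 (linear) fit
slope, intercept = coeffs                         # highest power first
slope_err, intercept_err = np.sqrt(np.diag(cov))  # standard deviations
print(slope, "+/-", slope_err)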
One last hint: you have to choose between full=True and cov=True. cov=True only takes effect when full=False (the default), so a single call cannot return both the residuals tuple and the covariance matrix.
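If you need both, a simple workaround (a sketch under the assumption that fitting twice is acceptable for your data size) is to call np.polyfit twice:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

# first call: residuals (plus rank, singular values, rcond)
z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 3, full=True)
# second call: covariance matrix of the coefficients
z, cov = np.polyfit(x, y, 3, cov=True)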