I don't understand why curve_fit isn't able to estimate the covariance of the parameter, thus raising the OptimizeWarning below. The following MCVE demonstrates my problem:
MCVE Python snippet
from scipy.optimize import curve_fit
func = lambda x, a: a * x  # linear model through the origin, single parameter a
popt, pcov = curve_fit(f=func, xdata=[1], ydata=[1])  # one data point, one parameter
print(popt, pcov)
Output
\python-3.4.4\lib\site-packages\scipy\optimize\minpack.py:715:
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
[ 1.] [[ inf]]
For a = 1 the function fits xdata and ydata exactly. Why isn't the error/variance 0, or something close to 0, but inf instead?
There is this quote from the curve_fit SciPy Reference Guide:
If the Jacobian matrix at the solution doesn’t have a full rank, then ‘lm’ method returns a matrix filled with np.inf, on the other hand ‘trf’ and ‘dogbox’ methods use Moore-Penrose pseudoinverse to compute the covariance matrix.
So, what's the underlying problem? Why doesn't the Jacobian matrix at the solution have a full rank?
The formula for the covariance of the parameters (Wikipedia) has the number of degrees of freedom in the denominator. The degrees of freedom are computed as (number of data points) - (number of parameters), which is 1 - 1 = 0 in your example. This is where SciPy checks the number of degrees of freedom before dividing by it.
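As a rough sketch of what happens under the hood (not SciPy's verbatim source; the helper name scaled_pcov is mine), the relevant logic looks something like this:

import numpy as np

def scaled_pcov(pcov_unscaled, residuals, n_data, n_params):
    # Sketch of the default absolute_sigma=False behaviour: the unscaled
    # covariance is multiplied by the residual variance, which divides by
    # the degrees of freedom n_data - n_params. With zero degrees of
    # freedom there is nothing to divide by, so the matrix is filled
    # with inf (and curve_fit raises the OptimizeWarning) instead.
    if n_data > n_params:
        s_sq = np.sum(residuals**2) / (n_data - n_params)
        return pcov_unscaled * s_sq
    return np.full_like(pcov_unscaled, np.inf, dtype=float)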
With xdata = [1, 2], ydata = [1, 2] you would get zero covariance (note that the model still fits exactly: an exact fit is not the problem).
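You can check this with the same model as in your MCVE, just with one extra data point (the exact print formatting depends on your NumPy version):

from scipy.optimize import curve_fit
func = lambda x, a: a * x
popt, pcov = curve_fit(f=func, xdata=[1, 2], ydata=[1, 2])
print(popt, pcov)  # a == 1 with zero covariance: zero residuals, one degree of freedom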
This is the same sort of issue as the sample variance being undefined when the sample size N is 1 (the formula for the sample variance has N - 1 in the denominator). If we only take a sample of size 1 from the population, we don't estimate the variance as zero; we know nothing about the variance.
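The same guard shows up in NumPy's sample variance, where ddof=1 gives the N - 1 denominator:

import numpy as np
print(np.var([1.0, 2.0], ddof=1))  # 0.5: two points allow an estimate
print(np.var([1.0], ddof=1))       # nan, plus a "Degrees of freedom <= 0" RuntimeWarning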