I've tried searching the other threads on this topic but none of the fixes are working for me. I have the results of a natural experiment and I want to show the number of consecutive occurrences of an event fit an exponential distribution. My R shell is pasted below
f <- function(x,a,b) {a * exp(b * x)}
> x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27
> y
[1] 1880 813 376 161 100 61 31 9 8 2 7 4 3 2 0
[16] 1 0 0 0 0 0 1 0 0 0 0 1
> dat2
x y
1 1 1880
2 2 813
3 3 376
4 4 161
5 5 100
6 6 61
7 7 31
8 8 9
9 9 8
10 10 2
11 11 7
12 12 4
13 13 3
14 14 2
> fm <- nls(y ~ f(x,a,b), data = dat2, start = c(a=1, b=1))
Error in numericDeriv(form[[3L]], names(ind), env) :
Missing value or an infinity produced when evaluating the model
> fm <- nls(y ~ f(x,a,b), data = dat2, start = c(a=7, b=-.5))
Error in nls(y ~ f(x, a, b), data = dat2, start = c(a = 7, b = -0.5)) :
singular gradient
> fm <- nls(y ~ f(x,a,b), data = dat2, start = c(a=7,b=-.5),control=nls.control(maxiter=1000,warnOnly=TRUE,minFactor=1e-5,tol=1e-10),trace=TRUE)
4355798 : 7.0 -0.5
Warning message:
In nls(y ~ f(x, a, b), data = dat2, start = c(a = 7, b = -0.5), :
singular gradient
Please forgive the bad formatting, first post here. x contains bins of a histogram, y contains the number of occurrences of each bin in that histograms. dat2 cuts off at 14 since the 0 count bins would throw off the exponential regression, and I really only need to fit those first 14. Those bins which have counts beyond 14 I have biological reason to believe they are special. The issue I initially got was infinity, which I don't get since none of the values are 0. After giving decent starting values as suggested by a different post here I get the singular gradient error. The only other posts I saw with that had more variables, I tried increasing the number of iterations but that did not succeed. Any help is appreciated. A
1) linearize to get starting values You need better starting values:
# starting values
fm0 <- nls(log(y) ~ log(f(x, a, b)), dat2, start = c(a = 1, b = 1))
nls(y ~ f(x, a, b), dat2, start = coef(fm0))
giving:
Nonlinear regression model
model: y ~ f(x, a, b)
data: x
a b
4214.4228 -0.8106
residual sum-of-squares: 2388
Number of iterations to convergence: 6
Achieved convergence tolerance: 3.363e-06
1a) Similarly we could use lm
to get the initial value by writing
y ~ a * exp(b * x)
as
y ~ exp(log(a) + b * x)
and taking logs of both to get a model linear in log(a) and b:
log(y) ~ log(a) + b * x
which can be solved using lm
:
fm_lm <- lm(log(y) ~ x, dat2)
st <- list(a = exp(coef(fm_lm)[1]), b = coef(fm_lm)[2])
nls(y ~ f(x, a, b), dat2, start = st)
giving:
Nonlinear regression model
model: y ~ f(x, a, b)
data: dat2
a b
4214.423 -0.811
residual sum-of-squares: 2388
Number of iterations to convergence: 6
Achieved convergence tolerance: 3.36e-06
1b) We can also get it to work by reparameterizing. In that case a = 1 and b = 1 will work provided we transform the initial values in line with the parameter transformation.
nls(y ~ exp(loga + b * x), dat2, start = list(loga = log(1), b = 1))
giving:
Nonlinear regression model
model: y ~ exp(loga + b * x)
data: dat2
loga b
8.346 -0.811
residual sum-of-squares: 2388
Number of iterations to convergence: 20
Achieved convergence tolerance: 3.82e-07
so b is as shown and a = exp(loga) = exp(8.346) = 4213.3
2) plinear Another possibility that is even easier is to use alg="plinear"
in which case starting values are not needed for the parameters entering linearly. In that case the starting value of b=1
in the question seems sufficient.
nls(y ~ exp(b * x), dat2, start = c(b = 1), alg = "plinear")
giving:
Nonlinear regression model
model: y ~ exp(b * x)
data: dat2
b .lin
-0.8106 4214.4234
residual sum-of-squares: 2388
Number of iterations to convergence: 11
Achieved convergence tolerance: 2.153e-06