Fitting a function in R

jnns picture jnns · Aug 7, 2012 · Viewed 34.4k times · Source

I have a few datapoints (x and y) that seem to have a logarithmic relationship.

> mydata
    x   y
1   0 123
2   2 116
3   4 113
4  15 100
5  48  87
6  75  84
7 122  77

> qplot(x, y, data=mydata, geom="line")

plot

Now I would like to find an underlying function that fits the graph and allows me to infer other datapoints (i.e. 3 or 82). I read about lm and nls but I'm not getting anywhere really.

At first, I created a function of which I thought it resembled the plot the most:

f <- function(x, a, b) {
    a * exp(b *-x)
}
x <- seq(0:100)
y <- f(seq(0:100), 1,1)
qplot(x,y, geom="line")

plot2

Afterwards, I tried to generate a fitting model using nls:

> fit <- nls(y ~ f(x, a, b), data=mydata, start=list(a=1, b=1))
   Error in numericDeriv(form[[3]], names(ind), env) :
   Missing value or an Infinity produced when evaluating the model

Can someone point me in the right direction on what to do from here?

Follow up

After reading your comments and googling around a bit further I adjusted the starting parameters for a, b and c and then suddenly the model converged.

fit <- nls(y~f(x,a,b,c), data=data.frame(mydata), start=list(a=1, b=30, c=-0.3))
x <- seq(0,120)
fitted.data <- data.frame(x=x, y=predict(fit, list(x=x))
ggplot(mydata, aes(x, y)) + geom_point(color="red", alpha=.5) + geom_line(alpha=.5) + geom_line(data=fitted.data)

plot3

Answer

Jilber Urbina picture Jilber Urbina · Aug 7, 2012

Maybe using a cubic specification for your model and estimating via lm would give you a good fit.

# Importing your data
dataset <- read.table(text='
    x   y
1   0 123
2   2 116
3   4 113
4  15 100
5  48  87
6  75  84
7 122  77', header=T)

# I think one possible specification would be a cubic linear model
y.hat <- predict(lm(y~x+I(x^2)+I(x^3), data=dataset)) # estimating the model and obtaining the fitted values from the model

qplot(x, y, data=dataset, geom="line") # your plot black lines
last_plot() + geom_line(aes(x=x, y=y.hat), col=2) # the fitted values red lines

# It fits good.

enter image description here