Right now I have a large data set with temperature going up and down all the time. I want to smooth my data and plot the best-fit line through all the temperatures.
Here is the data:
weather.data
         date mtemp
1  2008-01-01  12.9
2  2008-01-02  12.9
3  2008-01-03  14.5
4  2008-01-04  15.7
5  2008-01-05  17.0
6  2008-01-06  17.8
7  2008-01-07  20.2
8  2008-01-08  20.8
9  2008-01-09  21.4
10 2008-01-10  20.8
11 2008-01-11  21.4
12 2008-01-12  22.0
and so on, until 2009-12-31.
My current graph looks like this, and I would like to fit a smoother to it, such as a running average or loess:
However, when I tried to fit it with the running average, it became like this:
Here is my code.
plot(weather.data$date,weather.data$mtemp,ylim=c(0,30),type='l',col="orange")
par(new=TRUE)
Could anyone give me a hand?
Depending on your actual data, and on how and why you want to smooth it, there are various options.
Below are examples with linear regression (first and second order) and local regression (LOESS). These may or may not be good statistical models for your data; it is difficult to tell without seeing it. In any case:
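# Simulated example data: quadratic trend plus Gaussian noise
set.seed(1)  # makes the random noise reproducible
time <- 0:100
temp <- 20 + 0.01 * time^2 + 0.8 * time + rnorm(101, 0, 5)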
# Generate first order linear model
lin.mod <- lm(temp~time)
# Generate second order linear model
lin.mod2 <- lm(temp~I(time^2)+time)
# Calculate local regression
ls <- loess(temp~time)
# Predict the data (passing only the model runs the prediction
# on the data points used to generate the model itself)
pr.lm <- predict(lin.mod)
pr.lm2 <- predict(lin.mod2)
pr.loess <- predict(ls)
# Arrange the plots in a 2 x 2 grid (the fourth panel stays empty)
par(mfrow=c(2,2))
# First order linear fit
plot(time, temp, "l", las=1, xlab="Time", ylab="Temperature")
lines(pr.lm~time, col="blue", lwd=2)
# Second order linear fit
plot(time, temp, "l", las=1, xlab="Time", ylab="Temperature")
lines(pr.lm2~time, col="green", lwd=2)
# Local regression (LOESS) fit
plot(time, temp, "l", las=1, xlab="Time", ylab="Temperature")
lines(pr.loess~time, col="red", lwd=2)
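For data indexed by date, as in the question, loess needs a numeric predictor, so the dates can be converted with as.numeric() first. Here is a minimal sketch, assuming a data frame shaped like weather.data. The weather.sim values are simulated stand-ins, not the real measurements, and span = 0.3 is an arbitrary starting point that you would tune by eye:

```r
# Simulated stand-in for weather.data (hypothetical values, same column names)
set.seed(42)
date <- seq(as.Date("2008-01-01"), as.Date("2009-12-31"), by = "day")
day <- as.numeric(date)  # days since 1970-01-01; loess needs a numeric predictor
mtemp <- 15 + 8 * sin(2 * pi * (day - day[1]) / 365) + rnorm(length(day), 0, 2)
weather.sim <- data.frame(date = date, day = day, mtemp = mtemp)

# A smaller span follows the data more closely; a larger span gives a smoother curve
fit <- loess(mtemp ~ day, data = weather.sim, span = 0.3)
weather.sim$smooth <- predict(fit)

# Raw series in orange, smoothed curve on top in blue
plot(weather.sim$date, weather.sim$mtemp, type = "l", col = "orange",
     xlab = "Date", ylab = "Temperature")
lines(weather.sim$date, weather.sim$smooth, col = "blue", lwd = 2)
```

Using lines() on the existing plot avoids the need for par(new=TRUE), which redraws the axes and can leave the two curves on different scales.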
Another option would be to use a moving average.
For instance:
library(zoo)
mov.avg <- rollmean(temp, 5, fill=NA)
plot(time, temp, "l")
lines(time, mov.avg, col="orange", lwd=2)
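If you would rather not depend on zoo, a centred moving average can also be sketched in base R with stats::filter. The snippet below regenerates a simulated series of the same form as above so that it runs on its own; the name mov.avg.base and the window width k = 5 are just illustrative choices. With these settings it should give the same values as rollmean(temp, 5, fill=NA):

```r
# Simulated series, same form as the example above
set.seed(1)
time <- 0:100
temp <- 20 + 0.01 * time^2 + 0.8 * time + rnorm(101, 0, 5)

k <- 5  # window width (odd, so the window is centred on each point)
# Equal weights with sides = 2 give a centred moving average;
# the first and last k %/% 2 values are NA because the window is incomplete
mov.avg.base <- as.numeric(stats::filter(temp, rep(1 / k, k), sides = 2))

plot(time, temp, type = "l")
lines(time, mov.avg.base, col = "orange", lwd = 2)
```

A wider window (larger k) gives a smoother line but leaves more NA values at both ends of the series.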