R - time series hourly

pavemann · Jan 6, 2015

I have a dataset of incoming calls per hour, recorded each day between 3 p.m. and 10 p.m., which looks like this:

Date        hour  Count  Year  Month  Day
01.01.2001  15    69     2001  1      1
01.01.2001  16    12     2001  1      1
01.01.2001  17    56     2001  1      1
01.01.2001  18    34     2001  1      1
01.01.2001  19    44     2001  1      1
01.01.2001  20    91     2001  1      1
01.01.2001  21    82     2001  1      1
01.01.2001  22    49     2001  1      1
...
17.08.2003  22    103    2003  8      17

What needs to be done is a time series analysis including forecasts, exponential smoothing, moving averages and so forth.

The problem I'm facing now is how to set up the call to ts(). I only have the peak hours from 3 p.m. to 10 p.m. available, so I can't set the frequency to 24.

Can anybody help me out?

Many thanks, cheers

Answer

G. Grothendieck · Jan 6, 2015

1) Assuming that the series starts at 3pm, that days are consecutive and all hours from 3pm to 10pm are present:

# DF[-1] drops the Date column; the 8 observed hours (15 to 22) form one period
tser <- ts(DF[-1], frequency = 8)

giving:

> tser
Time Series:
Start = c(1, 1) 
End = c(1, 8) 
Frequency = 8 
      hour Count Year Month Day
1.000   15    69 2001     1   1
1.125   16    12 2001     1   1
1.250   17    56 2001     1   1
1.375   18    34 2001     1   1
1.500   19    44 2001     1   1
1.625   20    91 2001     1   1
1.750   21    82 2001     1   1
1.875   22    49 2001     1   1

This will represent the index for day 1 3pm as 1.0, day 1 4pm as 1+1/8, day 1 5pm as 1+2/8, ..., day 1 10pm as 1+7/8, day 2 3pm as 2, day 2 4pm as 2+1/8, etc.
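The index arithmetic above can be checked with a minimal sketch. It assumes a small stand-in data frame DF: the first day's counts are copied from the question, while the second day's counts are made-up filler so that the series covers more than one period.

```r
# Stand-in data: day 1 is from the question, day 2 is hypothetical filler
DF <- data.frame(
  Date  = rep(c("01.01.2001", "02.01.2001"), each = 8),
  hour  = rep(15:22, times = 2),
  Count = c(69, 12, 56, 34, 44, 91, 82, 49,   # from the question
            70, 15, 50, 30, 40, 95, 80, 45)   # hypothetical values
)

# One period = one day of 8 observed hours
tser <- ts(DF$Count, frequency = 8)

# Split the index into the day number and the position within the day
idx      <- as.numeric(time(tser))
day_no   <- floor(idx)                       # 1 for day 1, 2 for day 2
hour_rec <- round(8 * (idx - day_no)) + 15   # maps 0/8..7/8 back to 15..22

print(cbind(index = idx, day = day_no, hour = hour_rec))
```

cycle(tser) gives the position within the day (1 to 8) directly, which may be more convenient than the arithmetic when only the hour slot is needed.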

2) This is the same, except that the day index is the number of days since 1970-01-01 instead of starting at 1:

tser <- ts(DF[-1], start = as.Date("2001-01-01"), frequency = 8)

giving:

> tser
Time Series:
Start = c(11323, 1) 
End = c(11323, 8) 
Frequency = 8 
         hour Count Year Month Day
11323.00   15    69 2001     1   1
11323.12   16    12 2001     1   1
11323.25   17    56 2001     1   1
11323.38   18    34 2001     1   1
11323.50   19    44 2001     1   1
11323.62   20    91 2001     1   1
11323.75   21    82 2001     1   1
11323.88   22    49 2001     1   1

That is, this would represent each day as the number of days since 1970-01-01 plus, as before, 0, 1/8, ..., 7/8 for the hours.

If you later need to regenerate the date/time from the index, then:

library(chron)
tt <- as.numeric(time(tser))
as.chron(tt %/% 1) + (8 * tt%%1 + 15)/24

giving:

[1] (01/01/01 15:00:00) (01/01/01 16:00:00) (01/01/01 17:00:00)
[4] (01/01/01 18:00:00) (01/01/01 19:00:00) (01/01/01 20:00:00)
[7] (01/01/01 21:00:00) (01/01/01 22:00:00)
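If chron is not available, the same conversion can be sketched in base R. This is only an illustration under the assumptions of approach 2 (index = days since 1970-01-01 plus 0/8 to 7/8 for the hours 15 to 22); the vector tt below is typed in by hand rather than taken from a real series, and the UTC time zone is an arbitrary choice.

```r
# Index values for 2001-01-01 under approach 2 (11323 days since 1970-01-01)
tt <- 11323 + (0:7) / 8

# Whole part is the day, fractional part is the slot within the day
day  <- as.Date(floor(tt), origin = "1970-01-01")
hour <- round(8 * (tt %% 1)) + 15

# Combine both into date-times; tz = "UTC" avoids daylight-saving surprises
stamp <- as.POSIXct(paste(day, sprintf("%02d:00:00", hour)), tz = "UTC")
print(format(stamp, "%Y-%m-%d %H:%M"))
```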

3) zoo. If it's not important to keep the observations equally spaced, then you could try this:

library(zoo)
library(chron)
z <- zoo(DF[-1], as.chron(format(DF$Date), "%d.%m.%Y") + DF$hour/24)

giving:

> z
                    hour Count Year Month Day
(01/01/01 15:00:00)   15    69 2001     1   1
(01/01/01 16:00:00)   16    12 2001     1   1
(01/01/01 17:00:00)   17    56 2001     1   1
(01/01/01 18:00:00)   18    34 2001     1   1
(01/01/01 19:00:00)   19    44 2001     1   1
(01/01/01 20:00:00)   20    91 2001     1   1
(01/01/01 21:00:00)   21    82 2001     1   1
(01/01/01 22:00:00)   22    49 2001     1   1

The zoo approach does not require that all hours be present nor is it required that the days be consecutive.
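A small sketch of that point, using a stand-in DF with a missing hour and a skipped day. The first three rows are taken from the question; the last row is hypothetical. For brevity it indexes by POSIXct instead of chron, which is a substitution on my part, not what the code above does.

```r
library(zoo)

# 17:00 is missing on day 1, and 02.01.2001 is missing entirely
DF <- data.frame(
  Date  = c("01.01.2001", "01.01.2001", "01.01.2001", "03.01.2001"),
  hour  = c(15, 16, 18, 15),
  Count = c(69, 12, 34, 70)   # last value is hypothetical
)

# Build date-time stamps and use them as the (irregular) index
stamp <- as.POSIXct(paste(DF$Date, DF$hour), format = "%d.%m.%Y %H", tz = "UTC")
z <- zoo(DF$Count, order.by = stamp)

# Gaps between consecutive observations, in hours
gaps <- as.numeric(diff(time(z)), units = "hours")
print(gaps)
```

The gaps are kept as they are; nothing is padded or interpolated, so methods that expect a regular ts series may need the data filled in first.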

Note: I am not sure that you really need all the date and hour fields broken out separately, since they can easily be generated on the fly, so this might be enough:

Count <- z$Count

Year can be recovered via as.numeric(format(time(Count), "%Y")) and month, day and hour can be recovered by using %m, %d or %H in place of %Y.

A list of the month, day and year columns can also be generated using month.day.year(time(Count)).

years(time(Count)), months(time(Count)), days(time(Count)) and hours(time(Count)) will produce factors of the indicated quantities.
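The format()-based recovery can be illustrated in base R as well. The sketch below assumes the time index is a POSIXct vector (called stamp here, a hypothetical name) rather than the chron index used above, so it shows the idea and not the exact chron behaviour.

```r
# Hourly stamps for 2001-01-01, 15:00 to 22:00 (UTC chosen arbitrarily)
stamp <- as.POSIXct("2001-01-01 15:00:00", tz = "UTC") + 3600 * (0:7)

# Rebuild the broken-out columns from the time index alone
parts <- data.frame(
  Year  = as.numeric(format(stamp, "%Y")),
  Month = as.numeric(format(stamp, "%m")),
  Day   = as.numeric(format(stamp, "%d")),
  hour  = as.numeric(format(stamp, "%H"))
)
print(parts)
```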