I am analyzing hourly precipitation data from a file that was disorganized. I managed to clean it up and store it in a data frame (called CA1) of the following form:
Station_ID Guage_Type Lat Long Date Time_Zone Time_Frame H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
1 4457700 HI 41.52 124.03 1948-07-01 8 LST 0 0 0 0 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0 0 0 0 0 0 0 0 0 0 0 0
2 4457700 HI 41.52 124.03 1948-07-05 8 LST 0 1 1 1 1 1 2.0000000 2.0000000 2.0000000 4.0000000 5.0000000 5.0000000 4 7 1 1 0 0 10 13 5 1 1 3
3 4457700 HI 41.52 124.03 1948-07-06 8 LST 1 1 1 0 1 1 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0 0 0 0 0 0 0 0 0 0 0 0
4 4457700 HI 41.52 124.03 1948-07-27 8 LST 3 0 0 0 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0 0 0 0 0 0 0 0 0 0 0 0
5 4457700 HI 41.52 124.03 1948-08-01 8 LST 0 0 0 0 0 0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0 0 0 0 0 0 0 0 0 0 0 0
6 4457700 HI 41.52 124.03 1948-08-17 8 LST 0 0 0 0 0 0 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 0.3888889 6 1 0 0 0 0 0 0 0 0 0 0
Here H0 through H23 hold the 24 hourly values of each day (row).
Using only CA1 (the data frame above), I take each day (row) of 24 values, transpose it into a column, and concatenate all the days into one variable, which I call dat1:
> dat1[1:48,]
H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 H0 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 2 2 2 4 5 5 4 7 1 1 0 0 10 13 5 1 1 3
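A minimal sketch of that flattening step, assuming the hourly columns are named H0 through H23 as in the printout. The toy CA1 below is hypothetical and holds only the first two printed days; t() plus as.vector() is one way to do the flattening, not necessarily how dat1 was actually built.

```r
hours <- paste0("H", 0:23)

# Hypothetical toy CA1: the first two days from the printout above
CA1 <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-05")))
CA1[hours] <- 0
CA1[2, hours] <- c(0, 1, 1, 1, 1, 1, 2, 2, 2, 4, 5, 5,
                   4, 7, 1, 1, 0, 0, 10, 13, 5, 1, 1, 3)

# t() turns the day-by-hour matrix into hour-by-day, and as.vector() reads it
# column by column: all 24 hours of day 1, then all 24 hours of day 2, ...
dat1 <- as.vector(t(as.matrix(CA1[hours])))
length(dat1)  # 48 = 2 days * 24 hours
```

Because dat1 here is a plain numeric vector with no dim attribute, ts(dat1, frequency = 24) produces a univariate series.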
I then pass dat1 to ts() to build a time series:
> rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon),
frequency = 24)
A few things to note:
>dim(CA1)
[1] 5636 31
>length(dat1)
[1] 135264
Thus 5636 rows * 24 points per row = 135264 points, and length(rainCA1) agrees with that count. However, if I add an end argument to ts(), such as
>rainCA1 <- ts(dat1, start = c(1900+as.POSIXlt(CA1[1,5])$year, 1+as.POSIXlt(CA1[1,5])$mon),
end = c(1900+as.POSIXlt(CA1[5636,5])$year, 1+as.POSIXlt(CA1[5636,5])$mon),
frequency = 24)
the resulting series has only 1134 points, so a lot of data is missing. I assume this is because the dates are not consecutive and because I only supply the year and month as the starting point.
Continuing on what I think is the correct path, I pass the first series (built without the end argument) to stl:
>rainCA1_2 <-stl(rainCA1, "periodic")
Unfortunately, I get an error:
Error in stl(rainCA1, "periodic") : only univariate series are allowed
I don't understand this error or how to get around it. However, if I go back to ts() and supply the end argument, stl runs without any errors.
I have searched a lot of forums, but as far as I can tell no one provides a good solution for setting up the time-series attributes of hourly data. Any help would be highly appreciated. Thank you!
That error is a result of the shape of your data. Try
> dim(rainCA1)
I suspect it gives something like
[1] 135264 1
Replace rainCA1 <- ts(dat1 ... by rainCA1 <- ts(dat1[[1]] ..., and it should work.
Whether it does so correctly, I wonder...
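A small reproduction of the shape problem, using a hypothetical one-column data frame in place of dat1 (three days of made-up hourly values). It shows that ts() keeps a data frame as a one-column matrix, which stl() rejects, while extracting the column with [[1]] gives a univariate series.

```r
# Hypothetical one-column data frame standing in for dat1 (72 hourly values)
df <- data.frame(rain = rep(c(0, 0, 1, 3, 1, 0), length.out = 72))

x_bad  <- ts(df, frequency = 24)       # ts() turns the data frame into a 72 x 1 matrix
x_good <- ts(df[[1]], frequency = 24)  # [[1]] extracts a plain vector: univariate ts

dim(x_bad)   # 72 1
dim(x_good)  # NULL

# stl() refuses any series that is a matrix, even with a single column
err <- tryCatch(stl(x_bad, s.window = "periodic"), error = conditionMessage)
err          # "only univariate series are allowed"

fit <- stl(x_good, s.window = "periodic")  # the univariate version decomposes fine
```

This also suggests why the version with end worked: when end implies fewer points than a single-column input provides, ts() subsets the data to that length, which drops the matrix dimension.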
It seems to me your first order of business is to get your data into a consistent format. Make sure ts() gets the right input. Check the precise specification of ts (see ?ts).
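One way to get a consistent format is to give every calendar day its own row, so that the flattened vector really has one value per hour with no jumps. This sketch assumes a Date column as in CA1 and uses hypothetical toy data; it marks unrecorded days as NA. Whether a missing day should instead be 0 (no rain) depends on how the source file was produced, which the question does not say.

```r
hours <- paste0("H", 0:23)

# Hypothetical toy data: 1948-07-03 and 1948-07-04 are absent from the file
obs <- data.frame(Date = as.Date(c("1948-07-01", "1948-07-02", "1948-07-05")))
obs[hours] <- 1  # placeholder precipitation values

# Every calendar day from the first to the last observation
all_days <- seq(min(obs$Date), max(obs$Date), by = "day")

# One row per calendar day; rows for days missing from obs stay NA
full <- matrix(NA_real_, nrow = length(all_days), ncol = 24,
               dimnames = list(as.character(all_days), hours))
full[match(obs$Date, all_days), ] <- as.matrix(obs[hours])

# Flatten day by day into one consecutive hourly series
rain <- ts(as.vector(t(full)), frequency = 24)
length(rain)      # 120 = 5 days * 24 hours
sum(is.na(rain))  # 48 = the 2 missing days
```

Note that stl() uses na.action = na.fail by default, so the NA hours have to be filled or otherwise handled before decomposing.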
ts() does not interpret date-time formats. ts() requires consecutive data points with a fixed interval. It uses a major counter and a minor counter (frequency minor steps fit into one major step). For instance, if your data is hourly and you expect seasonality on the daily level, frequency equals 24. start and end, therefore, are primarily cosmetic: start merely labels the first observation, t(0), and end the last, t(end). end does affect the length, though: ts() truncates (or recycles) the data to fit the span from start to end, which is why the series built with end had only 1134 points.
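A small illustration of that counter arithmetic on toy data (not the question's). With frequency = 24, start = c(1948, 7) means major unit 1948, minor slot 7, not July 1948, because ts() has no notion of calendar months or hours. When end implies fewer points than supplied, the extra values are dropped without a warning.

```r
x <- ts(1:100, start = c(1948, 7), frequency = 24)
start(x)  # 1948 7
end(x)    # 1952 10: 6 + 99 = 105 slots past (1948, 1), i.e. 4 * 24 + 9

# Supplying end fixes the length at (1949 - 1948) * 24 + (12 - 7) + 1 = 30
y <- ts(1:100, start = c(1948, 7), end = c(1949, 12), frequency = 24)
length(y)  # 30; values 31..100 are silently discarded
```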