So I have a data frame that has a date column, an hour column and a series of other numerical columns. Each row in the data frame is 1 hour of 1 day for an entire year.
The data frame looks like this:
Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610
The hours are out of order because this is subsetted from another data frame.
I would like to sum the values in the numerical columns by month and possibly by day. Does anyone know how I can do this?
I create the data set by
data <- read.table( text=" Date Hour Melbourne Southern Flagstaff
1 2009-05-01 0 0 5 17
2 2009-05-01 2 0 2 1
3 2009-05-01 1 0 11 0
4 2009-05-01 3 0 3 8
5 2009-05-01 4 0 1 0
6 2009-05-01 5 0 49 79
7 2009-05-01 6 0 425 610",
header=TRUE,stringsAsFactors=FALSE)
You can do the summation with the function aggregate
:
byday <- aggregate(cbind(Melbourne,Southern,Flagstaff)~Date,
data=data,FUN=sum)
library(lubridate)
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month(Date),
data=data,FUN=sum)
Look at ?aggregate
to understand the function better. Starting with the last argument (because that makes explaining easier) the arguments do the following:
FUN
is the function that should be used for the aggregation. I use sum
to sum up the values, but i could also be mean
, max
or some function you wrote yourself.data
is used to indicate that data frame that I want to aggregate.~
, I indicate the variables I want to aggregate. If there is more than one, they are combined with cbind
. On the right hand side is the variable by which the data should be split. Putting Date
means that aggregate will sum up the variables for each distinct value of Date
.For the aggregation by month, I used the function month
from the package lubridate
. It does what one expects: it returns a numeric value indicating the month for a given date. Maybe you first need to install the package by install.packages("lubridate")
.
If you prefer not to use lubridate, you could do the following instead:
data <- transform(data,month=as.numeric(format(as.Date(Date),"%m")))
bymonth <- aggregate(cbind(Melbourne,Southern,Flagstaff)~month,
data=data,FUN=sum)
Here I added a new column to data that contains the month and then aggregated by that column.