Consider the following example
library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))
Source: local data frame [25 x 3]
time values day_of_week
<date> <dbl> <fctr>
1 2014-02-24 30 Mon
2 2014-02-25 45 Tues
3 2014-02-26 30 Wed
4 2014-02-27 50 Thurs
5 2014-02-28 50 Fri
6 2014-03-01 20 Sat
7 2014-03-02 35 Sun
8 2014-03-03 50 Mon
9 2014-03-04 35 Tues
10 2014-03-05 35 Wed
I would like to aggregate this dataframe by week.
That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a Monday to Monday
cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).
Then, I would simply like to count the mean of values
for each week.
For instance, in the example above, one would compute the average of values
between Monday February 24th to Sunday March 2nd, and so on.
How can I do that?
Thanks!
EDIT: thanks to all of you who contributed an idea. Somewhat unusual, I think my late solution is probably more appropriate here. Thanks again!
In the tidyverse,
df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))
## # A tibble: 5 × 2
## week value
## <dbl> <dbl>
## 1 8 37.50000
## 2 9 38.57143
## 3 10 38.57143
## 4 11 36.42857
## 5 12 45.00000
Or use isoweek
instead:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <int> <dbl>
## 1 9 37.14286
## 2 10 40.71429
## 3 11 35.00000
## 4 12 42.50000
Or cut.Date
:
df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000
which you can tell to start on Sunday, if you prefer:
df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>%
summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000
If you want to shift to, say, Tuesday start, add one to your dates:
df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000
Labels will be off, though. If using cut
, consider the implications of its include.lowest
and right
parameters, documented at ?cut
.