R: cumulative sum over rolling date range

Vlad · Sep 25, 2017

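In R, how can I calculate a cumulative sum over a defined time period prior to the row being calculated? I would prefer dplyr if possible.

For example, if the period is 10 days, the function should produce cum_rolling10 below, where each row sums value over the 10 calendar days ending on that row's date (so 13/01/2000 covers 4/01 to 13/01: 9 + 3 + 4 + 3 + 10 = 29):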

date        value  cumsum  cum_rolling10
1/01/2000       9       9              9
2/01/2000       1      10             10
5/01/2000       9      19             19
6/01/2000       3      22             22
7/01/2000       4      26             26
8/01/2000       3      29             29
13/01/2000     10      39             29
14/01/2000      9      48             38
18/01/2000      2      50             21
19/01/2000      9      59             30
21/01/2000      8      67             38
25/01/2000      5      72             24
26/01/2000      1      73             25
30/01/2000      6      79             20
31/01/2000      6      85             18

Answer

www · Sep 25, 2017

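A solution using dplyr, tidyr, lubridate, and zoo. The dates are irregular, so a window of 10 rows is not a window of 10 days; padding the missing dates with zeros first makes the two equivalent.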

library(dplyr)
library(tidyr)
library(lubridate)
library(zoo)

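dt2 <- dt %>%
  mutate(date = dmy(date)) %>%                 # parse day/month/year strings
  mutate(cumsum = cumsum(value)) %>%           # running total over the original rows
  # add a row for every missing day; value = 0 there, cumsum is left NA
  complete(date = full_seq(date, period = 1), fill = list(value = 0)) %>%
  # right-aligned sum over 10 rows; partial = TRUE permits shorter windows at the start
  mutate(cum_rolling10 = rollapplyr(value, width = 10, FUN = sum, partial = TRUE)) %>%
  drop_na(cumsum)                              # remove the added rows again
dt2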
# A tibble: 15 x 4
         date value cumsum cum_rolling10
       <date> <dbl>  <int>         <dbl>
 1 2000-01-01     9      9             9
 2 2000-01-02     1     10            10
 3 2000-01-05     9     19            19
 4 2000-01-06     3     22            22
 5 2000-01-07     4     26            26
 6 2000-01-08     3     29            29
 7 2000-01-13    10     39            29
 8 2000-01-14     9     48            38
 9 2000-01-18     2     50            21
10 2000-01-19     9     59            30
11 2000-01-21     8     67            38
12 2000-01-25     5     72            24
13 2000-01-26     1     73            25
14 2000-01-30     6     79            20
15 2000-01-31     6     85            18
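
As a cross-check on the window definition, here is a minimal base R sketch that computes the same column without padding the dates. It is not part of the answer above: rolling_sum_days() is a hypothetical helper name, and the window is assumed to be the 10 calendar days ending on, and including, each row's date. It compares every row against every other row, so it is only a reasonable choice for small data.

```r
# Same 15 rows as the DATA block in this answer, repeated so the sketch runs alone.
dt <- data.frame(
  date  = c("1/01/2000", "2/01/2000", "5/01/2000", "6/01/2000", "7/01/2000",
            "8/01/2000", "13/01/2000", "14/01/2000", "18/01/2000", "19/01/2000",
            "21/01/2000", "25/01/2000", "26/01/2000", "30/01/2000", "31/01/2000"),
  value = c(9L, 1L, 9L, 3L, 4L, 3L, 10L, 9L, 2L, 9L, 8L, 5L, 1L, 6L, 6L),
  stringsAsFactors = FALSE
)

# Parse day/month/year strings into Date objects.
dates <- as.Date(dt$date, format = "%d/%m/%Y")

# Hypothetical helper (not from any package): for each row, sum the values
# whose date lies in the `days`-day window ending on that row's date.
rolling_sum_days <- function(dates, values, days = 10) {
  vapply(seq_along(dates), function(i) {
    # Assumed window: strictly after (date - days), up to and including date.
    in_window <- dates > dates[i] - days & dates <= dates[i]
    sum(values[in_window])
  }, numeric(1))
}

dt$cum_rolling10 <- rolling_sum_days(dates, dt$value, days = 10)
dt
```

Because the helper takes plain vectors, it could presumably also be called inside a dplyr pipeline, for example mutate(cum_rolling10 = rolling_sum_days(date, value)) after the dates have been parsed.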

DATA

dt <- structure(list(date = c("1/01/2000", "2/01/2000", "5/01/2000", 
"6/01/2000", "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000", 
"18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000", "26/01/2000", 
"30/01/2000", "31/01/2000"), value = c(9L, 1L, 9L, 3L, 4L, 3L, 
10L, 9L, 2L, 9L, 8L, 5L, 1L, 6L, 6L)), .Names = c("date", "value"
), row.names = c(NA, -15L), class = "data.frame")