Conditional Cumulative Sum in R

David picture David · May 24, 2013 · Viewed 14.3k times · Source

I have a time series data frame and want to compute cumulative returns for stock symbols intra-day for a range of dates. When the symbol and/or date changes the cumulative return should reset. Any help would be appreciated. A small sample of my data frame is below including what the cumulative sum column should return. Thanks.

       Date Symbol  Time   Last Return Cumulative.Sum
1  1/2/2013     AA  9:30  42.00    n/a            n/a
2  1/2/2013     AA 12:00  42.50  1.19%          1.19%
3  1/2/2013     AA 16:00  42.88  0.89%          2.08%
4  1/2/2013   AAPL  9:30 387.00    n/a            n/a
5  1/2/2013   AAPL 12:00 387.87  0.22%          0.22%
6  1/2/2013   AAPL 16:00 388.69  0.21%          0.44%
7  1/3/2013     AA  9:30  42.88    n/a            n/a
8  1/3/2013     AA 12:00  42.11 -1.80%         -1.80%
9  1/3/2013     AA 16:00  41.89 -0.52%         -2.32%

Answer

Justin picture Justin · May 24, 2013

using the data.table package this is trivial. If your data is in a data.frame called dat:

library(data.table)
DT <- data.table(dat)

DT[, your_cumsum_function(.SD), by=c('Date', 'Symbol')]

Where .SD is the subset of the data.table defined by the by groups. See ?data.table for more information.

You can also pass column names directly:

DT[, your_cumsum_function(Last), by=c('Date', 'Symbol')]

In your particular example, do:

DT[, Return := as.numeric(sub('%$', '', Return))]
DT[!is.na(Return), Cumulative.Sum := cumsum(Return), by = c('Date', 'Symbol')]