I have a dataframe such as the following:
Index Return
2008-11-21 0.153419
2008-11-24 0.037421
2008-11-25 0.077500
What's the best way to calculate a cumulative return across all columns on the last row?
Following is the intended result:
Index Return
2008-11-21 0.153419
2008-11-24 0.037421
2008-11-25 0.077500
Cumulative 0.289316
Where cumulative return calculated as follows:
cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1
What is the best way to perform this in pandas?
there is a pandas cumprod()
method for that. this will work for every columns.
df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
this would be about 2 time faster than other solutions on large dataset:
In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop
I would suggest to never use apply if you can find a built-in method since apply is looping over the dataframe which makes it slow. Bult-in method are highly efficient and normally there is no way you are going to get faster than them using apply.