I have a dataframe with monthly financial data:
In [89]: vfiax_monthly.head()
Out[89]:
year month day d open close high low volume aclose
2003-01-31 2003 1 31 731246 64.95 64.95 64.95 64.95 0 64.95
2003-02-28 2003 2 28 731274 63.98 63.98 63.98 63.98 0 63.98
2003-03-31 2003 3 31 731305 64.59 64.59 64.59 64.59 0 64.59
2003-04-30 2003 4 30 731335 69.93 69.93 69.93 69.93 0 69.93
2003-05-30 2003 5 30 731365 73.61 73.61 73.61 73.61 0 73.61
I'm trying to calculate the returns like this:
In [90]: returns = (vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/vfiax_monthly.open[1:]
But I'm getting only zeroes:
In [91]: returns.head()
Out[91]:
2003-01-31 NaN
2003-02-28 0
2003-03-31 0
2003-04-30 0
2003-05-30 0
Freq: BM, Name: open
I think that's because arithmetic operations are aligned on the index, which makes the [1:] and [:-1] slices useless.
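A small sketch of that alignment effect, using a made-up four-row series (the first open values shown above) as a stand-in for vfiax_monthly.open. The names idx, open_ and diff are only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for vfiax_monthly.open, built from the values shown above.
idx = pd.to_datetime(["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30"])
open_ = pd.Series([64.95, 63.98, 64.59, 69.93], index=idx)

# Both slices keep their original date labels, so the subtraction pairs
# each date with itself. Dates present in only one slice come out as NaN.
diff = open_.iloc[1:] - open_.iloc[:-1]
print(diff)
```

Every date that appears in both slices is subtracted from itself, which is where the zeroes come from.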
My workaround is:
In [103]: returns = (vfiax_monthly.open[1:].values - vfiax_monthly.open[:-1].values)/vfiax_monthly.open[1:].values
In [104]: returns = pd.Series(returns, index=vfiax_monthly.index[1:])
In [105]: returns.head()
Out[105]:
2003-02-28 -0.015161
2003-03-31 0.009444
2003-04-30 0.076362
2003-05-30 0.049993
2003-06-30 0.012477
Freq: BM
Is there a better way to calculate the returns? I don't like the conversion to array and then back to Series.
Instead of slicing, use .shift to move the values relative to the index of a DataFrame/Series. For example:
returns = (vfiax_monthly.open - vfiax_monthly.open.shift(1))/vfiax_monthly.open.shift(1)
This is what pct_change is doing under the bonnet. You can also use .shift in other expressions, e.g.:
(3*vfiax_monthly.open + 2*vfiax_monthly.open.shift(1))/5
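As a rough check of the shift approach, here is a self-contained sketch that uses the five open values from the question in place of vfiax_monthly.open (the names idx, open_, prev, returns, same and alt are assumptions for illustration). Note that it divides by the previous value, as pct_change does, whereas the workaround in the question divides by the current value:

```python
import pandas as pd

# Hypothetical stand-in for vfiax_monthly.open, using the values from the question.
idx = pd.to_datetime(
    ["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30"]
)
open_ = pd.Series([64.95, 63.98, 64.59, 69.93, 73.61], index=idx)

# shift(1) moves each value down one row while leaving the index unchanged,
# so prev holds the previous month's value on each date (NaN on the first date).
prev = open_.shift(1)

# Return relative to the previous value.
returns = (open_ - prev) / prev

# pct_change computes the same quantity in one call.
same = open_.pct_change()

# The variant from the question divides by the current value instead.
alt = (open_ - prev) / open_

print(returns)
print(alt)
```

The first row is NaN in every case because there is no previous value to compare against; call .dropna() if you want it removed.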
You might also want to look into the rolling and window functions for other types of analysis of financial data.
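A minimal sketch of a rolling calculation, assuming a pandas version that provides the Series.rolling method and again using the values from the question as a stand-in. The 3-month window is an arbitrary choice for illustration:

```python
import pandas as pd

# Hypothetical stand-in for vfiax_monthly.open, using the values from the question.
idx = pd.to_datetime(
    ["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30"]
)
open_ = pd.Series([64.95, 63.98, 64.59, 69.93, 73.61], index=idx)

returns = open_.pct_change()

# 3-month rolling mean and standard deviation of the returns.
# By default a window yields NaN until it holds 3 non-NaN observations.
rolling_mean = returns.rolling(window=3).mean()
rolling_std = returns.rolling(window=3).std()

print(rolling_mean)
print(rolling_std)
```

With only five rows, the first full window ends on 2003-04-30, so the earlier rows are NaN.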