Deprecated rolling window option in OLS from Pandas to Statsmodels

Asher11 picture Asher11 · May 19, 2016 · Viewed 14.4k times · Source

as the title suggests, where has the rolling function option in the ols command in Pandas migrated to in statsmodels? I can't seem to find it. Pandas tells me doom is in the works:

FutureWarning: The pandas.stats.ols module is deprecated and will be removed in a future version. We refer to external packages like statsmodels, see some examples here: http://statsmodels.sourceforge.net/stable/regression.html
  model = pd.ols(y=series_1, x=mmmm, window=50)

in fact, if you do something like:

import statsmodels.api as sm

model = sm.OLS(series_1, mmmm, window=50).fit()

print(model.summary())

you get results (window does not impair the running of the code) but you get only the parameters of the regression run on the entire period, not the series of parameters for each of the rolling period it should be supposed to work on.

Answer

Brad Solomon picture Brad Solomon · Jun 11, 2017

I created an ols module designed to mimic pandas' deprecated MovingOLS; it is here.

It has three core classes:

  • OLS : static (single-window) ordinary least-squares regression. The output are NumPy arrays
  • RollingOLS : rolling (multi-window) ordinary least-squares regression. The output are higher-dimension NumPy arrays.
  • PandasRollingOLS : wraps the results of RollingOLS in pandas Series & DataFrames. Designed to mimic the look of the deprecated pandas module.

Note that the module is part of a package (which I'm currently in the process of uploading to PyPi) and it requires one inter-package import.

The first two classes above are implemented entirely in NumPy and primarily use matrix algebra. RollingOLS takes advantage of broadcasting extensively also. Attributes largely mimic statsmodels' OLS RegressionResultsWrapper.

An example:

import urllib.parse
import pandas as pd
from pyfinance.ols import PandasRollingOLS

# You can also do this with pandas-datareader; here's the hard way
url = "https://fred.stlouisfed.org/graph/fredgraph.csv"

syms = {
    "TWEXBMTH" : "usd", 
    "T10Y2YM" : "term_spread", 
    "GOLDAMGBD228NLBM" : "gold",
}

params = {
    "fq": "Monthly,Monthly,Monthly",
    "id": ",".join(syms.keys()),
    "cosd": "2000-01-01",
    "coed": "2019-02-01",
}

data = pd.read_csv(
    url + "?" + urllib.parse.urlencode(params, safe=","),
    na_values={"."},
    parse_dates=["DATE"],
    index_col=0
).pct_change().dropna().rename(columns=syms)
print(data.head())
#                  usd  term_spread      gold
# DATE                                       
# 2000-02-01  0.012580    -1.409091  0.057152
# 2000-03-01 -0.000113     2.000000 -0.047034
# 2000-04-01  0.005634     0.518519 -0.023520
# 2000-05-01  0.022017    -0.097561 -0.016675
# 2000-06-01 -0.010116     0.027027  0.036599

y = data.usd
x = data.drop('usd', axis=1)

window = 12  # months
model = PandasRollingOLS(y=y, x=x, window=window)

print(model.beta.head())  # Coefficients excluding the intercept
#             term_spread      gold
# DATE                             
# 2001-01-01     0.000033 -0.054261
# 2001-02-01     0.000277 -0.188556
# 2001-03-01     0.002432 -0.294865
# 2001-04-01     0.002796 -0.334880
# 2001-05-01     0.002448 -0.241902

print(model.fstat.head())
# DATE
# 2001-01-01    0.136991
# 2001-02-01    1.233794
# 2001-03-01    3.053000
# 2001-04-01    3.997486
# 2001-05-01    3.855118
# Name: fstat, dtype: float64

print(model.rsq.head())  # R-squared
# DATE
# 2001-01-01    0.029543
# 2001-02-01    0.215179
# 2001-03-01    0.404210
# 2001-04-01    0.470432
# 2001-05-01    0.461408
# Name: rsq, dtype: float64