It is easy to interpolate values in a Pandas.DataFrame
using Series.interpolate
, how can extrapolation be done?
For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.
X1 = range(10)
X2 = map(lambda x: x**2, X1)
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101',periods=10,freq='M'))
I am thinking that a new DataFrame must first be created, with the DateTimeIndex starting from 2013-11-31 and extending for 14 more M
periods. Beyond that I'm stuck.
DataFrame
with a DatetimeIndex
indexThis can be done with two steps:
DatetimeIndex
Overwrite df
with a new DataFrame
where the data is resampled onto a new extended index based on original index's start, period and frequency. This allows the original df
to come from anywhere, as in the csv
example case. With this the columns get conveniently filled with NaNs!
# Fake DataFrame for example (could come from anywhere)
X1 = range(10)
X2 = map(lambda x: x**2, X1)
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101',periods=10,freq='M'))
# Number of months to extend
extend = 5
# Extrapolate the index first based on original index
df = pd.DataFrame(
data=df,
index=pd.date_range(
start=df.index[0],
periods=len(df.index) + extend,
freq=df.index.freq
)
)
# Display
print df
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
Most extrapolators will require the inputs to be numeric instead of dates. This can be done with
# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index().drop('index', 1)
See this answer for how to extrapolate the values of each column of a DataFrame
with a 3rd order polynomial.
Snippet from answer
# Curve fit each column for col in fit_df.columns: # Get x & y x = fit_df.index.astype(float).values y = fit_df[col].values # Curve fit column and get curve parameters params = curve_fit(func, x, y, guess) # Store optimized parameters col_params[col] = params[0] # Extrapolate each column for col in df.columns: # Get the index values for NaNs in the column x = df[pd.isnull(df[col])].index.astype(float).values # Extrapolate those points with the fitted function df[col][x] = func(x, *col_params[col])
Once the columns are extrapolated, put the dates back
# Put date index back
df.index = di
# Display
print df
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 10 100
2013-12-31 11 121
2014-01-31 12 144
2014-02-28 13 169
2014-03-31 14 196