Extrapolate Pandas DataFrame

Nyxynyx picture Nyxynyx · Dec 8, 2015 · Viewed 8.7k times · Source

It is easy to interpolate values in a Pandas.DataFrame using Series.interpolate, how can extrapolation be done?

For example, given a DataFrame as shown, how can we extrapolate it 14 more months to 2014-12-31? Linear extrapolation is fine.

X1 = range(10)
X2 = map(lambda x: x**2, X1)
df = pd.DataFrame({'x1': X1, 'x2': X2},  index=pd.date_range('20130101',periods=10,freq='M'))

I am thinking that a new DataFrame must first be created, with the DateTimeIndex starting from 2013-11-31 and extending for 14 more M periods. Beyond that I'm stuck.

enter image description here

Answer

tmthydvnprt picture tmthydvnprt · Mar 12, 2016

Extrapolating a DataFrame with a DatetimeIndex index

This can be done with two steps:

  1. Extend the DatetimeIndex
  2. Extrapolate the data

Extend the Index

Overwrite df with a new DataFrame where the data is resampled onto a new extended index based on original index's start, period and frequency. This allows the original df to come from anywhere, as in the csv example case. With this the columns get conveniently filled with NaNs!

# Fake DataFrame for example (could come from anywhere)
X1 = range(10)
X2 = map(lambda x: x**2, X1)
df = pd.DataFrame({'x1': X1, 'x2': X2},  index=pd.date_range('20130101',periods=10,freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print df

    x1  x2
2013-01-31   0   0
2013-02-28   1   1
2013-03-31   2   4
2013-04-30   3   9
2013-05-31   4  16
2013-06-30   5  25
2013-07-31   6  36
2013-08-31   7  49
2013-09-30   8  64
2013-10-31   9  81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN

Extrapolate the data

Most extrapolators will require the inputs to be numeric instead of dates. This can be done with

# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index().drop('index', 1)

See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.

Snippet from answer

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Get the index values for NaNs in the column
    x = df[pd.isnull(df[col])].index.astype(float).values
    # Extrapolate those points with the fitted function
    df[col][x] = func(x, *col_params[col])

Once the columns are extrapolated, put the dates back

# Put date index back
df.index = di

# Display
print df

x1   x2
2013-01-31   0    0
2013-02-28   1    1
2013-03-31   2    4
2013-04-30   3    9
2013-05-31   4   16
2013-06-30   5   25
2013-07-31   6   36
2013-08-31   7   49
2013-09-30   8   64
2013-10-31   9   81
2013-11-30  10  100
2013-12-31  11  121
2014-01-31  12  144
2014-02-28  13  169
2014-03-31  14  196