Missing values in Time Series in python

Marco Miglionico picture Marco Miglionico · Mar 15, 2018 · Viewed 23.4k times · Source

I have a time series dataframe, the dataframe is quite big and contain some missing values in the 2 columns('Humidity' and 'Pressure'). I would like to impute this missing values in a clever way, for example using the value of the nearest neighbor or the average of the previous and following timestamp.Is there an easy way to do it? I have tried with fancyimpute but the dataset contain around 180000 examples and give a memory error enter image description here

Answer

Peter Leimbigler picture Peter Leimbigler · Mar 15, 2018

Consider interpolate (Series - DataFrame). This example shows how to fill gaps of any size with a straight line:

df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=10, freq='H'), 'value': range(10)})
df.loc[2:3, 'value'] = np.nan
df.loc[6, 'value'] = np.nan
df
                 date  value
0 2013-01-01 00:00:00    0.0
1 2013-01-01 01:00:00    1.0
2 2013-01-01 02:00:00    NaN
3 2013-01-01 03:00:00    NaN
4 2013-01-01 04:00:00    4.0
5 2013-01-01 05:00:00    5.0
6 2013-01-01 06:00:00    NaN
7 2013-01-01 07:00:00    7.0
8 2013-01-01 08:00:00    8.0
9 2013-01-01 09:00:00    9.0

df['value'].interpolate(method='linear', inplace=True)
                 date  value
0 2013-01-01 00:00:00    0.0
1 2013-01-01 01:00:00    1.0
2 2013-01-01 02:00:00    2.0
3 2013-01-01 03:00:00    3.0
4 2013-01-01 04:00:00    4.0
5 2013-01-01 05:00:00    5.0
6 2013-01-01 06:00:00    6.0
7 2013-01-01 07:00:00    7.0
8 2013-01-01 08:00:00    8.0
9 2013-01-01 09:00:00    9.0