Filling date gaps in pandas dataframe

sten picture sten · Feb 12, 2015 · Viewed 12.6k times · Source

I have Pandas DataFrame (loaded from .csv) with Date-time as index.. where there is/have-to-be one entry per day. The problem is that I have gaps i.e. there is days for which I have no data at all. What is the easiest way to insert rows (days) in the gaps ? Also is there a way to control what is inserted in the columns as data ! Say 0 OR copy the prev day info OR to fill sliding increasing/decreasing values in the range from prev-date toward next-date data-values.

thanks

Here is example 01-03 and 01-04 are missing :

In [60]: df['2015-01-06':'2015-01-01']
Out[60]: 
           Rate  High (est)  Low (est)
Date                                      
2015-01-06  1.19643      0.0000     0.0000
2015-01-05  1.20368      1.2186     1.1889
2015-01-02  1.21163      1.2254     1.1980
2015-01-01  1.21469      1.2282     1.2014

Still experimenting but this seems to solve the problem :

df.set_index(pd.DatetimeIndex(df.Date),inplace=True)

and then resample... the reason being that importing the .csv with header-col-name Date, is not actually creating date-time-index, but Frozen-list whatever that means. resample() is expecting : if isinstance(ax, DatetimeIndex): .....


Here is my final solution :

  #make dates the index
  self.df.set_index(pd.DatetimeIndex(self.df.Date), inplace=True)
  #fill the gaps
  self.df = self.df.resample('D',fill_method='pad')
  #fix the Date column
  self.df.Date = self.df.index.values

I had to fix the Date column, because resample() just allow you to pad-it. It fixes the index correctly though, so I could use it to fix the Date column.

Here is snipped of the data after correction :

2015-01-29 2015-01-29  1.13262      0.0000     0.0000
2015-01-30 2015-01-30  1.13161      1.1450     1.1184
2015-01-31 2015-01-31  1.13161      1.1450     1.1184
2015-02-01 2015-02-01  1.13161      1.1450     1.1184

01-30, 01-31 are the new generated data.

Answer

Andy Hayden picture Andy Hayden · Feb 12, 2015

You'll could resample by day e.g. using mean if there are multiple entries per day:

df.resample('D', how='mean')

You can then ffill to replace NaNs with the previous days result.

See up and down sampling in the docs.