Find closest row of DataFrame to given time in Pandas

robintw picture robintw · Feb 27, 2013 · Viewed 19.5k times · Source

I have a Pandas dataframe which is indexed by a DatetimeIndex:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss)       53732  non-null values
Julian_Day                          53732  non-null values
AOT_870                             53732  non-null values
440-870Angstrom                     53732  non-null values
440-675Angstrom                     53732  non-null values
500-870Angstrom                     53732  non-null values
Last_Processing_Date(dd/mm/yyyy)    53732  non-null values
Solar_Zenith_Angle                  53732  non-null values
time                                53732  non-null values
dtypes: datetime64[ns](2), float64(6), object(1)

I want to find the row that is closest to a certain time:

image_time = dateutil.parser.parse('2009-07-28 13:39:02')

and find how close it is. So far, I have tried various things based upon the idea of subtracting the time I want from all of the times and finding the smallest absolute value, but none quite seem to work.

For example:

aeronet.index - image_time

Gives an error which I think is due to +/- on a Datetime index shifting things, so I tried putting the index into another column and then working on that:

aeronet['time'] = aeronet.index
aeronet.time - image_time

This seems to work, but to do what I want, I need to get the ABSOLUTE time difference, not the relative difference. However, just running abs or np.abs on it gives an error:

abs(aeronet.time - image_time)

C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
   1061         Yields Bytestring in Py2, Unicode String in py3.
   1062         """
-> 1063         return str(self)
   1064 
   1065     def _tidy_repr(self, max_vals=20):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
   1021         if py3compat.PY3:
   1022             return self.__unicode__()
-> 1023         return self.__bytes__()
   1024 
   1025     def __bytes__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
   1031         """
   1032         encoding = com.get_option("display.encoding")
-> 1033         return self.__unicode__().encode(encoding, 'replace')
   1034 
   1035     def __unicode__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
   1044                     else get_option("display.max_rows"))
   1045         if len(self.index) > (max_rows or 1000):
-> 1046             result = self._tidy_repr(min(30, max_rows - 4))
   1047         elif len(self.index) > 0:
   1048             result = self._get_repr(print_header=True,

C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
   1069         """
   1070         num = max_vals // 2
-> 1071         head = self[:num]._get_repr(print_header=True, length=False,
   1072                                     name=False)
   1073         tail = self[-(max_vals - num):]._get_repr(print_header=False,

AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'

Am I approaching this the right way? If so, how should I get abs to work, so that I can then select the minimum absolute time difference, and thus get the closest time. If not, what is the best way to do this with a Pandas time-series?

Answer

cmeeren picture cmeeren · Nov 14, 2013

This simple method will return the (integer index of the) TimeSeriesIndex entry closest to a given datetime object. There's no need to copy the index to a regular column - simply use the .to_pydatetime method instead.

import numpy as np

i = np.argmin(np.abs(df.index.to_pydatetime() - image_time))

Then you simply use the DataFrame's .iloc indexer:

df.iloc[i]

Here's a function to do this:

def fcl(df, dtObj):
    return df.iloc[np.argmin(np.abs(df.index.to_pydatetime() - dtObj))]

You can then further filter seamlessly, e.g.

fcl(df, dtObj)['column']