pandas reindex DataFrame with datetime objects

BFTM · Jun 8, 2012 · Viewed 14k times

Is it possible to reindex a pandas DataFrame using a column made up of datetime objects?

I have a DataFrame df with the following columns:

Int64Index: 19610 entries, 0 to 19609
Data columns:
cntr                  19610  non-null values  #int
datflt                19610  non-null values  #float
dtstamp               19610  non-null values  #datetime object
DOYtimestamp          19610  non-null values  #float
dtypes: int64(1), float64(2), object(1)

I can reindex the df easily along DOYtimestamp with df.reindex(index=df.DOYtimestamp), and DOYtimestamp has the following values:

>>> df['DOYtimestamp'].values
    array([ 153.76252315,  153.76253472,  153.7625463 , ...,  153.98945602,
    153.98946759,  153.98947917])

but I'd like to reindex the DataFrame along dtstamp, which is made up of datetime objects, so that I can generate different timestamps directly from the index. The dtstamp column has values which look like:

 >>> df['dtstamp'].values
     array([2012-06-02 18:18:02, 2012-06-02 18:18:03, 2012-06-02 18:18:04, ...,
     2012-06-02 23:44:49, 2012-06-02 23:44:50, 2012-06-02 23:44:51], 
     dtype=object)

When I try to reindex df along dtstamp I get the following:

>>> df.reindex(index=df.dtstamp)
    TypeError: can't compare datetime.datetime to long

I'm just not sure what I need to do to get the index to be of a datetime type. Any thoughts?

Answer

BrenBarn · Jun 8, 2012

It sounds like you don't want reindex. Somewhat confusingly, reindex does not define a new index; it selects the rows whose existing index labels match the ones you pass. So if you have a DataFrame with index [0, 1, 2], then reindex([2, 1, 0]) returns the rows in reverse order. Something like reindex([8, 9, 10]) does not relabel the rows; it returns a DataFrame filled with NaN values, since there are no rows with index labels 8, 9, or 10.
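To make the difference concrete, here is a minimal sketch of the two reindex calls described above. The small frame and its column name x are made up for illustration; they are not from the question's data.

```python
import pandas as pd

# Hypothetical three-row frame with the default-style index [0, 1, 2].
df = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=[0, 1, 2])

# Labels that already exist: the same rows come back in the requested order.
reversed_rows = df.reindex([2, 1, 0])

# Labels that do not exist: the result has those labels but only NaN values.
missing_rows = df.reindex([8, 9, 10])

print(reversed_rows)
print(missing_rows)
```

In both cases reindex looks rows up by label; it never attaches new labels to the existing rows.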

It seems like what you want is to keep the same rows but give them a completely new index. For that you can assign to the index directly: try df.index = df['dtstamp'].
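A small sketch of that assignment follows, using a made-up three-row frame that borrows the question's column names (the values are illustrative only). It also shows DataFrame.set_index, which is not mentioned in the answer but is another way to get a similar result; treat it as an optional alternative rather than the recommended fix.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({
    "cntr": [0, 1, 2],
    "datflt": [1.5, 2.5, 3.5],
    "dtstamp": [
        datetime(2012, 6, 2, 18, 18, 2),
        datetime(2012, 6, 2, 18, 18, 3),
        datetime(2012, 6, 2, 18, 18, 4),
    ],
})

# Direct assignment, as suggested above: same rows, new labels.
# The dtstamp column stays in the frame as well.
by_assignment = df.copy()
by_assignment.index = by_assignment["dtstamp"]

# Alternative: set_index returns a new frame and, by default,
# moves the column out of the data and into the index.
by_set_index = df.set_index("dtstamp")

# Rows can now be looked up by timestamp.
print(by_set_index.loc[datetime(2012, 6, 2, 18, 18, 3)])
```

Either way the rows themselves are unchanged; only their labels differ, so lookups and slices can be written in terms of timestamps.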