Pandas: DataFrame groupby for year/month and return with new DatetimeIndex

dirk · Feb 18, 2016 · Viewed 10k times

I need some direction on grouping a Pandas DataFrame object by year or month and getting back a new DataFrame object with a new index. Here is my code so far. groupby works as intended.

Load the data from a .csv file and parse 'Date' as dates (historical stock quotes from finance.yahoo.com):

In [23]: import pandas as pd
         file = pd.read_csv("sdf.de.csv", parse_dates=['Date'])
         file.head(2)

Out[23]:
    Date        Open    High    Low     Close   Volume  Adj Close
0   2016-02-16  18.650  18.70   17.940  18.16   1720800 17.0600
1   2016-02-15  18.295  18.64   18.065  18.50   1463500 17.3794

Sort by 'Date' ascending and set the index to 'Date':

In [24]: daily = file.sort_values(by='Date').set_index('Date')
         daily.head()

Out[24]:
            Open    High    Low     Close   Volume  Adj Close
Date                        
2000-01-03  14.20   14.50   14.15   14.40   277400  2.7916
2000-01-04  14.29   14.30   13.90   14.15   109200  2.7431

Group by month

I would then apply() an additional step to each group to condense its data, e.g. find the highest High value for the year/month or sum() the Volume values. That step is omitted from this example.
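For reference, the condensing step described above could look roughly like the following. This is a minimal sketch, not part of the original code: the small `daily` frame and its values are made up for illustration, and it groups on the index's year and month attributes and uses `agg` with a per-column mapping instead of `apply()`.

```python
import pandas as pd

# Hypothetical stand-in for the daily quotes; the values are invented for illustration.
daily = pd.DataFrame(
    {
        "High": [14.50, 14.30, 14.39, 14.10],
        "Volume": [277400, 109200, 287200, 150000],
    },
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-02-01", "2000-02-02"]),
)
daily.index.name = "Date"

# Group by year and month of the DatetimeIndex, then condense each group:
# highest High and total Volume per month.
condensed = daily.groupby([daily.index.year, daily.index.month]).agg(
    {"High": "max", "Volume": "sum"}
)
print(condensed)
```

The result is still indexed by (year, month) pairs, so it has the same index problem discussed below.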

In [39]: monthly = daily.groupby(lambda x: (x.year, x.month))
         monthly.first()

Out[39]:
            Open    High    Low     Close   Volume  Adj Close
(2000, 1)   14.200  14.500  14.150  14.400  277400  2.7916
(2000, 2)   13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
(2016, 1)   23.620  23.620  23.620  23.620  0       22.1893
(2016, 2)   19.575  19.630  19.140  19.450  1783000 18.2719

This works, but it gives me a DataFrame object with tuples as the index.

The desired result, in this case for grouping by month, would be a completely new DataFrame object whose Date index is a new DatetimeIndex in the form %Y-%m, or just %Y if grouped by year.

Out[39]:
        Open    High    Low     Close   Volume  Adj Close
Date
2000-01 14.200  14.500  14.150  14.400  277400  2.7916
2000-02 13.900  14.390  13.900  14.250  287200  2.7625
... ... ... ... ... ... ...
2016-01 23.620  23.620  23.620  23.620  0       22.1893
2016-02 19.575  19.630  19.140  19.450  1783000 18.2719

I'm thankful for any directions.

Answer

jezrael · Feb 18, 2016

You can group by daily.index.year and daily.index.month, or convert the index with to_period and then group by the index:

print(daily)
              Open   High    Low  Close   Volume  Adj Close
Date                                                       
2000-01-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02-01  13.900  14.39  13.90  14.25   287200     2.7625
2016-01-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02-01  19.575  19.63  19.14  19.45  1783000    18.2719

print(daily.groupby([daily.index.year, daily.index.month]).first())
          Open   High    Low  Close   Volume  Adj Close
2000 1  14.200  14.50  14.15  14.40   277400     2.7916
     2  13.900  14.39  13.90  14.25   287200     2.7625
2016 1  23.620  23.62  23.62  23.62        0    22.1893
     2  19.575  19.63  19.14  19.45  1783000    18.2719

daily.index = daily.index.to_period('M')
print(daily.groupby(daily.index).first())
           Open   High    Low  Close   Volume  Adj Close
Date                                                    
2000-01  14.200  14.50  14.15  14.40   277400     2.7916
2000-02  13.900  14.39  13.90  14.25   287200     2.7625
2016-01  23.620  23.62  23.62  23.62        0    22.1893
2016-02  19.575  19.63  19.14  19.45  1783000    18.2719
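
Note that to_period gives a PeriodIndex, not a DatetimeIndex. If a real DatetimeIndex is needed, one option is to convert the grouped result back with to_timestamp, which by default stamps each period at its start. The sketch below assumes a small made-up `daily` frame (the values are illustrative only) and groups by the period without overwriting the original index:

```python
import pandas as pd

# Hypothetical sample of daily quotes; the values are invented for illustration.
daily = pd.DataFrame(
    {"Open": [14.20, 14.29, 13.90], "Volume": [277400, 109200, 287200]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-02-01"]),
)
daily.index.name = "Date"

# Group by month period; the result is indexed by a PeriodIndex (2000-01, 2000-02).
monthly = daily.groupby(daily.index.to_period("M")).first()
print(type(monthly.index).__name__)     # PeriodIndex

# Convert the periods to timestamps at the start of each month.
monthly_ts = monthly.to_timestamp()
print(type(monthly_ts.index).__name__)  # DatetimeIndex
print(monthly_ts)

# The same idea for yearly groups, using the "Y" period frequency.
yearly = daily.groupby(daily.index.to_period("Y")).first()
print(yearly)
```

The PeriodIndex prints as %Y-%m (or %Y for yearly periods), which matches the desired display, while the to_timestamp version prints full dates such as 2000-01-01.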