How to use pandas to find runs of consecutive identical values in a time series

figo · Nov 13, 2014 · Viewed 12.7k times

Here is some time series data; call it df:

      'No'       'Date'       'Value'
0     600000     1999-11-10    1
1     600000     1999-11-11    1
2     600000     1999-11-12    1
3     600000     1999-11-15    1
4     600000     1999-11-16    1
5     600000     1999-11-17    1
6     600000     1999-11-18    0
7     600000     1999-11-19    1
8     600000     1999-11-22    1
9     600000     1999-11-23    1
10    600000     1999-11-24    1
11    600000     1999-11-25    0
12    600001     1999-11-26    1
13    600001     1999-11-29    1
14    600001     1999-11-30    0

I want to get the date range of each run of consecutive rows where 'Value' is 1. How can I get a result like the following?

   'No'     'BeginDate'    'EndDate'   'Consecutive'
0 600000    1999-11-10    1999-11-17    6
1 600000    1999-11-19    1999-11-24    4
2 600001    1999-11-26    1999-11-29    2

Answer

user1827356 · Nov 13, 2014

This should do it:

df['value_grp'] = (df.Value.diff(1) != 0).astype('int').cumsum()

value_grp increments by one whenever Value changes, so every run of identical values gets its own group id. You can then extract the per-group results:

runs = df[df.Value == 1].groupby('value_grp')  # keep only the runs where Value is 1
pd.DataFrame({'No' : runs.No.first(),
              'BeginDate' : runs.Date.first(),
              'EndDate' : runs.Date.last(),
              'Consecutive' : runs.size()}).reset_index(drop=True)
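
For reference, here is a self-contained sketch of the same cumsum-on-change idea, run on the sample data from the question. It makes two choices that go beyond the one-liner for value_grp: it also starts a new run whenever No changes (so a run cannot continue from one No into the next), and it keeps only the rows where Value is 1 so the output lines up with the table the question asks for. The names new_run and result are made up for this example, and the named-aggregation syntax assumes a reasonably recent pandas (0.25 or later).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'No': [600000] * 12 + [600001] * 3,
    'Date': pd.to_datetime([
        '1999-11-10', '1999-11-11', '1999-11-12', '1999-11-15', '1999-11-16',
        '1999-11-17', '1999-11-18', '1999-11-19', '1999-11-22', '1999-11-23',
        '1999-11-24', '1999-11-25', '1999-11-26', '1999-11-29', '1999-11-30']),
    'Value': [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0],
})

# A new run starts on any row whose Value or No differs from the previous row.
# (The No comparison is an assumption added here, not part of the answer above.)
new_run = (df['Value'] != df['Value'].shift()) | (df['No'] != df['No'].shift())
df['value_grp'] = new_run.cumsum()

# Keep only the rows where Value is 1, then summarise each run.
result = (df[df['Value'] == 1]
          .groupby('value_grp')
          .agg(No=('No', 'first'),
               BeginDate=('Date', 'first'),
               EndDate=('Date', 'last'),
               Consecutive=('Date', 'size'))
          .reset_index(drop=True))
print(result)
```

On this sample the No check changes nothing, because the last row for 600000 has Value 0; it only matters when one No ends and the next begins with the same Value.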