Selecting rows from a Pandas dataframe with a compound (hierarchical) index

JD Long picture JD Long · Aug 13, 2012 · Viewed 30.7k times · Source

I'm suspicious that this is trivial, but I yet to discover the incantation that will let me select rows from a Pandas dataframe based on the values of a hierarchical key. So, for example, imagine we have the following dataframe:

import pandas
df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'],
                       'group2': ['c','c','d','d','d','e'],
                       'value1': [1.1,2,3,4,5,6],
                       'value2': [7.1,8,9,10,11,12]
})
df = df.set_index(['group1', 'group2'])

df looks as we would expect:

enter image description here

If df were not indexed on group1 I could do the following:

df['group1' == 'a']

But that fails on this dataframe with an index. So maybe I should think of this like a Pandas series with a hierarchical index:

df['a','c']

Nope. That fails as well.

So how do I select out all the rows where:

  1. group1 == 'a'
  2. group1 == 'a' & group2 == 'c'
  3. group2 == 'c'
  4. group1 in ['a','b','c']

Answer

Wes McKinney picture Wes McKinney · Aug 13, 2012

Try using xs to be very precise:

In [5]: df.xs('a', level=0)
Out[5]: 
        value1  value2
group2                
c          1.1     7.1
c          2.0     8.0
d          3.0     9.0

In [6]: df.xs('c', level='group2')
Out[6]: 
        value1  value2
group1                
a          1.1     7.1
a          2.0     8.0