pandas get_level_values for multiple columns

Question 1

python python-3.x pandas dataframe multi-index

danielhadar · Aug 22, 2016 · Viewed 8.9k times · Source

Answer

The .tolist() method of a MultiIndex gives a list of tuples for all the levels in the MultiIndex. For example, with your example DataFrame,

df.index.tolist()
# => [(1, 4, 10), (1, 4, 11), (1, 5, 12), (2, 5, 13), (2, 6, 14), (3, 7, 15)]

So here are two ideas:

Get the list of tuples from the original MultiIndex and filter the result.

[(a, c) for a, b, c in df.index.tolist()]
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

The disadvantage of this simple method is that you have to manually specify the positions of the levels you want. You can use itertools.compress to select them by name instead.

from itertools import compress

mask = [1 if name in ['a', 'c'] else 0 for name in df.index.names]
[tuple(compress(t, mask)) for t in df.index.tolist()]
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

Create a MultiIndex that has exactly the levels you want and call .tolist() on it.

df.index.droplevel('b').tolist()
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

If you would prefer to name the levels you want to keep — instead of those that you want to drop — you could do something like

df.index.droplevel([level for level in df.index.names
                if level not in ['a', 'c']]).tolist()
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
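
If you need this in more than one place, the keep-by-name variant can be wrapped in a small function. The following is only a minimal sketch: keep_levels is a hypothetical helper name, not part of pandas, and the DataFrame is the example from the question. Note that the tuples follow the order of the levels in the index, not the order of the names you pass in.

```python
import pandas as pd


def keep_levels(index, keep):
    """Return the values of the levels named in `keep` as a list of tuples.

    Hypothetical helper (not part of pandas). The tuples follow the order
    of the levels in `index`, not the order of the names in `keep`.
    """
    # Every level that was not asked for gets dropped.
    drop = [name for name in index.names if name not in keep]
    return index.droplevel(drop).tolist()


# Example DataFrame from the question.
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 3],
                   'b': [4, 4, 5, 5, 6, 7],
                   'c': [10, 11, 12, 13, 14, 15],
                   'd': [16, 17, 18, 19, 20, 21]}).set_index(['a', 'b', 'c'])

print(keep_levels(df.index, ['a', 'c']))
```

If only one level is kept, droplevel returns a plain Index, so the result is a list of scalars rather than a list of tuples.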

Question 2

Is there a way to get the result of get_level_values for more than one column?

Given the following DataFrame (built by the code at the end of this question):

I wish to get the values (i.e. list of tuples) of levels a and c:

[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

Notes:

It is impossible to give get_level_values more than one level (e.g. df.index.get_level_values(['a','c'])).
There's a workaround in which one could use get_level_values over each desired column and zip them together:

For example:

a_list = df.index.get_level_values('a').values
c_list = df.index.get_level_values('c').values

print(list(zip(a_list, c_list)))
# => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

but it gets cumbersome as the number of levels grows.
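
One way to keep the zip workaround from growing with the number of levels is to collect the level values in a loop and zip them all at once. This is a minimal sketch, assuming the example DataFrame from this question; level_tuples is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def level_tuples(index, names):
    """Zip the values of each named level together, in the order of `names`.

    Hypothetical helper (not part of pandas).
    """
    # One Index of values per requested level.
    columns = [index.get_level_values(name) for name in names]
    # zip(*columns) pairs up the i-th value of every level.
    return list(zip(*columns))


# Example DataFrame from this question.
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 3],
                   'b': [4, 4, 5, 5, 6, 7],
                   'c': [10, 11, 12, 13, 14, 15],
                   'd': [16, 17, 18, 19, 20, 21]}).set_index(['a', 'b', 'c'])

print(level_tuples(df.index, ['a', 'c']))
```

Because the tuples are built in the order of the names passed in, asking for ['c', 'a'] yields the c value first in each tuple.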

The code to build the example DataFrame:

import pandas as pd
df = pd.DataFrame({'a':[1,1,1,2,2,3],'b':[4,4,5,5,6,7],'c':[10,11,12,13,14,15], 'd':[16,17,18,19,20,21]}).set_index(['a','b','c'])
