Pandas slicing FutureWarning with 0.21.0

QuinRiva · Dec 19, 2017 · Viewed 37.4k times

I'm trying to select a subset of a dataframe, keeping only some of the columns and filtering the rows.

df.loc[df.a.isin(['Apple', 'Pear', 'Mango']), ['a', 'b', 'f', 'g']]

However, I'm getting this warning:

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

What's the correct way to slice and filter now?

Answer

cs95 · Dec 19, 2017

This is a change introduced in v0.21.0, and it is explained at length in the docs:

Previously, selecting with a list of labels, where one or more labels were missing would always succeed, returning NaN for missing labels. This will now show a FutureWarning. In the future this will raise a KeyError (GH15747). This warning will trigger on a DataFrame or a Series for using .loc[] or [[]] when passing a list-of-labels with at least 1 missing label.

For example,

df

     A    B  C
0  7.0  NaN  8
1  3.0  3.0  5
2  8.0  1.0  7
3  NaN  0.0  3
4  8.0  2.0  7

Try the same kind of slicing you're doing, with a row filter and a list of column labels:

df.loc[df.A.gt(6), ['A', 'C']]

     A  C
0  7.0  8
2  8.0  7
4  8.0  7

No problem. Now, try replacing C with a non-existent column label:

df.loc[df.A.gt(6), ['A', 'D']]
FutureWarning: Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

     A   D
0  7.0 NaN
2  8.0 NaN
4  8.0 NaN
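
If the NaN-filled column is actually what you want, the warning itself points to .reindex(). Here is a minimal sketch of that alternative, assuming the example frame above is rebuilt by hand (the construction below is my reconstruction of it, not something taken from the docs):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame shown above.
df = pd.DataFrame({
    "A": [7.0, 3.0, 8.0, np.nan, 8.0],
    "B": [np.nan, 3.0, 1.0, 0.0, 2.0],
    "C": [8, 5, 7, 3, 7],
})

# Filter the rows first, then reindex the columns. A label that does
# not exist (here "D") comes back as an all-NaN column, and no
# FutureWarning or KeyError is involved.
result = df.loc[df.A.gt(6)].reindex(columns=["A", "D"])
print(result)
```

Splitting the row filter from the column selection keeps the intent explicit: .loc is for labels that are known to exist, while reindex is for labels that may be missing.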

So, in your case, the warning is raised because at least one of the column labels you pass to loc does not exist in the dataframe. Take another look at them.
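
One way to find the offending label is to compare the requested list against df.columns. The sketch below uses a small made-up frame, since the question does not show the real data; the values and the assumption that column g is the missing one are purely illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe: it has no "g" column.
df = pd.DataFrame({
    "a": ["Apple", "Kiwi", "Pear"],
    "b": [1, 2, 3],
    "f": [4, 5, 6],
})
wanted = ["a", "b", "f", "g"]

# Labels that were requested but are not columns of df.
missing = pd.Index(wanted).difference(df.columns)
print(missing.tolist())

# Keep only the labels that exist, in the requested order, then slice.
present = [col for col in wanted if col in df.columns]
subset = df.loc[df.a.isin(["Apple", "Pear", "Mango"]), present]
print(subset)
```

If the missing label turns out to be a typo, fixing it in the list is the real solution; dropping missing labels silently only makes sense when they are truly optional.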