I am using Pandas 0.19.1
on Python 3. I am getting a warning on these lines of code. I'm trying to get a list that contains all the row numbers where string Peter
is present at column Unnamed: 5
.
df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()
It produces a Warning:
"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise
comparison failed; returning scalar, but in the future will perform
elementwise comparison
result = getattr(x, name)(y)"
What is this FutureWarning and should I ignore it since it seems to work.
This FutureWarning isn't from Pandas, it is from numpy and the bug also affects matplotlib and others, here's how to reproduce the warning nearer to the source of the trouble:
import numpy as np
print(np.__version__) # Numpy version '1.12.0'
'x' in np.arange(5) #Future warning thrown here
FutureWarning: elementwise comparison failed; returning scalar instead, but in the
future will perform elementwise comparison
False
Another way to reproduce this bug using the double equals operator:
import numpy as np
np.arange(5) == np.arange(5).astype(str) #FutureWarning thrown here
An example of Matplotlib affected by this FutureWarning under their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html
There is a disagreement between Numpy and native python on what should happen when you compare a strings to numpy's numeric types. Notice the left operand is python's turf, a primitive string, and the middle operation is python's turf, but the right operand is numpy's turf. Should you return a Python style Scalar or a Numpy style ndarray of Boolean? Numpy says ndarray of bool, Pythonic developers disagree. Classic standoff.
Should it be elementwise comparison or Scalar if item exists in the array?
If your code or library is using the in
or ==
operators to compare python string to numpy ndarrays, they aren't compatible, so when if you try it, it returns a scalar, but only for now. The Warning indicates that in the future this behavior might change so your code pukes all over the carpet if python/numpy decide to do adopt Numpy style.
Numpy and Python are in a standoff, for now the operation returns a scalar, but in the future it may change.
https://github.com/numpy/numpy/issues/6784
https://github.com/pandas-dev/pandas/issues/7830
Either lockdown your version of python and numpy, ignore the warnings and expect the behavior to not change, or convert both left and right operands of ==
and in
to be from a numpy type or primitive python numeric type.
Suppress the warning globally:
import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5)) #returns False, without Warning
Suppress the warning on a line by line basis.
import warnings
import numpy as np
with warnings.catch_warnings():
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(2)) #returns False, warning is suppressed
print('x' in np.arange(10)) #returns False, Throws FutureWarning
Just suppress the warning by name, then put a loud comment next to it mentioning the current version of python and numpy, saying this code is brittle and requires these versions and put a link to here. Kick the can down the road.
TLDR: pandas
are Jedi; numpy
are the hutts; and python
is the galactic empire. https://youtu.be/OZczsiCfQQk?t=3