ValueError: The truth value of a DataFrame is ambiguous

Ken Wallace · Jan 27, 2018 · Viewed 10k times

I have a dataframe that looks like this:

    total  downloaded  avg_rating
id
1       2           2         5.0
2      12          12         4.5
3       1           1         5.0
4       1           1         4.0
5       0           0         0.0

I'm trying to add a new column with the percent difference of two of these columns, but only for rows that do not have a 0 in 'downloaded'.

I'm trying to use a function for this that looks like:

def diff(ratings):
    if ratings[ratings.downloaded > 0]:
        val = (ratings['total'] - ratings['downloaded']) / ratings['downloaded']
    else:
        val = 0
    return val

ratings['Pct Diff'] = diff(ratings)

I'm getting an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-729c09bf14e8> in <module>()
      6     return val
      7 
----> 8 ratings['Pct Diff'] = diff(ratings)

<ipython-input-129-729c09bf14e8> in diff(ratings)
      1 def diff(ratings):
----> 2     if ratings[ratings.downloaded > 0]:
      3         val = (ratings['total'] - ratings['downloaded']) / ratings['downloaded']
      4     else:
      5         val = 0

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can someone please help me understand what this error means?

Also, would this be a good application for an apply function? Can I use conditions in an apply? How would I use it in this case?

Answer

jpp · Jan 28, 2018

The error arises because you are attempting a row-wise (vectorised) calculation, but inside your function diff() the expression ratings[ratings.downloaded > 0] returns a subset of the dataframe, and putting if in front of a dataframe is ambiguous: pandas cannot tell whether you mean "any row matches", "all rows match", or "the result is not empty". The error message reflects this.
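As a minimal sketch of that ambiguity, the snippet below rebuilds a frame shaped like the one in the question (only the 'total' and 'downloaded' columns, with values copied from the question) and shows that the bare if raises, while an explicit reduction does not. The variable names mask, subset, has_rows and so on are illustrative, not part of the original code.

```python
import pandas as pd

# Frame mirroring the question's data (avg_rating omitted for brevity)
ratings = pd.DataFrame(
    {"total": [2, 12, 1, 1, 0], "downloaded": [2, 12, 1, 1, 0]},
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)

mask = ratings["downloaded"] > 0   # boolean Series, one value per row
subset = ratings[mask]             # DataFrame holding only the matching rows

try:
    if subset:                     # pandas refuses to reduce a frame to one bool
        pass
    raised = False
except ValueError as err:
    raised = True
    message = str(err)

# Unambiguous alternatives: say explicitly which single bool you want
has_rows = not subset.empty        # did the filter keep any rows at all?
any_downloaded = bool(mask.any())  # is at least one row > 0?
all_downloaded = bool(mask.all())  # is every row > 0?
```

None of these reductions gives a per-row result, though, which is why the fix for the question is a masked assignment rather than a repaired if.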

You may wish to review the pandas documentation on Indexing and Selecting Data. The solution below first fills the new column with the default value 0, then overwrites only the rows where 'downloaded' is greater than 0.

import pandas as pd

df = pd.DataFrame([[2, 2, 5.0], [12, 12, 4.5], [10, 5, 3.0],
                   [20, 2, 3.5], [3, 0, 0.0], [0, 0, 0.0]],
                  columns=['total', 'downloaded', 'avg_rating'])

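# Default for every row; a float so later float assignments keep the dtype
df['Pct Diff'] = 0.0

# Overwrite only rows with downloaded > 0, using the formula from the question
df.loc[df['downloaded'] > 0, 'Pct Diff'] = (df['total'] - df['downloaded']) / df['downloaded']

#    total  downloaded  avg_rating  Pct Diff
# 0      2           2         5.0       0.0
# 1     12          12         4.5       0.0
# 2     10           5         3.0       1.0
# 3     20           2         3.5       9.0
# 4      3           0         0.0       0.0
# 5      0           0         0.0       0.0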
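
On the question of apply: conditions do work inside apply, because with axis=1 the function receives one row at a time and the comparison yields a single bool. The sketch below uses the question's formula (dividing by 'downloaded'); the function name pct_diff and the variables by_apply, safe_downloaded and by_mask are made up for illustration. It also shows a vectorised alternative that masks the denominator instead of branching.

```python
import pandas as pd

df = pd.DataFrame([[2, 2, 5.0], [12, 12, 4.5], [10, 5, 3.0],
                   [20, 2, 3.5], [3, 0, 0.0], [0, 0, 0.0]],
                  columns=['total', 'downloaded', 'avg_rating'])

def pct_diff(row):
    # `row` is a single row (a Series), so this comparison is one bool
    if row['downloaded'] > 0:
        return (row['total'] - row['downloaded']) / row['downloaded']
    return 0.0

# Row-wise: calls pct_diff once per row
by_apply = df.apply(pct_diff, axis=1)

# Vectorised: turn zero denominators into NaN, divide, then fill with 0
safe_downloaded = df['downloaded'].where(df['downloaded'] > 0)
by_mask = ((df['total'] - df['downloaded']) / safe_downloaded).fillna(0.0)
```

Both give the same values here. apply runs a Python function once per row, so on large frames the masked assignment or the vectorised form is usually the faster choice.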