When is it appropriate to use df.value_counts() vs df.groupby('...').count()?

Ollie Khakwani · Nov 25, 2017 · Viewed 25.1k times

I've heard that in Pandas there are often multiple ways to do the same thing, but I was wondering:

If I'm trying to group data by a value within a specific column and count the number of items with that value, when does it make sense to use df.groupby('colA').count() and when does it make sense to use df['colA'].value_counts()?

Answer

jezrael · Nov 25, 2017

There is a difference. According to the documentation, value_counts returns:

The resulting object will be in descending order so that the first element is the most frequently-occurring element.

groupby(...).count(), by contrast, does not sort by frequency; its output is sorted by the index, i.e. the group keys taken from the column passed to groupby('col').
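The ordering difference can be sketched as follows. The small DataFrame here is a hypothetical example, not the asker's data, and the comparison via sort_index() is just one way to line the two results up.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the ordering.
df = pd.DataFrame({"colA": ["c", "c", "b", "a", "b", "b"]})

by_frequency = df["colA"].value_counts()      # most frequent value first
by_key = df.groupby("colA")["colA"].count()   # sorted by group key

print(list(by_frequency.index))  # ['b', 'c', 'a']
print(list(by_key.index))        # ['a', 'b', 'c']

# Once both results are ordered by index, the counts agree.
same_counts = by_frequency.sort_index().tolist() == by_key.tolist()
print(same_counts)  # True
```

So for a single column the two approaches give the same numbers, and the choice mostly comes down to which ordering is wanted.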


df.groupby('colA').count() 

aggregates every remaining column of df with count, so for each group it returns the number of non-NaN values per column.

So if you need to count only one column, use:

df.groupby('colA')['colA'].count() 

Sample:

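import numpy as np
import pandas as pd

# colA is listed first so the column order matches the printed output below
df = pd.DataFrame({'colA':['c','c','b','a',np.nan,'b','b'],
                   'colB':list('abcdefg'),
                   'colC':[1,3,5,7,np.nan,np.nan,4],
                   'colD':[np.nan,3,6,9,2,4,np.nan]})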

print (df)
  colA colB  colC  colD
0    c    a   1.0   NaN
1    c    b   3.0   3.0
2    b    c   5.0   6.0
3    a    d   7.0   9.0
4  NaN    e   NaN   2.0
5    b    f   NaN   4.0
6    b    g   4.0   NaN

print (df['colA'].value_counts())
b    3
c    2
a    1
Name: colA, dtype: int64

print (df.groupby('colA').count())
      colB  colC  colD
colA                  
a        1     1     1
b        3     2     2
c        2     2     1

print (df.groupby('colA')['colA'].count())
colA
a    1
b    3
c    2
Name: colA, dtype: int64
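In the sample above, the row whose colA is NaN is missing from every result. A minimal sketch of that NaN handling follows, using hypothetical toy data and the dropna parameter of value_counts; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing key in colA, one missing value in colC.
df = pd.DataFrame({"colA": ["c", "c", "b", np.nan, "b"],
                   "colC": [1.0, np.nan, 5.0, 7.0, 4.0]})

# By default both approaches drop the row whose key is NaN.
print(df["colA"].value_counts().sum())           # 4
print(df.groupby("colA")["colA"].count().sum())  # 4

# dropna=False keeps the NaN key as its own entry in value_counts.
with_nan = df["colA"].value_counts(dropna=False)
print(with_nan.sum())                            # 5

# count() also skips NaN inside each column: group 'c' has two rows,
# but only one non-NaN value in colC.
per_column = df.groupby("colA").count()
print(per_column.loc["c", "colC"])               # 1
```

If the NaN keys matter for the analysis, value_counts(dropna=False) is the shorter way to keep them visible.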