Python: Removing Rows on Count condition

python pandas dataframe indexing counter

Devin Lee · Apr 9, 2018 · Viewed 13k times · Source

I have a problem filtering a pandas dataframe.

city 
NYC 
NYC 
NYC 
NYC 
SYD 
SYD 
SEL 
SEL
...

df.city.value_counts()

I would like to remove rows of cities that has less than 4 count frequency, which would be SYD and SEL for instance.

What would be the way to do so without manually dropping them city by city?

Answer

Here you go with filter

df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]: 
  city
0  NYC
1  NYC
2  NYC
3  NYC

Solution two transform

sub_df = df[df.groupby('city').city.transform('count')>3].copy() 
# add copy for future warning when you need to modify the sub df