I have a problem filtering a pandas
dataframe.
city
NYC
NYC
NYC
NYC
SYD
SYD
SEL
SEL
...
df.city.value_counts()
I would like to remove rows of cities that has less than 4 count frequency, which would be SYD and SEL for instance.
What would be the way to do so without manually dropping them city by city?
Here you go with filter
df.groupby('city').filter(lambda x : len(x)>3)
Out[1743]:
city
0 NYC
1 NYC
2 NYC
3 NYC
Solution two transform
sub_df = df[df.groupby('city').city.transform('count')>3].copy()
# add copy for future warning when you need to modify the sub df