I have a data frame data_df
with multiple columns, one of which is c
which holds country names. How do I filter out the rows where c == None
.
My first attempt was to do this:
countries_df = data_df[data_df.c != None]
However, that yielded 0 rows. This, however, worked:
countries_df = data_df[~data_df.c.isin([None])]
Can someone explain why? It seems that from the Pandas doc, the first should be able to filter correctly.
Some sample rows:
_heartbeat_ a al c cy g
0 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; H... en-US US Anaheim 15r91
1 NaN Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... en-us None NaN ifIpBW
2 NaN Mozilla/5.0 (Windows NT 6.1; rv:21.0) Gecko/20... en-US,en;q=0.5 US Fort Huachuca 10DaxOu
3 NaN Mozilla/5.0 (Linux; U; Android 4.1.2; en-us; S... en-US US Houston TysVFU
4 NaN Opera/9.80 (Android; Opera Mini/7.5.33286/29.3... en None NaN 10IGW7m
5 NaN Mozilla/5.0 (compatible; MSIE 10.0; Windows NT... en-US US Mishawaka 13GrCeP
6 NaN Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) G... en-US,en;q=0.5 US Hammond YmtpnZ
7 NaN Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 li... en-us None NaN 13oM0hV
8 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us AU Sydney 15r91
9 NaN Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... en-US,en;q=0.8 None NaN 109LtDc
10 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us US Middletown 109ar5F
11 NaN Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like ... en-us US Germantown 107xZnW
It appears that pandas and Numpy treat None
specially when comparing for equality. In pandas, None
is supposed to be like NaN, representing a missing value. To find rows where the value is not None (or nan
), you could do data_df[data_df.c.notnull()]
(or data_df[~data_df.c.isnull()]
).