I import pandas as pd and run the code below and get the following result
Code:
traindataset = pd.read_csv('/Users/train.csv')
print traindataset.dtypes
print traindataset.shape
print traindataset.iloc[25,3]
traindataset.dropna(how='any')
print traindataset.iloc[25,3]
print traindataset.shape
Output
TripType int64
VisitNumber int64
Weekday object
Upc float64
ScanCount int64
DepartmentDescription object
FinelineNumber float64
dtype: object
(647054, 7)
nan
nan
(647054, 7)
[Finished in 2.2s]
From the result, the dropna line doesn't work because the row number doesn't change and there is still NAN in the dataframe. How that comes? I am craaaazy right now.
You need to read the documentation (emphasis added):
Return object with labels on given axis omitted
dropna
returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:
inplace : boolean, default False
If True, do operation inplace and return None.
So to modify it in place, do traindataset.dropna(how='any', inplace=True)
.