I have a csv file similar to this
Date,Temp1,Temp2
23-Oct-09 01:00:00,21.1,22.3
23-Oct-09 04:00:00,22.3,23.8
23-Oct-09 07:00:00,21.4,21.3
23-Oct-09 10:00:00,21.5,21.6
23-Oct-09 13:00:00,22.3,23.8
23-Oct-09 16:00:00,21.4,21.3
23-Oct-09 19:00:00,21.1,22.3
23-Oct-09 22:00:00,21.4,21.3
24-Oct-09 01:00:00,22.3,23.8
24-Oct-09 04:00:00,22.3,23.8
24-Oct-09 07:00:00,21.1,22.3
24-Oct-09 10:00:00,22.3,23.8
24-Oct-09 13:00:00,21.1,22.3
24-Oct-09 16:00:00,22.3,23.8
24-Oct-09 19:00:00,21.1,22.3
24-Oct-09 22:00:00,22.3,23.8
I have read the data with:
df=pd.read_csv('data.csv', index_col=0)
and converted the index to date time
df.index=pd.to_datetime(df.index)
Now I want to take the mean of each daily temperature, I have been trying to use pd.resample as below, but have been receiving errors. I've read the pandas.resample docs and numerous examples on here and am still at a loss...
df_avg = df.resample('D', how = 'mean')
DataError: No numeric types to aggregate
I would like df_avg to be a dataframe with a datetime index and the two 2 columns. I am using pandas 0.17.1 and python 3.5.2, any help greatly appreciated!
You need convert string
columns to float
first:
#add parameter parse_dates for convert to datetime first column
df=pd.read_csv('data.csv', index_col=0, parse_dates=[0])
df['Temp1'] = df.Temp1.astype(float)
df['Temp2'] = df.Temp2.astype(float)
df_avg = df.resample('D').mean()
If astype
return error
, problem is there are some non numeric values. So you need use to_numeric
with errors='coerce'
- then all 'problematic' values are converted to NaN
:
df['Temp1'] = pd.to_numeric(df.Temp1, errors='coerce')
df['Temp2'] = pd.to_numeric(df.Temp2, errors='coerce')
You can also check all rows with problematic values with boolean indexing
:
print df[pd.to_numeric(df.Temp1, errors='coerce').isnull()]
print df[pd.to_numeric(df.Temp2, errors='coerce').isnull()]