I have a pandas dataframe (df) with the following column structure:
month A B C D
The dataframe has data for, say, Jan, Feb, Mar and Apr. A, B, C and D are numeric columns. For the month of Feb, I want to recalculate column A and update it in the dataframe, i.e. for month == 'Feb', A = B + C + D.
The code I used:
df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D']
This ran without errors but did not change the values in column A for Feb. In the console, it printed this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I tried to use .loc, but I had already called .reset_index() on the dataframe I am working with, and I am not sure how to set the index so that I can use .loc. I followed the documentation, but it is still not clear to me. Could you please help me out here?
This is an example dataframe:
import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
Here I want to update, say, one date: 2000-01-03. I am unable to share a snippet of my actual data because it is real-time data.
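As the warning says, you should use .loc[row_indexer, col_indexer]. A boolean condition such as df['month'] == 'Feb' can be passed directly as the row indexer, so you do not need to set a particular index first; it works fine after .reset_index(). Pass the condition as the row indexer and then, after the comma, the column name: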
df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D']
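If it helps, here is a slightly tidier sketch of the same idea that stores the condition in a variable so it is only computed once. The sample values and the names `mask`, `df2` and `day` are made up for illustration, since the real data was not posted. The second part applies the same pattern to the date-indexed example from the question, assuming the goal there is also A = B + C + D for the single date 2000-01-03.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same column layout as in the question.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "Feb"],
    "A": [0.0, 0.0, 0.0, 0.0, 0.0],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C": [10.0, 20.0, 30.0, 40.0, 50.0],
    "D": [100.0, 200.0, 300.0, 400.0, 500.0],
})

# Build the boolean row selector once and reuse it on both sides.
mask = df["month"] == "Feb"
df.loc[mask, "A"] = df.loc[mask, "B"] + df.loc[mask, "C"] + df.loc[mask, "D"]

# The date-indexed example frame from the question.
dates = pd.date_range("1/1/2000", periods=8)
df2 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])

# With a DatetimeIndex, the row label is the date itself.
day = pd.Timestamp("2000-01-03")
df2.loc[day, "A"] = df2.loc[day, "B"] + df2.loc[day, "C"] + df2.loc[day, "D"]
```

In both cases the assignment goes through a single .loc call on the original dataframe, which is why it modifies df itself rather than a temporary copy.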