Use of loc to update a dataframe python pandas

Data Enthusiast · Dec 28, 2015 · Viewed 22.4k times

I have a pandas DataFrame (df) with the following columns:

month A B C D

The DataFrame holds data for, say, Jan, Feb, Mar and Apr, and A, B, C, D are numeric columns. For the month of Feb, I want to recalculate column A and update it in the DataFrame, i.e. for month == Feb, A = B + C + D.

Code I used :

 df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D'] 

This ran without errors but did not change the values in column A for Feb. The console showed this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I tried to use .loc, but I had already called .reset_index() on the DataFrame I am working with, and I am not sure how to set the index and use .loc. I followed the documentation, but it is still not clear to me. Could you please help me out? Here is an example DataFrame:

 import pandas as pd
 import numpy as np
 dates = pd.date_range('1/1/2000', periods=8)
 df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])

I want to update a single date, say 2000-01-03. I cannot share a snippet of my actual data because it is real-time data.

Answer

Anton Protopopov · Dec 28, 2015

As the warning says, you should use loc[row_indexer, col_indexer]. Subsetting the data with a boolean condition selects the matching rows by their index, so pass that condition as the row indexer and then, after a comma, the column name:

df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D']
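
The same update can be written more compactly by building the boolean mask once and reusing it. The sketch below is only an illustration: the sample values and the names feb, rows, df2 and day are made up for the example and do not come from the question's data. It also covers the date-indexed frame from the question. Note that a boolean mask works the same way after .reset_index(), so there should be no need to set a particular index first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Feb", "Mar"],
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": [100.0, 200.0, 300.0, 400.0],
    "D": [1000.0, 2000.0, 3000.0, 4000.0],
})

# Build the mask once, then use it as the row indexer of a single .loc call.
feb = df["month"] == "Feb"
rows = df.loc[feb]  # only read from, never assigned to
df.loc[feb, "A"] = rows["B"] + rows["C"] + rows["D"]

# Date-indexed frame as in the question: a row label takes the place of the mask.
dates = pd.date_range("1/1/2000", periods=8)
df2 = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=["A", "B", "C", "D"])
day = pd.Timestamp("2000-01-03")
df2.loc[day, "A"] = df2.loc[day, "B"] + df2.loc[day, "C"] + df2.loc[day, "D"]
```

The original attempt failed because df[mask]['A'] = ... first creates a temporary subset and then assigns into that temporary object, while a single df.loc[mask, 'A'] = ... call assigns directly into df.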