I have a dataframe df_tr like this:
   item_id  target  target_sum  target_count
0        0       0           1            50
1        0       0           1            50
I'm trying to find the mean of the target but excluding the target value of the current row, and put the mean value in a new column. The result would be:
   item_id  target  target_sum  target_count  item_id_mean_target
0        0       0           1            50              0.02041
1        0       0           1            50              0.02041
where I computed the item_id_mean_target value from the formula:
(target_sum - target) / (target_count - 1)
...with this code:
df_tr['item_id_mean_target'] = df_tr.target.apply(lambda x: (x['target_sum']-x)/(x['target_count']-1))
I think my solution is correct but instead I got:
TypeError: 'float' object is not subscriptable
There is no need for apply here: pandas arithmetic on columns is vectorised (via NumPy), so the operation is applied element-wise across all rows at once.
df['item_id_mean_target'] = (df.target_sum - df.target) / (df.target_count - 1)
df
   item_id  target  target_sum  target_count  item_id_mean_target
0        0       0           1            50             0.020408
1        0       0           1            50             0.020408
As for why your error occurs: you are calling apply on a pd.Series object (df_tr.target), so the lambda receives one scalar value at a time and cannot reference any other columns inside the apply.
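To see this for yourself, here is a small sketch that records what Series.apply actually passes to the function. The frame is a hypothetical stand-in built from the columns in the question (with float targets, which is an assumption based on the error message), and seen_types and error_message are names I made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's df_tr (float targets are an assumption)
df_tr = pd.DataFrame({
    "item_id": [0, 0],
    "target": [0.0, 0.0],
    "target_sum": [1.0, 1.0],
    "target_count": [50, 50],
})

# Series.apply calls the function once per element, so x is a scalar, not a row
seen_types = df_tr["target"].apply(lambda x: type(x).__name__).tolist()
print(seen_types)

# Indexing that scalar with a column name is what raises the error
try:
    df_tr["target"].apply(
        lambda x: (x["target_sum"] - x) / (x["target_count"] - 1)
    )
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The lambda never sees target_sum or target_count, only the individual values of the target column.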
To fix it, you'd need to call df.apply(..., axis=1) so that the function receives whole rows, but at that point you're penalised with low performance, so I wouldn't recommend doing it.
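For completeness, a minimal sketch of what the row-wise version might look like, compared against the vectorised expression. The sample frame is a hypothetical one built from the values in the question, and row_wise, vectorised and same_result are illustrative names only.

```python
import pandas as pd

# Hypothetical sample data matching the question's columns
df = pd.DataFrame({
    "item_id": [0, 0],
    "target": [0, 0],
    "target_sum": [1, 1],
    "target_count": [50, 50],
})

# Row-wise: axis=1 passes each row to the function as a Series,
# so column labels can be looked up inside the lambda
row_wise = df.apply(
    lambda row: (row["target_sum"] - row["target"]) / (row["target_count"] - 1),
    axis=1,
)

# Vectorised: the same arithmetic on whole columns at once
vectorised = (df.target_sum - df.target) / (df.target_count - 1)

# Both approaches should agree up to floating-point tolerance
same_result = bool(((row_wise - vectorised).abs() < 1e-12).all())
print(same_result)
```

Both give the same numbers; the row-wise form just calls a Python function once per row, which is why it gets slow on large frames.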