My dataframe round_data looks like this:
error username task_path
0 0.02 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 39.png
1 0.10 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 45.png
2 0.15 n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w... 44.png
3 0.25 xdoaztndsxoxk3wycpxxkhaiew3lrsou3eafx3em58uqth... 43.png
... ... ... ...
1170 -0.11 9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux... 33.png
1171 0.15 9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux... 34.png
[1198 rows x 3 columns]
I want to have a boxplot showing the error of each user sorted by their average performance. What I have is:
ax = sns.boxplot(x="username", y="error", data=round_data,
whis=np.inf, color="c",ax=ax)
How can I sort the x-axis (i.e., users) by mean error?
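OK, I figured out the answer. Group the rows by user, take each user's mean error, and sort the users by that mean (the snippet below first filters the rows to batch i and averages the absolute_error column):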
grouped = round_data[round_data.batch == i].groupby("username")
users_sorted_average = pd.DataFrame(
    {col: vals["absolute_error"] for col, vals in grouped}
).mean().sort_values(ascending=True)
Passing the index of users_sorted_average (the usernames, in sorted order) as the "order" parameter of the seaborn plot function gives the desired behavior:
ax = sns.boxplot(x="username", y="error", data=round_data,
                 whis=np.inf, ax=ax, color="c",
                 order=users_sorted_average.index)
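For reference, the same ordering can probably be computed more directly with a single groupby on the column being plotted. This is only a minimal sketch under some assumptions: the toy values below are made up and stand in for round_data, the sort uses the "error" column from the question rather than absolute_error, and plot_sorted is a hypothetical helper name, not part of seaborn.

```python
import pandas as pd

# Toy data standing in for round_data (values are invented for illustration).
round_data = pd.DataFrame({
    "username": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "error": [0.30, 0.50, 0.02, 0.10, 0.15, 0.25],
})

# Mean error per user, sorted ascending; the resulting index holds the
# usernames in the order they should appear on the x-axis.
user_order = (
    round_data.groupby("username")["error"]
    .mean()
    .sort_values()
    .index
)
print(list(user_order))


def plot_sorted(data, order):
    """Hypothetical helper: boxplot with users ordered as given (needs seaborn)."""
    import seaborn as sns  # imported here so the ordering step runs without seaborn
    return sns.boxplot(x="username", y="error", data=data, order=order, color="c")
```

Calling plot_sorted(round_data, user_order) should then draw the boxes from the lowest to the highest mean error. To sort from highest to lowest instead, pass ascending=False to sort_values.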