I am trying to label a scatter/bubble chart I create from matplotlib with entries from a column in a pandas data frame. I have seen plenty of examples and questions related (see e.g. here and here). Hence I tried to annotate the plot accordingly. Here is what I do:
import matplotlib.pyplot as plt
import pandas as pd
#example data frame
x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
s = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
users =['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel', 'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']
df = pd.DataFrame(dict(x=x, y=y, users=users)
#my attempt to plot things
plt.scatter(x_axis, y_axis, s=area, alpha=0.5)
plt.xlabel(xlabel)
plt.ylabel(ylabel)
plt.annotate(df.users, xy=(x,y))
plt.show()
I use a pandas datframe and I somehow get a KeyError- so I guess a dict()
object is expected? Is there any other way to label the data using with entries from a pandas data frame?
You can use DataFrame.plot.scatter
and then select in loop by DataFrame.iat
:
ax = df.plot.scatter(x='x', y='y', alpha=0.5)
for i, txt in enumerate(df.users):
ax.annotate(txt, (df.x.iat[i],df.y.iat[i]))
plt.show()