How to improve the label placement for a matplotlib scatter chart (code, algorithm, tips)?

bigbug · Feb 18, 2013 · Viewed 12.5k times

I use matplotlib to plot a scatter chart:

[image: scatter chart]

I label each bubble with a transparent box, following the tip in "matplotlib: how to annotate point on a scatter automatically placed arrow?"

Here is the code:

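# x, y, annotes, ax, show_annote and ANNOTE_FONT are defined elsewhere in my script
if show_annote:
    for i in range(len(x)):
        annote_text = annotes[i][0][0]  # STK_ID
        # Fixed offset of (-10, 3) points from the bubble centre
        ax.annotate(annote_text, xy=(x[i], y[i]), xytext=(-10, 3),
            textcoords='offset points', ha='center', va='bottom',
            bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.2),
            fontproperties=ANNOTE_FONT)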

The resulting plot: [image: scatter chart with a label box on each bubble]

There is still room to reduce overlap; for instance, the label box offset is fixed at (-10, 3). Are there algorithms that can:

  1. dynamically change the offset of each label box according to how crowded its neighbourhood is
  2. dynamically place the label box farther away and draw an arrow line between the bubble and the label box
  3. change the label orientation where that helps
  4. prefer a label box overlapping a bubble over a label box overlapping another label box
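For item 1, one simple heuristic can be sketched as follows: push each label away from the centroid of the point's nearest neighbours. This is only an illustration, not an established algorithm. The helper name `crowd_aware_offsets` and its parameters `k` and `distance` are hypothetical, and the sketch assumes x and y are on comparable scales (otherwise the points could first be converted to display coordinates, for example with `ax.transData.transform`).

```python
import math

def crowd_aware_offsets(xs, ys, k=3, distance=12.0):
    """Hypothetical helper: one (dx, dy) offset per point, pointing away
    from the centroid of that point's k nearest neighbours."""
    points = list(zip(xs, ys))
    offsets = []
    for i, (px, py) in enumerate(points):
        # Sort the other points by distance to point i and keep the k closest.
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: math.hypot(p[0] - px, p[1] - py))
        nearest = others[:k]
        if not nearest:
            offsets.append((0.0, distance))  # lone point: place the label above
            continue
        cx = sum(p[0] for p in nearest) / len(nearest)
        cy = sum(p[1] for p in nearest) / len(nearest)
        dx, dy = px - cx, py - cy
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            offsets.append((0.0, distance))  # no preferred direction: go up
        else:
            offsets.append((distance * dx / norm, distance * dy / norm))
    return offsets
```

Each returned pair could replace the fixed `xytext=(-10, 3)` in the `ax.annotate` call, keeping `textcoords='offset points'`. The neighbour search is quadratic in the number of points, which should be acceptable for about 150 bubbles.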

I just want the chart to be easy for human eyes to comprehend, so some overlap is OK; the constraint need not be as rigid as http://en.wikipedia.org/wiki/Automatic_label_placement suggests. The chart contains fewer than 150 bubbles most of the time.

I find the so-called force-based label placement at http://bl.ocks.org/MoritzStefaner/1377729 quite interesting. I don't know whether any Python code or package implements the algorithm.
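To make the idea concrete, here is a minimal pure-Python sketch of a force-based scheme. It is not a port of the linked demo. Each label is repelled by nearby labels and points and is pulled back to its own point by a spring. The function name `relax_labels` and all parameter values are assumptions chosen for illustration, and coordinates are assumed to be normalised to roughly the unit square.

```python
import math

def relax_labels(anchors, steps=200, min_dist=0.08, repel=0.01, spring=0.05):
    """Hypothetical helper: spread label positions apart while tethering
    each label to its anchor. Coordinates are assumed to be normalised."""
    # Start each label a short distance from its anchor, at a different angle
    # per index, so labels of coincident anchors do not start on top of each other.
    labels = [[px + 0.01 * math.cos(2.4 * i), py + 0.01 * math.sin(2.4 * i)]
              for i, (px, py) in enumerate(anchors)]
    for _ in range(steps):
        forces = []
        for i, (lx, ly) in enumerate(labels):
            fx = fy = 0.0
            # Obstacles: every other label plus every anchor (including its own).
            obstacles = [l for j, l in enumerate(labels) if j != i] + list(anchors)
            for ox, oy in obstacles:
                dx, dy = lx - ox, ly - oy
                d = math.hypot(dx, dy)
                if 0.0 < d < min_dist:
                    # Repulsion grows linearly as the obstacle gets closer.
                    push = repel * (min_dist - d) / min_dist
                    fx += push * dx / d
                    fy += push * dy / d
            # Spring: pull back towards the label's own anchor.
            fx += spring * (anchors[i][0] - lx)
            fy += spring * (anchors[i][1] - ly)
            forces.append((fx, fy))
        # Apply all forces at once so the update order does not matter.
        for label, (fx, fy) in zip(labels, forces):
            label[0] += fx
            label[1] += fy
    return [(lx, ly) for lx, ly in labels]
```

The returned positions could be used as label coordinates, with a line drawn back to each point. The force constants are untuned and may oscillate in very dense clusters, and the double loop is quadratic per step, so a vectorised version would be preferable when many charts have to be labelled.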

I am not an academic and am not looking for an optimal solution. My Python code needs to label a great many charts, so speed and memory use matter.

I am looking for a quick and effective solution. Any help (code, algorithm, tips, thoughts) on this subject is welcome. Thanks.

Answer

Phlya · Jan 7, 2017

Another option is my library adjustText, written specifically for this purpose (https://github.com/Phlya/adjustText).

import numpy as np
import matplotlib.pyplot as plt
from adjustText import adjust_text

np.random.seed(2016)

# Baseline: labels drawn exactly at the data points, without any adjustment
N = 50
scatter_data = np.random.rand(N, 3)
fig, ax = plt.subplots()
ax.scatter(scatter_data[:, 0], scatter_data[:, 1],
           c=scatter_data[:, 2], s=scatter_data[:, 2] * 150)
labels = ['ano_{}'.format(i) for i in range(N)]
texts = []
for x, y, text in zip(scatter_data[:, 0], scatter_data[:, 1], labels):
    texts.append(ax.text(x, y, text))
plt.show()

[image: scatter plot with the labels at their original positions]

# Same data and labels as in the previous snippet (reuses its imports)
np.random.seed(2016)

N = 50
scatter_data = np.random.rand(N, 3)
fig, ax = plt.subplots()
ax.scatter(scatter_data[:, 0], scatter_data[:, 1],
           c=scatter_data[:, 2], s=scatter_data[:, 2] * 150)
labels = ['ano_{}'.format(i) for i in range(N)]
texts = []
for x, y, text in zip(scatter_data[:, 0], scatter_data[:, 1], labels):
    texts.append(ax.text(x, y, text))
# Move the labels apart and draw an arrow from each label to its point
adjust_text(texts, force_text=0.05, arrowprops=dict(arrowstyle="-|>",
                                                    color='r', alpha=0.5))
plt.show()

[image: the same scatter plot after adjust_text, with red arrows from labels to points]

Note that the labels are repelled only from the bubble centers and from other labels, not from the full area of each bubble.
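One possible workaround, independent of adjustText, is to compute an offset that places a label just outside its bubble. The sketch below relies on matplotlib's convention that the scatter size `s` is given in points squared, so a circular marker is about `sqrt(s)` points wide. The helper name `label_offset_outside_bubble` and its `angle_deg` and `gap` parameters are hypothetical.

```python
import math

def label_offset_outside_bubble(s, angle_deg=90.0, gap=2.0):
    """Hypothetical helper: offset in points that puts a label just outside
    a circular scatter marker whose size `s` is given in points squared."""
    radius = math.sqrt(s) / 2.0  # marker diameter is sqrt(s) points
    reach = radius + gap         # leave a small gap beyond the marker edge
    angle = math.radians(angle_deg)
    return (reach * math.cos(angle), reach * math.sin(angle))
```

The result could be passed as `xytext` to `ax.annotate` together with `textcoords='offset points'`, so that a larger bubble pushes its label farther from the center. How such pre-placed labels interact with a later `adjust_text` call has not been verified here.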