I have a bokeh (v0.11) serve app that produces a scatter plot using (x, y) coordinates from a data frame. I want to add interactions such that when a user either selects points on the plot or enters a comma-separated list of point names in the text box (e.g. "p55, p1234"), those points turn red on the scatter plot.
I have found one way to accomplish this (Strategy #3, below) but it is terribly slow for large dataframes. I would think there is a better method. Can anyone help me out? Am I missing some obvious function call?
Code is deposited on pastebin: http://pastebin.com/JvQ1UpzY Most relevant portion is copied below.
def refresh_graph(self, selected_points=None, old_idxs=None, new_idxs=None):
    # Strategy 1: Cherry pick current plot's source.
    # Compute time for 100 points: < 1ms.
    if self.strategy == 1:
        t1 = datetime.now()
        for idx in old_idxs:
            self.graph_plot.data_source.data['color'][idx] = 'steelblue'
        for idx in new_idxs:
            self.graph_plot.data_source.data['color'][idx] = 'red'
        print('Strategy #1 completed in {}'.format(datetime.now() - t1))
    else:
        t3 = datetime.now()
        self.coords['color'] = 'steelblue'
        self.coords.loc[selected_points, 'color'] = 'red'
        new_source = bkmodels.ColumnDataSource(self.coords)
        self.graph_plot = self.graph_fig.scatter('x', 'y', source=new_source, color='color', alpha=0.6)
        print('Strategy #3 completed in {}'.format(datetime.now() - t3))
    return
Ideally, I would like to be able to use Strategy #1, but it does not seem to allow the points to refresh within the client browser.
Thanks for any help!
FYI: I am using RHEL 6.X
If you are streaming data, then there is a related answer here: Timeseries streaming in bokeh
If you need to update everything at once, then you can do that, and my suggestion is your Strategy 1, which is demonstrated, e.g., here:
https://github.com/bokeh/bokeh/blob/master/examples/app/sliders.py
The particular thing to note is that you really have to update all of source.data in one go. One of the assumptions is that all the columns of a column data source always have the same length. Updating individual columns risks breaking this assumption, which can cause problems. So you want to update everything at once, with something like:
# Generate the new curve
x = np.linspace(0, 4*np.pi, N)
y = a*np.sin(k*x + w) + b
source.data = dict(x=x, y=y)
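Applied to the color-highlighting case in the question, the same idea might look like the sketch below. It is only an illustration: `recolored_data` is a hypothetical helper, the column names (`x`, `y`, `color`) are taken from the question, and the sample data is made up. The helper builds a complete new dict with a fresh `color` column instead of mutating `data['color'][idx]` in place; in-place element edits are likely not noticed by Bokeh, which would explain why Strategy 1 does not refresh in the browser.

```python
def recolored_data(data, selected_idxs, base="steelblue", highlight="red"):
    """Return a new dict of columns with a freshly built 'color' column.

    `data` is a plain dict of equal-length columns, such as the one held
    by a ColumnDataSource. The input dict and its lists are not modified.
    """
    selected = set(selected_idxs)
    n = len(data["x"])
    colors = [highlight if i in selected else base for i in range(n)]
    new_data = dict(data)       # shallow copy: other columns are reused as-is
    new_data["color"] = colors  # only the color column is rebuilt
    return new_data


# Made-up sample data standing in for the question's data frame columns.
data = {
    "x": [0.0, 1.0, 2.0, 3.0],
    "y": [1.0, 0.5, 2.0, 1.5],
    "color": ["steelblue"] * 4,
}
new_data = recolored_data(data, [1, 3])
print(new_data["color"])

# In the server app, the whole dict would then be assigned in one go
# (assumption: `source` is the plot's existing ColumnDataSource):
# source.data = new_data
```

Because the existing source is reused and only one column is rebuilt, this should avoid the cost of creating a new ColumnDataSource and a new scatter glyph on every selection, as Strategy 3 does.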