What is a fast and proper way to refresh/update plots in a Bokeh (0.11) server app?

user2700854 · Jan 24, 2016 · Viewed 18.6k times

I have a Bokeh (v0.11) serve app that produces a scatter plot using (x, y) coordinates from a data frame. I want to add interactions so that when a user either selects points on the plot or enters comma-separated point names in the text box (e.g., "p55, p1234"), those points turn red on the scatter plot.

I have found one way to accomplish this (Strategy #3, below) but it is terribly slow for large dataframes. I would think there is a better method. Can anyone help me out? Am I missing some obvious function call?

  • Strategy 1 (<1ms per 100 points) drills into the ColumnDataSource data of the existing plot and attempts to change the color of the selected points in place.
  • Strategy 2 (~70ms per 100 points) overwrites the plot's existing ColumnDataSource with a newly created ColumnDataSource.
  • Strategy 3 (~400ms per 100 points) does the same as Strategy 2 and then re-creates the figure.

The full code is on pastebin: http://pastebin.com/JvQ1UpzY The most relevant portion is copied below.

def refresh_graph(self, selected_points=None, old_idxs=None, new_idxs=None):
    # Strategy 1: Cherry pick current plot's source.
    # Compute time for 100 points: < 1ms.
    if self.strategy == 1:
        t1 = datetime.now()
        for idx in old_idxs:
            self.graph_plot.data_source.data['color'][idx] = 'steelblue'
        for idx in new_idxs:
            self.graph_plot.data_source.data['color'][idx] = 'red'
        print('Strategy #1 completed in {}'.format(datetime.now() - t1))
    else:
        t3 = datetime.now()
        self.coords['color'] = 'steelblue'
        self.coords.loc[selected_points, 'color'] = 'red'
        new_source = bkmodels.ColumnDataSource(self.coords)
        self.graph_plot = self.graph_fig.scatter('x', 'y', source=new_source, color='color', alpha=0.6)
        print('Strategy #3 completed in {}'.format(datetime.now() - t3))
    return

Ideally, I would like to be able to use Strategy #1, but it does not seem to allow the points to refresh within the client browser.

Thanks for any help!

FYI: I am using RHEL 6.X

Answer

bigreddot · May 12, 2016

If you are streaming data, then there is a related answer here: Timeseries streaming in bokeh

If you need to update everything at once, you can do that; my suggestion is your Strategy 1, which is demonstrated, e.g., here:

https://github.com/bokeh/bokeh/blob/master/examples/app/sliders.py

The particular thing to note is that you really have to update all of source.data in one go. Bokeh assumes that all the columns of a column data source always have the same length. Updating individual columns risks breaking this assumption, which can cause problems. So you want to update everything at once, with something like:

# Generate the new curve
x = np.linspace(0, 4*np.pi, N)
y = a*np.sin(k*x + w) + b

source.data = dict(x=x, y=y)
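Applied to the recoloring case in the question, the same idea might look like the sketch below. It is only an illustration, not code from the pastebin: recolor is a hypothetical helper, the column names 'x', 'y' and 'color' are taken from the question's snippet, and a SimpleNamespace stands in for the real ColumnDataSource so the example runs without Bokeh installed. The point is that a complete new dict is built first and then assigned to source.data in a single step, instead of mutating entries of the 'color' column in place.

```python
from types import SimpleNamespace


def recolor(data, selected_idxs, base="steelblue", highlight="red"):
    """Return a NEW data dict whose 'color' column marks the selected rows.

    `recolor` is a hypothetical helper written for this sketch; it is not
    part of Bokeh.
    """
    n = len(data["x"])
    selected = set(selected_idxs)

    # Copy every column so the result is a fresh dict, not a mutated one.
    new_data = {name: list(col) for name, col in data.items()}
    new_data["color"] = [highlight if i in selected else base for i in range(n)]

    # Guard the assumption that all columns have the same length.
    lengths = {len(col) for col in new_data.values()}
    if len(lengths) != 1:
        raise ValueError("columns have different lengths: %s" % sorted(lengths))
    return new_data


# Stand-in for bokeh.models.ColumnDataSource (assumption: anything with a
# `.data` dict attribute is enough to illustrate the assignment pattern).
source = SimpleNamespace(
    data={
        "x": [0.0, 1.0, 2.0, 3.0],
        "y": [1.0, 0.5, 2.5, 1.5],
        "color": ["steelblue"] * 4,
    }
)

# Replace the whole dict in one go, as with `source.data = dict(x=x, y=y)`.
source.data = recolor(source.data, selected_idxs=[1, 3])
```

In the actual app, source would be the ColumnDataSource that was passed to scatter(), and the assignment on the last line is what pushes the new colors to the browser.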