D3: How to show a large dataset

SolessChong · Aug 15, 2013

I have a large dataset of 10^5 data points, and I'm considering the following question about visualizing it:

Is there an efficient way to visualize a very large dataset? In my case I have a set of users, and each user has 10^3 items. There are 10^5 items in total. I want to show all of each user's items at once so that users can be compared quickly. Somebody suggested using a list, but I don't think a list is the only option for a dataset this large.

Note

I want to show all the items for each user at a time.

This means that when I click on a user, I want to show all of that user's data points, and when I click on two users, I want to compare the differences between their data points.

Answer

Biovisualize · Aug 15, 2013

Rendering them is not the problem. You could switch to canvas or WebGL for the rendering part; you can find some examples of using canvas and X3DOM with D3 data-binding. But binding this many elements will be slow because of the number of DOM objects it creates, so it's better to keep the data and the rendering separate, as in this parallel coordinates example. That example also uses progressive rendering to load and draw all the data elements incrementally.
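A rough sketch of the progressive-rendering idea: instead of drawing every point in one blocking pass, draw a fixed-size batch per frame so the page stays responsive. This is not the code from the parallel coordinates example; `renderProgressively`, `drawPoint` and the batch size are hypothetical names, and the scheduler is passed in as a parameter (an assumption made so the logic does not depend on the browser).

```typescript
// Minimal sketch of progressive rendering: draw `batchSize` points per
// scheduled step until every point has been drawn.
type Point = { x: number; y: number };

// Anything that runs a callback later; in a browser this would wrap
// requestAnimationFrame.
type Scheduler = (cb: () => void) => void;

function renderProgressively(
  points: Point[],
  drawPoint: (p: Point) => void, // hypothetical per-point draw callback
  batchSize: number,
  schedule: Scheduler,
  onDone?: () => void,
): void {
  let i = 0;
  function step(): void {
    // Draw the next batch, never running past the end of the array.
    const end = Math.min(i + batchSize, points.length);
    for (; i < end; i++) drawPoint(points[i]);
    // Yield back to the browser, then continue with the next batch.
    if (i < points.length) schedule(step);
    else if (onDone) onDone();
  }
  schedule(step);
}
```

In a browser you might call it as `renderProgressively(points, p => ctx.fillRect(p.x, p.y, 1, 1), 1000, cb => requestAnimationFrame(() => cb()))`, where `ctx` is a canvas 2D context. Drawing to a canvas creates no DOM node per point, which is the separation between data and rendering described above.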

Keeping them in memory and manipulating them client-side is not a problem either. D3 is often used with Crossfilter for fast manipulation of datasets with "a million or more records".
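To illustrate why client-side manipulation is cheap at this size, here is a plain sketch (no Crossfilter; its API is not used here) of the comparison the question asks for. It indexes items by user once, then splits two selected users' items into shared and exclusive sets. The row shape and the names `indexByUser` and `compareUsers` are hypothetical.

```typescript
// Hypothetical row shape: one row per (user, item) pair.
interface Row { user: string; item: number; }

// Group item ids by user once, so looking up a clicked user is a single Map access.
function indexByUser(rows: Row[]): Map<string, Set<number>> {
  const byUser = new Map<string, Set<number>>();
  for (const r of rows) {
    let items = byUser.get(r.user);
    if (!items) {
      items = new Set<number>();
      byUser.set(r.user, items);
    }
    items.add(r.item);
  }
  return byUser;
}

// Split two users' items into shared, only-in-a and only-in-b lists.
function compareUsers(byUser: Map<string, Set<number>>, a: string, b: string) {
  const itemsA = byUser.get(a) ?? new Set<number>();
  const itemsB = byUser.get(b) ?? new Set<number>();
  const shared: number[] = [];
  const onlyA: number[] = [];
  const onlyB: number[] = [];
  itemsA.forEach((it) => (itemsB.has(it) ? shared : onlyA).push(it));
  itemsB.forEach((it) => { if (!itemsA.has(it)) onlyB.push(it); });
  return { shared, onlyA, onlyB };
}
```

Building the index is a single pass over the 10^5 rows, and each comparison touches only the two users' items (about 2 × 10^3 set lookups here), so it can run on every click.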

10^5 data points is slightly too many for interactive SVG rendering. But having too many data points in a visualization is often a hint that you have the wrong level of abstraction or the wrong plotting strategy. Many points will probably overlap or visually fuse. So why not aggregate these shapes, for example with a heatmap (a color scale for the number of overlapping points), binning (hexbin, histogram), or a summary of the dataset?
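A minimal sketch of the binning/heatmap idea, using a simple square grid rather than hexagons (the hexagonal variant is what the d3-hexbin plugin provides). Points are counted per cell, and you would then draw one rectangle per cell colored by its count instead of 10^5 individual marks. `binToGrid` and its parameters are hypothetical names, and the points are assumed to already be in pixel coordinates within `width` × `height`.

```typescript
// Minimal sketch: aggregate points into a gridSize x gridSize grid of counts.
type Point = { x: number; y: number };

function binToGrid(points: Point[], width: number, height: number, gridSize: number): number[][] {
  // counts[row][col] = number of points falling into that cell.
  const counts: number[][] = Array.from({ length: gridSize }, () =>
    new Array<number>(gridSize).fill(0),
  );
  for (const p of points) {
    // Clamp so points on the right/bottom edge (or slightly outside) land in a valid cell.
    const col = Math.min(gridSize - 1, Math.max(0, Math.floor((p.x / width) * gridSize)));
    const row = Math.min(gridSize - 1, Math.max(0, Math.floor((p.y / height) * gridSize)));
    counts[row][col]++;
  }
  return counts;
}
```

With a 100 × 100 grid, the chart draws at most 10^4 cells regardless of how many points there are, which keeps even SVG rendering interactive.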

If what you want is an overview and a comparison between datasets, you probably need an abstraction, such as statistics summarizing each dataset, and then details on demand (semantic zoom, focus+context, drill-down).
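One way to build that overview layer, sketched under an assumption the question does not state: each item carries a single numeric `value`. The chart then shows one compact summary per user (count, min, median, max, mean), and a user's raw items are loaded or drawn only when that user is selected. The row shape and the names `summarize` and `summarizeByUser` are hypothetical.

```typescript
// Hypothetical row shape: each item a user owns carries one numeric `value`.
interface ValueRow { user: string; value: number; }

interface Summary { count: number; min: number; median: number; max: number; mean: number; }

// Compact summary of one user's values; null for an empty list.
function summarize(values: number[]): Summary | null {
  const n = values.length;
  if (n === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b); // numeric sort, input left untouched
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  let sum = 0;
  for (const v of sorted) sum += v;
  return { count: n, min: sorted[0], median, max: sorted[n - 1], mean: sum / n };
}

// One summary per user: this is the overview; raw rows are the detail level.
function summarizeByUser(rows: ValueRow[]): Map<string, Summary> {
  const groups = new Map<string, number[]>();
  for (const r of rows) {
    const vals = groups.get(r.user);
    if (vals) vals.push(r.value);
    else groups.set(r.user, [r.value]);
  }
  const out = new Map<string, Summary>();
  groups.forEach((vals, user) => {
    const s = summarize(vals);
    if (s) out.set(user, s);
  });
  return out;
}
```

Each summary could be drawn as a small box-plot-like glyph, which makes comparing many users at a glance feasible; clicking a glyph would be the drill-down step.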