Big data visualization using "search, show context, and expand on demand" concept

Yang · Feb 19, 2014 · Viewed 31.8k times

I'm trying to visualize a really huge network (3M nodes and 13M edges) stored in a database. For real-time interactivity, I plan to show only a portion of the graph based on user queries and expand it on demand. For instance, when a user clicks a node, I expand its neighborhood. (This is called "Search, Show Context, Expand on Demand" in this paper.)

I have looked into several visualization tools, including Gephi, D3, etc. They take a text file as input, but I have no idea how they can connect to a database and update the graph based on user interaction.

The authors of the linked paper implemented a system like that, but they didn't describe the tools they used.

How can I visualize such data with the above criteria?

Answer

MarcoL · Feb 20, 2014

There are several solutions out there, but basically every one of them uses the same approach:

  1. Create a layer on top of your data source that lets you query it at a high level.
  2. Create a front-end layer that talks to the layer described above.
  3. Use the visualization tool you want.
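
As a rough illustration of how these layers fit together, here is a minimal Python sketch. Everything in it is hypothetical: `GraphStore` stands in for the query layer over the database (here just an in-memory dict), and `VisibleGraph` for the subgraph the front end currently holds. A real system would put the store behind an HTTP API and hand the contents of `VisibleGraph` to the visualization tool.

```python
from collections import defaultdict


class GraphStore:
    """Hypothetical query layer; a dict stands in for the real database."""

    def __init__(self, edges):
        self._adj = defaultdict(set)
        for a, b in edges:
            self._adj[a].add(b)
            self._adj[b].add(a)

    def neighbors(self, node, limit=50):
        # Return at most `limit` neighbors so one click never floods the client.
        return sorted(self._adj.get(node, ()))[:limit]


class VisibleGraph:
    """The portion of the graph currently shown to the user."""

    def __init__(self, store):
        self.store = store
        self.nodes = set()
        self.edges = set()

    def search(self, node):
        # "Search": start from the node the user asked for ...
        self.nodes.add(node)
        # ... and "show context": its immediate neighborhood.
        return self.expand(node)

    def expand(self, node, limit=50):
        # "Expand on demand": called when the user clicks a node.
        added = []
        for other in self.store.neighbors(node, limit):
            if other not in self.nodes:
                self.nodes.add(other)
                added.append(other)
            self.edges.add(tuple(sorted((node, other))))
        return added  # only the new nodes need to be sent to the renderer


store = GraphStore([("a", "b"), ("a", "c"), ("c", "d"), ("d", "e")])
view = VisibleGraph(store)
view.search("a")   # shows a and its neighbors b and c
view.expand("c")   # adds d
```

Returning only the newly added nodes keeps each response small, which matters more than the total graph size when the user is clicking around interactively.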

As miro marchi pointed out, there are several solutions to achieve this goal: some of them are locked to particular data sources, while others give you much more freedom but require some coding skills.

Datasource

I would start with the choice of the source type: given the type of data, I would probably choose either Neo4J, Titan or OrientDB (if you fancy something more exotic with some sort of flexibility). All of them offer a JSON REST API, the first with a proprietary system and language (Cypher) and the other two using the Blueprints / Rexster system. Neo4J supports the Blueprints stack as well, if you prefer Gremlin over Cypher.
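
To make the "query at a high level" step more concrete for Neo4J, the sketch below builds the JSON body for a neighborhood query in Cypher. The statements/parameters shape follows Neo4J's transactional HTTP endpoint as I understand it; the exact URL and the parameter syntax (`$id` here, `{id}` in 2.x-era releases) depend on the server version, so treat both as assumptions and check the documentation for your release. `neighborhood_payload` is a hypothetical helper name.

```python
import json


def neighborhood_payload(node_id, limit=50):
    """Build a request body asking for one node's direct neighborhood."""
    statement = (
        "MATCH (n)-[r]-(m) "
        "WHERE id(n) = $id "
        "RETURN n, r, m "
        "LIMIT $limit"
    )
    return {
        "statements": [
            {"statement": statement,
             "parameters": {"id": node_id, "limit": limit}}
        ]
    }


# The front-end layer would POST this string to the server's Cypher endpoint.
body = json.dumps(neighborhood_payload(42, limit=25))
```

Passing the node id and the limit as parameters, rather than formatting them into the query string, avoids injection problems and lets the server reuse the query plan.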

For other solutions, such as other NoSQL or SQL databases, you will probably have to code a layer on top with its own REST API. It will work as well, but I wouldn't recommend it for the kind of data you have.
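
If the data does live in a SQL database, the layer on top of it could look roughly like the following sketch. It uses Python's built-in `sqlite3` with an in-memory database purely for illustration; the `edges(src, dst)` table, its indexes, and the `neighborhood` function are assumptions, not something taken from the question. The returned dict is the kind of JSON a REST endpoint might send to the front end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
# Indexes on both columns help the database avoid scanning the whole table.
conn.execute("CREATE INDEX idx_src ON edges (src)")
conn.execute("CREATE INDEX idx_dst ON edges (dst)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (1, 3), (3, 4), (5, 1)])


def neighborhood(conn, node_id, limit=50):
    """Return up to `limit` edges touching `node_id`, in either direction."""
    rows = conn.execute(
        "SELECT src, dst FROM edges WHERE src = ? OR dst = ? LIMIT ?",
        (node_id, node_id, limit),
    ).fetchall()
    nodes = {node_id}
    for src, dst in rows:
        nodes.update((src, dst))
    return {
        "nodes": sorted(nodes),
        "edges": [{"source": s, "target": t} for s, t in rows],
    }


result = neighborhood(conn, 1)
```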

Now, only the third point is left and here you have several choices.

Generic Viz tools

  • Sigma.js is a free and open source tool for graph visualization, and quite a nice one. As far as I know, Linkurious uses a forked version of it in their product.

  • Keylines is a commercial graph visualization tool with advanced styling, analytics and layouts, and it provides copy/paste demos if you are using Neo4J or Titan. It is not free, but it supports even older browsers, from IE7 onwards.

  • VivaGraph is another free and open source graph visualization tool, but it has a smaller community than Sigma.js.

  • D3.js is the factotum of data visualization: you can build basically any kind of visualization with it, but the learning curve is quite steep.

  • Gephi is another free and open source solution, this time for the desktop. You will probably have to use an external plugin with it, but it supports most of the formats out there: GraphML, CSV, Neo4J, etc.

Vendor specific

  • Linkurious is a commercial, Neo4J-specific, complete tool to search and investigate data.

  • Neo4J web-admin console: even if it's basic, it has improved a lot with the newer 2.x.x versions, which are based on D3.js.

There are also other solutions that I probably forgot to mention, but the ones above should offer a good variety.

Other notes

The JS tools above will visualize well up to about 1500 to 2000 nodes at once, due to JS limits.
If you want to visualize more than that while expanding, I would recommend a desktop solution such as Gephi.
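
One way to stay under that limit while expanding is to cap the number of nodes on screen and drop the least recently touched ones when the cap is exceeded. The sketch below is only an illustration of that idea, not a feature of any tool listed above; `NodeBudget` is a hypothetical name, and the default cap of 1500 is simply taken from the rough figure mentioned here.

```python
from collections import OrderedDict


class NodeBudget:
    """Keep at most `max_nodes` visible, evicting the least recently used."""

    def __init__(self, max_nodes=1500):
        self.max_nodes = max_nodes
        self._visible = OrderedDict()  # node id -> None, oldest first

    def touch(self, node):
        """Mark `node` as shown or clicked; return the nodes to remove from the view."""
        self._visible[node] = None
        self._visible.move_to_end(node)
        evicted = []
        while len(self._visible) > self.max_nodes:
            old, _ = self._visible.popitem(last=False)
            evicted.append(old)
        return evicted

    def visible(self):
        return list(self._visible)


budget = NodeBudget(max_nodes=3)
for n in ["a", "b", "c"]:
    budget.touch(n)
budget.touch("a")            # "a" becomes the most recently used node
removed = budget.touch("d")  # over budget: the oldest node ("b") is dropped
```

The front end would remove the returned nodes (and their edges) from the rendered graph each time the user expands a neighborhood.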

Disclaimer

I'm part of the Keylines dev team.