Fastest way to perform bulk add/insert in Neo4j with Python?

wodow · Sep 28, 2012 · Viewed 11.6k times

I am finding Neo4j slow to add nodes and relationships/arcs/edges when using the REST API via py2neo for Python. I understand that this is due to each REST API call executing as a single self-contained transaction.

Specifically, adding a few hundred pairs of nodes with relationships between them takes a number of seconds, running on localhost.

What is the best approach to significantly improve performance whilst staying with Python?

Would using bulbflow and Gremlin be a way of constructing a bulk insert transaction?

Thanks!

Answer

Nigel Small · Sep 29, 2012

There are several ways to do a bulk create with py2neo, each making only a single call to the server.

  1. Use the create method to build a number of nodes and relationships in a single batch.
  2. Use a Cypher CREATE statement.
  3. Use the new WriteBatch class (just released this week) to manually make a batch of nodes and relationships (this is really just a manual version of 1).
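
As a rough illustration of option 1, here is a minimal sketch of how the node pairs from the question could be flattened into one argument list for a single create call. It assumes the py2neo 1.x convention of that era, where nodes are passed as property dicts and relationships as (start, type, end) tuples whose integers refer to other entries in the same call. The helper name build_pair_batch, the relationship type LINKS_TO and the "name" property are hypothetical, and the actual server call is left in comments because its exact form depends on the py2neo version you have installed.

```python
def build_pair_batch(pairs, rel_type="LINKS_TO"):
    """Flatten (start_props, end_props) pairs into one list of entities.

    All nodes come first, as plain property dicts. Relationships follow
    as (start_index, rel_type, end_index) tuples, where each index is the
    position of a node earlier in the same list.
    """
    nodes = []
    rels = []
    for start_props, end_props in pairs:
        start_index = len(nodes)
        nodes.append(dict(start_props))
        nodes.append(dict(end_props))
        rels.append((start_index, rel_type, start_index + 1))
    return nodes + rels


pairs = [
    ({"name": "a"}, {"name": "b"}),
    ({"name": "c"}, {"name": "d"}),
]
entities = build_pair_batch(pairs)

# Assumed py2neo 1.x usage (not executed here; check your version's docs):
# from py2neo import neo4j
# graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")
# graph_db.create(*entities)  # one REST call for the whole batch
```

Listing every node before any relationship keeps the index references easy to reason about, since a node's index is simply its position in the argument list.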

If you have some code, I'm happy to look at it and suggest performance tweaks. The py2neo test suite also contains quite a few examples you may be able to draw inspiration from.

Cheers, Nige