Neo4j vs. ArangoDB when modeling a social network

Florian Reisinger picture Florian Reisinger · Jan 31, 2016 · Viewed 9.8k times · Source

I want to build a social network. (E.g. Persons have other persons as friends) and I guess a graph database would do the trick better than a classic database. I would like to store attributes on the edges and on the nodes. They can be json, but I do not care if the DB understands JSON.

ArangoDB can also store documents and Neo4J is "only" a graph Database.

I would like to have an user node an to each person 2 eg.

Users -[username]-> person
Users -[ID]-> person

And there is a need that there is an index on the edges. I do not want a different database, so it would be nice to store an image (byte array) in the database, maybe even different sizes for each image / video whatever. Also posts and such should be stored in the database.

What I got is that Neo4j better supports an manufacture independent query language, but I guess it is easier and better to learn the manufacturer standard. Any recommendations on which database management system is better suited? I will be writing the code in Java (and some Scala).

Answer

mikewilliamson picture mikewilliamson · Feb 2, 2016

Both ArangoDB and Neo4j are capable of doing the job you have in mind. Both projects have amazing documentation and getting answers for either of them is easy. Both can be used from Java (though Neo4j can be embedded).

One thing that might help your decision making process is recognizing that many NoSQL databases solve a much narrower problem than people appreciate.

Sarah Mei wrote an epic blog post about MongoDB, using an example with some data about TV shows. From the summary:

MongoDB’s ideal use case is even narrower than our television data. The only thing it’s good at is storing arbitrary pieces of JSON.

I believe that Neo4j solves a similarly narrow problem, as evidenced by how common it is to use Neo4j alongside some other data store.

I don't know that storing picture or video data is a great idea in either ArangoDB or Neo4j. I would look to store it on some other server (like S3) and save the url to that file in Neo4j/Arango.

While it's true that it is possible to create queries that only a graph database can answer, the performance of graph database on any given query varies wildly and can give you some pretty surprising results. For instance, here is a paper from the International Journal of Computer Science and Information Technologies doing a comparison of Neo4j vs MySQL, Vertica and VoltDB with queries you would assume Neo4j would be amazing at: Performance comparison

The idea is that a "social network" does not automatically imply the superiority, or even the use of a graph database (especially since GraphQL and Falcor were released).

To address your question about query languages. There is no standard language for graph databases.

AQL is a query language that provides a unified interface for working with key/value, document and graph data.

Cypher is a graph query language.

Badwolf Query Language is a SPARQL inspired language for temporal graphs.

These languages exist because they tackle different problems. The databases that support them also tackle different problems.

Neo4j has an example of "polyglot persistence" on their site:

Using neo4j redis and mongo together

I think that is the problem that ArangoDB and AQL is out to solve, the hypothesis being that it's possible to solve that without being worse than specialists like Neo4j. So far it looks like they might be right.