Join operation with NOSQL

Sri · Jan 3, 2010 · Viewed 31.2k times

I have gone through some articles regarding Bigtable and NOSQL. It is very interesting that they avoid JOIN operations.

As a basic example, let's take Employee and Department tables and assume the data is spread across multiple tables / servers.

I just want to know: if the data is spread across multiple servers, how do we do JOIN or UNION operations?

Answer

MarkR · Jan 3, 2010

When you have extremely large data, you probably want to avoid joins. This is because the overhead of an individual key lookup is relatively large (the service needs to figure out which node(s) to query, and query them in parallel and wait for responses). By overhead, I mean latency, not throughput limitation.

This makes joins suck really badly, as you'd need to do a lot of foreign key lookups, which would (in many cases) end up going to many, many different nodes. So you'd want to avoid this as a pattern.
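To make the cost concrete, here is a rough sketch of a client-side join over a toy sharded key-value store. Everything in it is an assumption for illustration: `ShardedStore`, `node_for`, the four-node cluster and the `emp:` / `dept:` key layout are made-up names, not any real product's API. It only counts lookups; in a real cluster each one would be a network round trip.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str) -> int:
    """Pick the node that owns a key (simple deterministic hash partitioning)."""
    return sum(key.encode()) % NUM_NODES


class ShardedStore:
    """Toy in-memory stand-in for a distributed key-value store."""

    def __init__(self):
        self.nodes = [dict() for _ in range(NUM_NODES)]
        self.lookups_per_node = defaultdict(int)

    def put(self, key, value):
        self.nodes[node_for(key)][key] = value

    def get(self, key):
        node = node_for(key)
        # In a real system this would be one network round trip to that node.
        self.lookups_per_node[node] += 1
        return self.nodes[node].get(key)


store = ShardedStore()
for dept_id, dept_name in [("d1", "Sales"), ("d2", "Engineering"), ("d3", "Support")]:
    store.put(f"dept:{dept_id}", {"name": dept_name})
for emp_id, emp_name, dept_id in [
    ("e1", "Ann", "d1"),
    ("e2", "Bob", "d2"),
    ("e3", "Cy", "d3"),
    ("e4", "Di", "d2"),
]:
    store.put(f"emp:{emp_id}", {"name": emp_name, "dept_id": dept_id})


def join_employees_with_departments(store, emp_ids):
    """Client-side join: one lookup per employee plus one per foreign key."""
    rows = []
    for emp_id in emp_ids:
        emp = store.get(f"emp:{emp_id}")
        dept = store.get(f"dept:{emp['dept_id']}")  # the foreign key lookup
        rows.append((emp["name"], dept["name"]))
    return rows


rows = join_employees_with_departments(store, ["e1", "e2", "e3", "e4"])
total_lookups = sum(store.lookups_per_node.values())
print(rows)
print("lookups:", total_lookups, "nodes touched:", len(store.lookups_per_node))
```

Four joined rows cost eight lookups here, and which nodes they land on depends entirely on how the keys hash. With millions of rows the per-lookup latency is what hurts.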

If it doesn't happen very often, you could probably take the hit, but if you're going to want to do a lot of them, it may be worth "denormalising" the data.

The kind of stuff which gets stored in NoSQL stores is typically pretty "abnormal" in the first place. It is not uncommon to duplicate the same data in all sorts of different places to make lookups easier.
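As a minimal sketch of what that duplication might look like for the employee/department example, assume a plain dict standing in for the key-value store and a made-up `emp:<id>` key layout. The department name is copied into every employee record, so a read needs one lookup and no join; the price is that a rename has to touch every copy.

```python
# Toy key-value store: key -> document. A plain dict stands in for the real thing.
store = {}


def put_employee(emp_id, name, dept_id, dept_name):
    """Write an employee record with the department name duplicated into it."""
    store[f"emp:{emp_id}"] = {
        "name": name,
        "dept_id": dept_id,
        "dept_name": dept_name,  # denormalised copy of the department's name
    }


def get_employee_with_department(emp_id):
    """Single key lookup; no second lookup for the department is needed."""
    emp = store[f"emp:{emp_id}"]
    return emp["name"], emp["dept_name"]


def rename_department(dept_id, new_name):
    """The cost of duplication: every copy has to be rewritten."""
    updated = 0
    for key, doc in store.items():
        if key.startswith("emp:") and doc["dept_id"] == dept_id:
            doc["dept_name"] = new_name
            updated += 1
    return updated


put_employee("e1", "Ann", "d1", "Sales")
put_employee("e2", "Bob", "d2", "Engineering")
put_employee("e3", "Cy", "d2", "Engineering")

print(get_employee_with_department("e2"))
print(rename_department("d2", "R&D"), "records rewritten")
```

In a real distributed store the rename would be a scan or a batch of writes across nodes rather than a loop over a dict, which is why this trade only pays off when reads vastly outnumber such updates.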

Additionally, most NoSQL stores don't (really) support secondary indexes either, which means you have to duplicate stuff if you want to query by any other criterion.
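One way this tends to be done by hand is to maintain a second set of keys that acts as the index. The sketch below assumes a dict as the store and an invented `emp_by_dept:<dept_id>` key holding the set of employee ids; the application has to keep that key in step with the primary records itself.

```python
# Toy key-value store: key -> value. The key layout here is purely illustrative.
kv = {}


def save_employee(emp_id, name, dept_id):
    """Write the primary record and keep the hand-rolled index in step with it."""
    old = kv.get(f"emp:{emp_id}")
    if old is not None and old["dept_id"] != dept_id:
        # The employee moved: drop the stale index entry first.
        kv[f"emp_by_dept:{old['dept_id']}"].discard(emp_id)
    kv[f"emp:{emp_id}"] = {"name": name, "dept_id": dept_id}
    kv.setdefault(f"emp_by_dept:{dept_id}", set()).add(emp_id)


def employees_in_department(dept_id):
    """Query by department: read the index key, then fetch each record by primary key."""
    emp_ids = sorted(kv.get(f"emp_by_dept:{dept_id}", set()))
    return [kv[f"emp:{emp_id}"]["name"] for emp_id in emp_ids]


save_employee("e1", "Ann", "d1")
save_employee("e2", "Bob", "d2")
save_employee("e3", "Cy", "d2")
print(employees_in_department("d2"))

save_employee("e3", "Cy", "d1")  # Cy moves to another department
print(employees_in_department("d1"))
print(employees_in_department("d2"))
```

Note that the record write and the index write are two separate operations. Without transactions across keys, a crash between them can leave the index stale, which is part of what a conventional database's secondary indexes handle for you.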

If you're storing data such as employees and departments, you're really better off with a conventional database.