I want to implement a taxonomy structure (geo terms) for my node.js application with NoSQL database. I had a similar taxonomy structure with MySQL but it's time to move forward and learn something new so I decided to try a different approach and use NoSQL (document-oriented) for my test app. The taxonomy structure is simple - there're five different levels: country (i.e. United Kingdom) → region (England) → county (Merseyside) → city/town/village (Liverpool) → part of the city (Toxteth).
The obvious choice is to use a tree structure but the devil is in the detail - historically some cities and towns belonged to other counties. The idea was to tag persons who were born in certain cities or towns with those terms and to filter them later by geo tags so I have to respect the fact Liverpool or Manchester (among others) were part of Lancashire at the time some people were born. Otherwise the result any user gets with my geo filter will be incorrect.
Example: John Doe was born in Blackburn (Lancashire) back in 1957. Paul Brown was born in 1960 in Liverpool (Lancashire, now Merseyside). Georgia Doe (nee Jones) was born in Wirral (Cheshire, now Merseyside) 5 years later. Their son Ringo was born in Liverpool (Merseyside by that time) in 1982.
John is Lancastrian by birth, Paul is Lancastrian and Merseysider, Georgia is from Cheshire and Merseyside at the same time, Ringo is from Merseyside. So they should be categorized accordingly when I search by county. But with simple one-to-many structure that follows modern structure of the country they'll never be filtered as they should be.
How to implement the collection respecting the complexity of its structure with NoSQL (first of all document-oriented) solutions? I googled it and did some research over stack* but still had no clue what to do next with it. There's a few possible ways to solve it in my opinion:
Use SQL-like data structure:
{
{'name': 'United Kingdom', 'unique_id': 1},
{'name': 'England', 'unique_id': 2, 'parents': [1]},
{'name': 'Merseyside', 'unique_id': 3, 'parents': [2]},
{'name': 'Lancashire', 'unique_id': 4, 'parents': [2]},
{'name': 'Liverpool', 'unique_id': 5, 'parents': [3, 4]},
}
Use tree structure with some references:
{
{'name': 'United Kingdom', 'unique_id': 1
{'name': 'England', 'unique_id': 2]
{'name': 'Merseyside', 'unique_id': 3]
{'name': 'Liverpool', 'unique_id': 5, 'alternate_parents': [4]},
},
{'name': 'Lancashire', 'unique_id': 4},
},
},
}
Use tree structure with no references (one-to-many) and add "alternate parent" tag to a document manually:
{
{'name': 'United Kingdom', 'unique_id': 1
{'name': 'England', 'unique_id': 2]
{'name': 'Merseyside', 'unique_id': 3]
{'name': 'Liverpool', 'unique_id': 5},
},
{'name': 'Lancashire', 'unique_id': 4},
},
},
}
Stick with SQL.
Give me advice on that matter please. I'm a newby with any NoSQL (currently I've designed no such databases) so there's a real design issue for me.
And I'm new to stack* so feel free to correct me if I did anything wrong with this post :) Thank you!
EDIT I've chosen @Jonathan answer as a solution. I think it suits better for my needs (there'll be other documents to store in my database and tag them with those terms) especially with mapReduce functionality suggested by @Valentyn.
But if there's no document collections needed for your app a graph database (based on relationships not documents) suggested by @Philipp is probably the best solution possible.
Because of a comment you made, I assume that you mean "MongoDB" when you say "NoSQL". There are a lot of other database technologies commonly referred to as NoSQL which are completely different, but this one seems to be the one you mean.
is not a good idea, because to get the whole taxonomy chain you will need to do multiple database queries, which should generally be avoided.
and 3. A single document which is a huge tree is not a good idea either, because MongoDB has a limit of 16MB per document. When you create huge, monolithic documents, you might hit that limit.
I think that MongoDB might not be the best solution for your use-case. Did you consider using a graph database? MongoDB is optimized for self-contained documents which stand on their own. But the focus of graph databases is on datasets where you have a lot of entities which are defined by their relations to other entities. This looks a lot like your use-case.