database design - categories and sub-categories

dsb · Jul 25, 2015 · Viewed 22.2k times

I need to implement categorization and sub-categorization for something similar to the Golden Pages.

Assume I have the following table:

Category Table

CategoryId, Title
10, Home
20, Business
30, Hobbies

I have two options to code the sub-categorization.

OPTION 1 - Subcategory Id is unique within Category ONLY:

Sub Category Table

CategoryId, SubCategoryId, Title
10, 100, Gardening
10, 110, Kitchen
10, 120, ...
20, 100, Development
20, 110, Marketing
20, 120, ...
30, 100, Soccer
30, 110, Reading
30, 120, ...

OPTION 2 - Subcategory Id is unique OVERALL:

Sub Category Table

CategoryId, SubCategoryId, Title
10, 100, Gardening
10, 110, Kitchen
10, 120, ...
20, 130, Development
20, 140, Marketing
20, 150, ...
30, 160, Soccer
30, 170, Reading
30, 180, ...

Option 2 looks easier when fetching rows from the table. For example:

SELECT BizTitle FROM tblBiz WHERE SubCatId = 170

whereas using Option 1 I'd have to write something like this:

SELECT BizTitle FROM tblBiz WHERE CatId = 30 AND SubCatId = 110

i.e., containing an extra AND
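
To make the difference between the two options concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names SubCategoryOpt1 and SubCategoryOpt2 and the helper functions are made up for this illustration, and SQLite merely stands in for whatever database is actually used. Option 1 corresponds to a composite key, Option 2 to a single-column key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: SubCategoryId is unique only within its category,
# so the key has to be the pair (CategoryId, SubCategoryId).
conn.execute("""
    CREATE TABLE SubCategoryOpt1 (
        CategoryId    INTEGER NOT NULL,
        SubCategoryId INTEGER NOT NULL,
        Title         TEXT NOT NULL,
        PRIMARY KEY (CategoryId, SubCategoryId)
    )
""")

# Option 2: SubCategoryId is unique overall, so it can be the key on its own.
conn.execute("""
    CREATE TABLE SubCategoryOpt2 (
        SubCategoryId INTEGER PRIMARY KEY,
        CategoryId    INTEGER NOT NULL,
        Title         TEXT NOT NULL
    )
""")

conn.executemany(
    "INSERT INTO SubCategoryOpt1 (CategoryId, SubCategoryId, Title) VALUES (?, ?, ?)",
    [(10, 100, "Gardening"), (20, 100, "Development"), (30, 110, "Reading")],
)
conn.executemany(
    "INSERT INTO SubCategoryOpt2 (SubCategoryId, CategoryId, Title) VALUES (?, ?, ?)",
    [(100, 10, "Gardening"), (130, 20, "Development"), (170, 30, "Reading")],
)

def titles_opt1(sub_id):
    # Filtering on SubCategoryId alone can match several categories.
    rows = conn.execute(
        "SELECT Title FROM SubCategoryOpt1 WHERE SubCategoryId = ? ORDER BY CategoryId",
        (sub_id,),
    )
    return [title for (title,) in rows]

def titles_opt2(sub_id):
    # SubCategoryId is the primary key, so at most one row can match.
    rows = conn.execute(
        "SELECT Title FROM SubCategoryOpt2 WHERE SubCategoryId = ?",
        (sub_id,),
    )
    return [title for (title,) in rows]

print(titles_opt1(100))  # ['Gardening', 'Development']
print(titles_opt2(170))  # ['Reading']
```

Under Option 1 the subcategory id alone is ambiguous, which is why the extra AND is needed there.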

However, Option 1 is easier to maintain manually (when I need to update and insert new subcategories, etc.), and it is more pleasant to the eye in my opinion.

Any thoughts about it? Is Option 2 worth the trouble in terms of efficiency? Are there any design patterns related to this common issue?

Answer

slartidan · Jul 25, 2015

I would use this structure:

ParentId, CategoryId, Title
null, 1, Home
null, 2, Business
null, 3, Hobbies
1, 4, Gardening
1, 5, Kitchen
1, 6, ...
2, 7, Development
2, 8, Marketing
2, 9, ...
3, 10, Soccer
3, 11, Reading
3, 12, ...

In detail:

  • only use one table, which references itself, so that you can have unlimited depth of categories
  • use technical ids (using IDENTITY, or similar), so that you can have more than 10 subcategories
  • if required, add a human-readable category number as a separate column
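
The points above can be sketched roughly as follows, using Python's built-in sqlite3 module as a stand-in for the real database. The table name Category and the helpers add_category and subcategories are assumptions made for this example. SQLite's INTEGER PRIMARY KEY plays the role of IDENTITY here, so the generated ids differ from the sample table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# One self-referencing table: ParentId points back at CategoryId.
conn.execute("""
    CREATE TABLE Category (
        CategoryId INTEGER PRIMARY KEY,                      -- technical id, auto-assigned
        ParentId   INTEGER REFERENCES Category(CategoryId),  -- NULL for top-level categories
        Title      TEXT NOT NULL
    )
""")

def add_category(title, parent_id=None):
    """Insert a category and return its generated id."""
    cur = conn.execute(
        "INSERT INTO Category (ParentId, Title) VALUES (?, ?)",
        (parent_id, title),
    )
    return cur.lastrowid

def subcategories(parent_id):
    """Return the titles of the direct children of parent_id."""
    rows = conn.execute(
        "SELECT Title FROM Category WHERE ParentId = ? ORDER BY CategoryId",
        (parent_id,),
    )
    return [title for (title,) in rows]

home = add_category("Home")
business = add_category("Business")
hobbies = add_category("Hobbies")
add_category("Gardening", home)
add_category("Kitchen", home)
add_category("Soccer", hobbies)
add_category("Reading", hobbies)

print(subcategories(hobbies))  # ['Soccer', 'Reading']

# The foreign key rejects a child whose parent does not exist.
try:
    add_category("Orphan", 999)
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```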

As long as you are only using two levels of categories you can still select like this:

SELECT BizTitle FROM tblBiz WHERE ParentId = 3 AND CategoryId = 11
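
If the hierarchy later grows beyond two levels, a recursive query is one way to collect a whole subtree from the self-referencing table. The following is only a sketch using Python's sqlite3 module; the "Science Fiction" row and the descendants helper are hypothetical additions for the example, and the recursive syntax varies slightly between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Category (CategoryId INTEGER PRIMARY KEY, ParentId INTEGER, Title TEXT)"
)
conn.executemany(
    "INSERT INTO Category (CategoryId, ParentId, Title) VALUES (?, ?, ?)",
    [
        (1, None, "Home"),
        (3, None, "Hobbies"),
        (4, 1, "Gardening"),
        (10, 3, "Soccer"),
        (11, 3, "Reading"),
        (13, 11, "Science Fiction"),  # hypothetical third level, not in the sample data
    ],
)

def descendants(root_id):
    """Return (CategoryId, Title, Depth) for root_id and everything below it."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(CategoryId, Title, Depth) AS (
            SELECT CategoryId, Title, 0 FROM Category WHERE CategoryId = ?
            UNION ALL
            SELECT c.CategoryId, c.Title, s.Depth + 1
            FROM Category AS c
            JOIN subtree AS s ON c.ParentId = s.CategoryId
        )
        SELECT CategoryId, Title, Depth FROM subtree ORDER BY Depth, CategoryId
        """,
        (root_id,),
    )
    return rows.fetchall()

for row in descendants(3):
    print(row)
# (3, 'Hobbies', 0)
# (10, 'Soccer', 1)
# (11, 'Reading', 1)
# (13, 'Science Fiction', 2)
```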

The new hierarchyid feature of SQL Server also looks quite promising: https://msdn.microsoft.com/en-us/library/bb677173.aspx


What I don't like about the Nested Set Model:

  • Inserting and deleting items in the Nested Set Model is quite complicated and requires expensive locks.
  • One can easily create inconsistencies, which is prevented if you use a parent field in combination with a foreign key constraint.
    • Inconsistencies can appear if rght is lower than lft
    • Inconsistencies can appear if a value appears in several rght or lft fields
    • Inconsistencies can appear if you create gaps
    • Inconsistencies can appear if you create overlaps
  • The Nested Set Model is in my opinion more complex and therefore not as easy to understand. This is absolutely subjective, of course.
  • The Nested Set Model requires two fields, instead of one - and so uses more disk space.
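
To illustrate the point about inserts, here is a rough sketch of adding one node to a small nested set, again with Python's sqlite3 module. The table NestedCategory, the helper insert_last_child and the "Bathroom" row are assumptions for the example, and this is just one common way of doing the insert. Every row to the right of the insertion point has to be renumbered, whereas the parent-field design needs a single INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NestedCategory (Title TEXT PRIMARY KEY, lft INTEGER, rght INTEGER)"
)
conn.executemany(
    "INSERT INTO NestedCategory (Title, lft, rght) VALUES (?, ?, ?)",
    [
        ("Home", 1, 6),
        ("Gardening", 2, 3),
        ("Kitchen", 4, 5),
        ("Hobbies", 7, 12),
        ("Soccer", 8, 9),
        ("Reading", 10, 11),
    ],
)
conn.commit()

def insert_last_child(parent_title, title):
    """Insert title as the last child of parent_title.

    Returns how many existing rows had to be rewritten to make room.
    """
    (parent_rght,) = conn.execute(
        "SELECT rght FROM NestedCategory WHERE Title = ?", (parent_title,)
    ).fetchone()
    with conn:  # apply all three statements in one transaction
        # Every row ending at or after the insertion point moves right by 2.
        shifted = conn.execute(
            "UPDATE NestedCategory SET rght = rght + 2 WHERE rght >= ?",
            (parent_rght,),
        ).rowcount
        # Rows starting after the insertion point also move their left bound.
        conn.execute(
            "UPDATE NestedCategory SET lft = lft + 2 WHERE lft > ?",
            (parent_rght,),
        )
        conn.execute(
            "INSERT INTO NestedCategory (Title, lft, rght) VALUES (?, ?, ?)",
            (title, parent_rght, parent_rght + 1),
        )
    return shifted

rewritten = insert_last_child("Home", "Bathroom")  # hypothetical new subcategory
print(rewritten)  # 4 existing rows rewritten: Home, Hobbies, Soccer, Reading
```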