I have many tables that use Lookup/Enum references for most of their column values.
For example:
Person Table - PersonID | RaceCode | HairColorCode | HairStyleCode | TeethConditionCode
Location Table - LocationID | SizeCode | ExteriorColorCode | ConditionCode
Things like Race,Size,Color,Condition,etc would just be foreign key references to a Code lookup table. This code table has other fields but aren't important for my question. The database is for a SaaS application which means each client could have their own list of of Colors, Races, Conditions, etc. There are some codes that would be static which clients couldnt change.
Is it better to have 1 code table or 2 types of code tables (DynamicCodeTable for customer defined ones and StaticCodeTable for those that one change) or should I have a table for each code type (RaceCodeTable, HairColorTable, Condition, etc) ?
The thing I am the most worried about is all the sql joins. The Person table I am working with has 20+ of these code attributes. Is there a difference in performance when joining to 20 different tables VS joining to the same table 20 times? Having multiple tables means each table would be smaller and the lookup 'should' take less time. But having a single table could be quick too. Any suggestions?
This topic has been discussed at length over the last fifteen years, under the subject "One True Lookup Table" (abbreviated OTLT). The advantages of such an approach leap out to the database newbie. The drawbacks emerge over time. See these links for OTLT drawbacks:
Or search for OTLT
to find more discussions.
If you create many lookup tables, and many maintenance screens for them, you can create a view that simulates the OTLT by creating a gigantic UNION that includes every code, every description, and the name of the table where the code-description pair is stored. It's possible to generate such a union using semiautomatic methods, if you know what you're doing. I would imagine that semiautomatic methods would enable you to build a single maintenance screen for hundreds of lookup tables, and then put some logic between that screen and the tables that would insert a new code in the correct table.
As to letting the users introduce new code TYPES, and not just new code VALUES, that opens a whole big can of worms. See the above article discussing EAV. This is very seductive, because it allows the users to design their own underlying data structure. If you disregard performance, this works pretty well for a while. You get a perfectly general database without having to learn the data structure from the users or the subject matter experts.
When it runs into real grief is when you try to use the data as if it were an integrated database, and not just a hodge podge of disjointed opinions about the data. At this point, you're into some serious data archaeology, when your customers expect routine report generation. Good luck.
(Editted to change "data mining" to "data archaeology")