DB Design: 1st Normal Form and Repeating Groups

JoeCool picture JoeCool · Jun 4, 2009 · Viewed 12.3k times · Source

To adhere to 1st normal form, one of the things you must avoid is repeating groups. As in instead of:

    CustID  Name  Address       Phone1      Phone2       Phone3

     102    Jerry  234 East..   555-2342   555-9854     555-2986

You should create a second Telephone Number table and then on a join you would get:

CustID  Name     Address       Phone

102 Jerry    234 East..   555-2342
102 Jerry    234 East..   555-9854
102 Jerry    234 East..   555-2986

Sometimes, it's a little more ambiguous and it's hard to tell when a group of column headers qualify. For instance, let's say you have, at the moment, two tests you run on every piece of hardware. And your first DB design yields the most horizontal approach:

Design 1

SN     Test1_Max   Test1_Min    Test1_Mean  Test2_Max   Test2_Min    Test2_Mean
2093      23          2            15         54          -24           45  

Obviously, this is a repeating group, that could much more easily be represented as (on a join between "Parts" and "Tests"):

Design 2

SN     Test      Max    Min    Mean     
2093    1        23     2      15       
2093    2        54     -24     45      

However, you could go even more vertical:

Design 3

SN     Test    Statistic    Value
2093    1        Max          23
2093    1        Min          2
2093    1        Mean         15       
2093    2        Max          54
2093    2        Min         -24
2093    2        Mean         45  

Is Design 3 necessary? How do you decide how vertical to make it? What are the pros and cons between Design 2 and 3? It seems that both could be selected or joined easily with SQL, with the advantage given to Design 3 because you could easily add a new Statistic without actually modifying the table structure.

But before anyone goes and says that the more vertical the better, there are times where it's more ambiguous. Like:

Design 4

SN      AverageCurrent (mA)    BatteryCapacity (mA)  
2093          200                    540  

Could instead be:

Design 5

SN      mA_Measuremnt       Value
2093    AverageCurrent      200 
2093    BatteryCapacity     540 

While both attributes are of the same domain (mA), they represent very different things in regards to the component. In this case, is Design 4 better since it's not strictly a repeating group? I guess what I'm looking for is some criteria to knowing when to break it down into more tables and thus make it more vertical.

To sum up this ridiculously long question, should you only remove and normalize repeating groups if they are exacly the same domain and have the exact same meaning?. If that is the case, then really only the telephone example and probably the two tests in Design 1 meet this criteria. Though it seems like there might be design benefits to Design 3 and 5, even though the statistics of Design 3 have different meanings strictly speaking, and AverageCurrent and BatteryCapacity definitely have different meanings in Design 5.

Answer

Matt Rogish picture Matt Rogish · Jun 4, 2009

Design 2 and Design 4 are the best ways to go provided the results will not always be present (aka NULLs in Desigin 1). If they always are taken, then the first design is fine.

I believe repeating groups in SQL would actually be if you have a column stuffed with add'l values e.g. Phone_Number contains "123-444-4444,123-333-3334" etc.

Anyway, the later designs are suboptimal -- you continue to take that to the final level and have the "One True Lookup Table" http://www.dbazine.com/ofinterest/oi-articles/celko22 or Entity Attribute Value http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

Either way, it's almost always a bad thing. Although they may share a common datatype/domain, the meaning differs -- thus they should remain individual attributes (maxtemp, mintemp, etc.)