What is a good KISS description of Boyce-Codd normal form?

Paul Nathan · Feb 12, 2009 · Viewed 12.3k times

What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take an unnormalized table and BCNF it?

Wikipedia's info: not terribly helpful for me.

Answer

Dour High Arch · Feb 12, 2009

Chris Date's definition is actually quite good, so long as you understand what he means:

Each attribute

Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute; it depends on the current date, which is not part of your birthdate.
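To make the age example concrete, here is a minimal sketch that stores only the birthdate and works the age out when asked. The `age_on` function and the `person` record are made up for illustration; they are not part of any definition of BCNF.

```python
from datetime import date

def age_on(birthdate: date, today: date) -> int:
    """Whole years elapsed between birthdate and today."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

# Only the facts are stored; age is never a column.
person = {"full_name": "Mary Jones", "birthdate": date(1980, 6, 15)}

print(age_on(person["birthdate"], date(2009, 2, 12)))  # prints 28
```

Because the age is derived from the stored birthdate plus the date you ask on, it can never go stale or disagree with the birthdate.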

must represent a fact

Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.

about the key,

One attribute is special: it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because SINs get reused. Your SIN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.
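As a rough illustration of the two kinds of key named above, here is a sketch using Python's standard `uuid` and `itertools` modules. The helper names `new_guid_key` and `new_counter_key` are hypothetical, and a real database would normally generate these values itself.

```python
import itertools
import uuid

# A counter that only ever moves forward, so no value is handed out twice
# within this process.
_next_id = itertools.count(1)

def new_counter_key() -> int:
    """Return the next never-before-used integer key."""
    return next(_next_id)

def new_guid_key() -> str:
    """Return a random GUID (UUID version 4) as a string."""
    return str(uuid.uuid4())

first, second = new_counter_key(), new_counter_key()
print(first, second)  # prints 1 2

# The key carries no meaning of its own, so nothing about the person
# can force it to change.
people = {new_guid_key(): {"full_name": "Mary Jones"}}
```

Neither key says anything about the person, which is the point: a name change or a reissued number never touches it.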

the whole key,

The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact. Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data". If Mary Jones in Accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well. On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.

You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice. Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.
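The three groups of values described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`person`, `address`, `person_address`, `link_id`) and the sample street are assumptions made for this example, not anything prescribed by the definition.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id  TEXT PRIMARY KEY, name   TEXT NOT NULL);
    CREATE TABLE address (address_id TEXT PRIMARY KEY, street TEXT NOT NULL);
    -- Third group: its own key, with the other two keys as its values.
    CREATE TABLE person_address (
        link_id    TEXT PRIMARY KEY,
        person_id  TEXT NOT NULL REFERENCES person(person_id),
        address_id TEXT NOT NULL REFERENCES address(address_id),
        UNIQUE (person_id, address_id)
    );
""")

def new_key() -> str:
    return str(uuid.uuid4())

mary, john, home = new_key(), new_key(), new_key()
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(mary, "Mary Smith"), (john, "John Smith")])
# The shared street address is stored exactly once.
conn.execute("INSERT INTO address VALUES (?, ?)", (home, "12 Elm Street"))
conn.executemany("INSERT INTO person_address VALUES (?, ?, ?)",
                 [(new_key(), mary, home), (new_key(), john, home)])

def street_of(person_id: str):
    """Find a person's street by matching keys through the linking table."""
    row = conn.execute(
        "SELECT a.street FROM person_address pa "
        "JOIN address a ON a.address_id = pa.address_id "
        "WHERE pa.person_id = ?", (person_id,)).fetchone()
    return row[0] if row else None

# Correcting the address once corrects it for everyone who lives there.
conn.execute("UPDATE address SET street = ? WHERE address_id = ?",
             ("14 Elm Street", home))
print(street_of(mary), "|", street_of(john))
```

Since the street lives in one row only, Mary and John cannot end up with two different spellings of the same place.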

and nothing but the key

There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record, since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city. Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities. Or forbid "addresses" that are unique to a city.
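The textbook way to state "nothing but the key" is that every nontrivial functional dependency must have a superkey on its left side. Here is a small sketch of that check, using the usual attribute-closure algorithm. The attribute names and the dependency list are assumptions that mirror the Taj Mahal example, and the caller has to pass only the dependencies that apply to the table being checked.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the left side, we learn the right side too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List nontrivial dependencies whose left side is not a superkey."""
    relation = set(relation)
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) >= relation:
            continue  # left side determines everything, so it is a superkey
        violations.append((lhs, rhs))
    return violations

# One wide record: the address alone already tells you the city.
record = {"person_id", "name", "address", "city"}
record_fds = [({"person_id"}, {"name", "address", "city"}),
              ({"address"}, {"city"})]
print(bcnf_violations(record, record_fds))  # reports address -> city

# After the split, each table's only determinant is its own key.
person = {"person_id", "name", "address"}
location = {"address", "city"}
print(bcnf_violations(person, [({"person_id"}, {"name", "address"})]))
print(bcnf_violations(location, [({"address"}, {"city"})]))
```

The first call flags the dependency from address to city because the address is not a key of the wide record; after the split, both checks come back empty.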

So help me, Codd.