How to understand the 5th Normal Form?

Harry picture Harry · Aug 3, 2013 · Viewed 13.1k times · Source

I'm using two online sources for gaining an understanding of the 5NF, without any rigor of Math and proofs.

  1. A Simple Guide to Five Normal Forms in Relational Database Theory (by Kent. This one seems to have been reviewed and endorsed in one of his writings by none other than CJ Date himself)
  2. Fifth Normal Form (Wikipedia article)

However, I'm unable to understand either of these references!

Let's first examine Reference #1 (Kent's).

It says: "But suppose that a certain rule was in effect: if an agent sells a certain product, and he represents a company making that product, then he sells that product for that company."

and, then, goes on to break up the original table (all table names have been given by me)...

acp(agent, company, product)

-----------------------------
| AGENT | COMPANY | PRODUCT |
|-------+---------+---------|
| Smith | Ford    | car     | 
| Smith | Ford    | truck   | 
| Smith | GM      | car     | 
| Smith | GM      | truck   | 
| Jones | Ford    | car     | 
-----------------------------

... into 3 tables:

ac(agent, company)
cp(company, product)
ap(agent, product)

-------------------   ---------------------   ------------------- 
| AGENT | COMPANY |   | COMPANY | PRODUCT |   | AGENT | PRODUCT |
|-------+---------|   |---------+---------|   |-------+---------|
| Smith | Ford    |   | Ford    | car     |   | Smith | car     |
| Smith | GM      |   | Ford    | truck   |   | Smith | truck   |
| Jones | Ford    |   | GM      | car     |   | Jones | car     |
-------------------   | GM      | truck   |   -------------------
                      ---------------------

But I'm not even sure if I understand the English-language meaning of the above rule. My understanding of the above rule is that its 'then' clause is totally redundant! For,

IF an agent is selling a product

AND

IF this agent is representing a company making that product,

THEN, OBVIOUSLY, this agent is selling that product for that company.

So, where is the 'rule' in this statement? It, in fact, seems to be a non-statement to me!

Working backwards from the three tables -- ac, cp, and ap -- it seems the rule really is: "A company may make 1 or more products, an agent may represent 1 or more companies, and when representing a company he may or may not sell all its products."

But the original table acp was already capturing this rule. So, I'm not sure what is going on here with the explanation of 5NF.

Let's now examine Reference #2 (Wikipedia).

It says: Suppose, however, that the following rule applies: "A Traveling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P), the Traveling Salesman must offer products of Product Type P those made by Brand B1 and those made by Brand B2."

Once again, going just by the English-language meaning of this rule and nothing else,

IF a salesman has brands B1 and B2, and product P with him,

AND

IF product P is made by both brands B1 and B2,

THEN, why on earth wouldn't he be able to offer product P of brands B1 and B2 just as he could in the original 3-column table 'sbp(salesman, brand, product)' which was serving well even before this new 'rule' came into effect?

Could somebody please clarify?

Answer

Damir Sudarevic picture Damir Sudarevic · Aug 3, 2013

See, it is much easier to understand the thing backwards.

First the 5NF; a table (relational variable) is in the 5NF if decomposing it would not remove any redundancies. So, it is final NF as far as removing redundancy is concerned.

The original table obviously has some redundancy. It claims that "Smith represents Ford." twice, and "Smith represents GM." twice.

So let's see is it possible to decompose this into two or more projections and reduce some redundancy.

Let's start backwards.

  • Company exists. {COMPANY}

  • Agent exists. {AGENT}

  • Product exists. {PRODUCT}

  • Company makes Product. {COMPANY, PRODUCT}

  • Agent represents Company. {AGENT, COMPANY}

A pause here; suppose a rule was "If an agent represents a company, and the company makes a product, then the agent sells that product".

This would be simply {AGENT, COMPANY} JOIN {COMPANY, PRODUCT} ; but this would generate an extra tuple, namely (Jones, Ford, truck); which is not true because Jones does not sell trucks.

So, not every agent sells every product, hence it is necessary to state that explicitly.

  • Agent sells Product. {AGENT, PRODUCT}

Now if we join

{AGENT, COMPANY} JOIN {COMPANY, PRODUCT} JOIN {AGENT, PRODUCT}

that extra tuple is eliminated by the join to the {AGENT, PRODUCT}.

To grasp things intuitively, the rule can be modified a bit.

Original

If an agent sells a certain product, and he represents a company making that product, then he sells that product for that company.

Modified (same meaning)

If an agent sells product, and agent represents company, and the company makes that product, then agent sells that product for that company.

Explained (substitute from bullet points above)

If {AGENT, PRODUCT} and {AGENT, COMPANY} and {COMPANY, PRODUCT} then {AGENT, COMPANY, PRODUCT}.

So, the rule allows for the join to happen -- and hence the decomposition.

Now compare that to the predicate of the original table:

Agent represents a Company and sells some Product that the company makes.

Not the same as the rule, so it is open to anomalies which would violate the rule -- see Bill Karwin's example.


EDIT (see comments below)

Suppose that we have the original table, but not the rule.

It is obvious that there is some redundancy in the table, so we may wonder if there is a way to remove that redundancy somehow -- usual way is decomposition into projections of the table.

So, after some tinkering, we figure out that it can be decomposed into {AGENT, PRODUCT}, {AGENT, COMPANY}, {COMPANY, PRODUCT}. Current data certainly allows for that -- as per your example.

And we do that, and whenever interested in "Which agent sells which product from which company?" the answer is simply

{AGENT, COMPANY} JOIN {COMPANY, PRODUCT} JOIN {AGENT, PRODUCT}

Then Honda shows up, and they make cars and trucks too. Well, no problem there, just insert (Honda, truck) , (Honda, car) into {COMPANY, PRODUCT}.

Then Smith decides to sell Honda cars, but not trucks. Sorry, no way, oops! Because he already sells cars and trucks, if he wants to represent Honda, he has to sell both.

Because we would have tuples

(Smith, Honda) (Honda, truck) (Smith, truck)
               (Honda, car)   (Smith, car)

So we have introduced the rule! Really did not want to -- was just trying to get rid of some redundancy.

The question is now, was the original dataset just a fluke, or was it a result of a rule which was enforced somehow outside of the DB?

The author (Kent) claims that the rule exists and the design does not match it. Certainly, it would not be a problem for the original table to accept (Smith, Honda, car) only -- not requiring (Smith, Honda, truck).


Theoretical point (ignore if boring)

The rule

If {AGENT, PRODUCT} and {AGENT, COMPANY} and {COMPANY, PRODUCT} then {AGENT, COMPANY, PRODUCT}; for every (Agent, Company, Product) triplet.

explicitly states that join dependency

* { {AGENT, COMPANY}, {COMPANY, PRODUCT}, {AGENT, PRODUCT} }

holds for the original table.


As often stated, cases like this are rare; actually so rare that even textbook examples have to introduce weird rules in order to explain the basic idea.


EDIT II (the fun part, but may help understanding)

Suppose that the rule does not exist, and there is explicit requirement that any agent can sell what ever he wants from any company -- hence the rule would be plain wrong.

In that case we have the original table

{AGENT, COMPANY, PRODUCT}

I would argue that:

  1. Being all-key, it is in BCNF.

  2. It can not be decomposed (current data may allow it, but future does not).

  3. It is in BCNF, all key, it can not be decomposed, hence it is in 5NF.

  4. It is in 5NF and is all-key, hence it is in 6NF.

So, it is the presence or non-existence of the rule that determines if the table is in BCNF or 6NF -- same table same data.