What are database normal forms and can you give examples?

barfoon · Apr 7, 2009 · Viewed 246.7k times

In relational database design, database normalization (or simply normalization) is the process of organizing columns (attributes) and tables (relations) to reduce data redundancy and improve data integrity (as written on Wikipedia).

As most articles are somewhat technical and thus harder to understand, I'm asking for someone to write an easier-to-understand explanation, based on examples, of what 1NF, 2NF, 3NF, and even 3.5NF (Boyce-Codd) mean.

Answer

Smashery · Apr 7, 2009

1NF is the most basic of normal forms - each cell in a table must contain only one piece of information, and there can be no duplicate rows.
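
To make the 1NF rule concrete, here is a minimal Python sketch. The student/phone data and the function name `to_first_normal_form` are made up for illustration and are not part of the course example used below; the point is only that a cell holding several values gets split into one row per value, and exact duplicate rows get dropped.

```python
# Hypothetical data: the "phones" cell holds several values, which breaks 1NF.
unnormalized = [
    {"student": "Alice", "phones": "555-1234, 555-9876"},
    {"student": "Bob", "phones": "555-4321"},
    {"student": "Bob", "phones": "555-4321"},  # duplicate row, also not allowed
]


def to_first_normal_form(rows):
    """Return one (student, phone) pair per value, with duplicate rows removed."""
    seen = set()
    result = []
    for row in rows:
        for phone in row["phones"].split(","):
            pair = (row["student"], phone.strip())
            if pair not in seen:  # drop exact duplicate rows
                seen.add(pair)
                result.append(pair)
    return result


first_nf = to_first_normal_form(unnormalized)
for student, phone in first_nf:
    print(student, phone)
```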

2NF and 3NF are all about being dependent on the primary key. Recall that a primary key can be made up of multiple columns. As Chris said in his response:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd).

2NF

Say you have a table containing courses that are taken in a certain semester, and you have the following data:

|-----Primary Key----|               uh oh |
                                           V
CourseID | SemesterID | #Places  | Course Name  |
------------------------------------------------|
IT101    |   2009-1   | 100      | Programming  |
IT101    |   2009-2   | 100      | Programming  |
IT102    |   2009-1   | 200      | Databases    |
IT102    |   2010-1   | 150      | Databases    |
IT103    |   2009-2   | 120      | Web Design   |

This is not in 2NF, because the fourth column does not rely upon the entire key - but only a part of it. The course name is dependent on the Course's ID, but has nothing to do with which semester it's taken in. Thus, as you can see, we have duplicate information - several rows telling us that IT101 is Programming, and IT102 is Databases. So we fix that by moving the course name into another table, where CourseID is the ENTIRE key. The original table keeps CourseID, SemesterID and #Places.

Primary Key |

CourseID    |  Course Name |
---------------------------|
IT101       | Programming  |
IT102       | Databases    |
IT103       | Web Design   |

No redundancy!
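
The 2NF split can be sketched in a few lines of Python. This is only an illustration using the rows from the table above; the helper `determines` and the variable names are my own. Note that checking sample rows can only show that the data is consistent with a dependency, it cannot prove the dependency holds in general.

```python
# Rows copied from the table above: (CourseID, SemesterID, Places, CourseName).
course_semesters = [
    ("IT101", "2009-1", 100, "Programming"),
    ("IT101", "2009-2", 100, "Programming"),
    ("IT102", "2009-1", 200, "Databases"),
    ("IT102", "2010-1", 150, "Databases"),
    ("IT103", "2009-2", 120, "Web Design"),
]


def determines(rows, lhs, rhs):
    """True if, in these rows, each value at index lhs maps to one value at rhs."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# CourseName follows from CourseID alone (only part of the key): the 2NF problem.
name_depends_on_course = determines(course_semesters, 0, 3)
# Places does not follow from CourseID alone (IT102 has 200 and 150),
# so it stays in the table keyed by (CourseID, SemesterID).
places_depends_on_course = determines(course_semesters, 0, 2)

# Decompose: one table keyed by CourseID, one keyed by (CourseID, SemesterID).
courses = sorted({(cid, name) for cid, _, _, name in course_semesters})
offerings = [(cid, sem, places) for cid, sem, places, _ in course_semesters]

# Joining the two tables on CourseID gives back the original rows,
# so no information was lost by the split.
names = dict(courses)
rebuilt = [(cid, sem, places, names[cid]) for cid, sem, places in offerings]
```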

3NF

Okay, so let's say we also add the teacher of the course, and some details about them, to the table:

|-----Primary Key----|                           uh oh |
                                                       V
Course  |  Semester  |  #Places   |  TeacherID  | TeacherName  |
---------------------------------------------------------------|
IT101   |   2009-1   |  100       |  332        |  Mr Jones    |
IT101   |   2009-2   |  100       |  332        |  Mr Jones    |
IT102   |   2009-1   |  200       |  495        |  Mr Bentley  |
IT102   |   2010-1   |  150       |  332        |  Mr Jones    |
IT103   |   2009-2   |  120       |  242        |  Mrs Smith   |

Now hopefully it should be obvious that TeacherName is dependent on TeacherID, which is not part of the primary key - so this is not in 3NF. TeacherName depends on the key only indirectly, through TeacherID. To fix this, we do much the same as we did for 2NF - take the TeacherName field out of this table, and put it in its own table, which has TeacherID as the key.

 Primary Key |

 TeacherID   | TeacherName  |
 ---------------------------|
 332         |  Mr Jones    |
 495         |  Mr Bentley  |
 242         |  Mrs Smith   |

No redundancy!!
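
As a rough sketch of what the 3NF layout buys you, here it is using Python's built-in `sqlite3` module. The table names, column names and the rename to "Dr Jones" are assumptions made for this example, not something from the tables above. With the teacher's name stored once, changing it touches a single row, and a join still produces the wide view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teachers (
        teacher_id   INTEGER PRIMARY KEY,
        teacher_name TEXT NOT NULL
    );
    -- REFERENCES documents the link; SQLite only enforces it
    -- when PRAGMA foreign_keys = ON is set.
    CREATE TABLE course_semesters (
        course     TEXT NOT NULL,
        semester   TEXT NOT NULL,
        places     INTEGER NOT NULL,
        teacher_id INTEGER NOT NULL REFERENCES teachers(teacher_id),
        PRIMARY KEY (course, semester)
    );
""")
conn.executemany(
    "INSERT INTO teachers VALUES (?, ?)",
    [(332, "Mr Jones"), (495, "Mr Bentley"), (242, "Mrs Smith")],
)
conn.executemany(
    "INSERT INTO course_semesters VALUES (?, ?, ?, ?)",
    [
        ("IT101", "2009-1", 100, 332),
        ("IT101", "2009-2", 100, 332),
        ("IT102", "2009-1", 200, 495),
        ("IT102", "2010-1", 150, 332),
        ("IT103", "2009-2", 120, 242),
    ],
)

# Hypothetical rename: one row changes, instead of every row that mentions 332.
conn.execute("UPDATE teachers SET teacher_name = 'Dr Jones' WHERE teacher_id = 332")

# The wide table from before is still available as a join.
rows = conn.execute("""
    SELECT cs.course, cs.semester, cs.places, cs.teacher_id, t.teacher_name
    FROM course_semesters AS cs
    JOIN teachers AS t ON t.teacher_id = cs.teacher_id
    ORDER BY cs.course, cs.semester
""").fetchall()
for row in rows:
    print(row)
```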

One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF either. So each additional Normal Form requires everything that the lower normal forms had, plus some extra conditions, which must all be fulfilled.