Denormalization is generally used to either:
- Avoid a certain number of queries
- Remove some joins
The basic idea of denormalization is that you'll add redundant data, or group some, to be able to get those data more easily -- at a smaller cost; which is better for performances.
A quick examples?
- Consider a "Posts" and a "Comments" table, for a blog
- For each Post, you'll have several lines in the "Comment" table
- This means that to display a list of posts with the associated number of comments, you'll have to:
- Do one query to list the posts
- Do one query per post to count how many comments it has (Yes, those can be merged into only one, to get the number for all posts at once)
- Which means several queries.
- Now, if you add a "number of comments" field into the Posts table:
- You only need one query to list the posts
- And no need to query the Comments table: the number of comments are already de-normalized to the Posts table.
- And only one query that returns one more field is better than more queries.
Now, there are some costs, yes:
- First, this costs some place on both disk and in memory, as you have some redundant informations:
- The number of comments are stored in the Posts table
- And you can also find those number counting on the Comments table
- Second, each time someone adds/removes a comment, you have to:
- Save/delete the comment, of course
- But also, update the corresponding number in the Posts table.
- But, if your blog has a lot more people reading than writing comments, this is probably not so bad.