What is atomicity in dbms

Prashant2329 picture Prashant2329 · Jun 4, 2014 · Viewed 20.2k times · Source

I read something like below in 1NF form of DBMS.

There was a sentence as follows:

"Every column should be atomic."

Can anyone please explain it to me thoroughly with an example?

Answer

Frazz picture Frazz · Jun 4, 2014

Atomicity and 1NF... that is not about atomic transactions, but about definition and column content.

"Atomic" means "cannot be divided or split in smaller parts". Applied to 1NF this means that a column should not contain more than one value. It should not compose or combine values that have a meaning of their own.

This tipically regards 2 very common mistakes made by database designers:

1. multiple values in one column (list columns)

columns that contain a list of values, tipically space or comma separated, like this blog post table:

id title     date_posted content tags
1  new idea  2014-05-23  ...     tag1,tag2,tag3
2  why this? 2014-05-24  ...     tag2,tag5
3  towel day 2014-05-26  ...     tag42

or this contacts table:

id room phones
4  432  111-111-111 222-222-222 
5  456  999-999-999
6  512  888-888-8888 333-3333-3333

This type of denormalization is rare, as most database designers see this cannot be a good thing. But you do find tables like this. They usually come from modifications to the database, whereas it may seem simpler to widen a column and use it to stuff multiple values instead of adding a normalized related table (which often breaks existing applications).

2. complex multi-part columns

In this case one column contains different bits of information and could maybe be designed as a set of separate columns.

Typical example are fullname and address columns:

id fullname              address
1  Mark Tomers           56 Tomato Road
2  Fred Askalong         3277 Hadley Drive
3  May Anne Brice        225 Century Avenue - apartment 43/a

These types of denormalizations are very common, as it is quite difficult to draw the line and what is atomic and what is not. Depending on the application, a multi-part column could very well be the best solution in some cases. It is less structured, but simpler.

Structuring an address in many atomic columns may mean having more complex code to handle results for output. Another complexity comes from the structure not being adeguate to fit all types of addresses. Using one single VARCHAR column does not pose this problem, but may pose others... typically about searching and sorting.

An extreme case of multi-part columns are dates and times. Most RDBMS provide date and time data types and provide functions to handle date and time algebra and the extraction of the various bits (month, hour, etc...). Few people would consider convenient to have separate year, mont, day columns in a relational database. But I've seen it... and with good reasons: the use case was birthdates for a justice department database. They had to handle many immigrants with few or no documents. Sometimes you just knew a person was born in a certain year, but you would not know the day or month or birth. You can't handle that type of info with a single date column.