Normalization of Database

Normalization is a systematic approach of decomposing tables to

  • minimize data redundancy and

  • minimize undesirable characteristics like Insertion, Update and Deletion Anomalies

Lets start with an example:

Here we can see that Student Info (his rollno and name) and Branch Info (branch-name, hod, office-tel) are unnecessarily put in same table. Wherever in a row we mention branch CSE, we have to put hod to Mr. X and office-tel to 53337 as well.

Here we can see that two independent table are unnecessarily mingled into one causing data redundancy.

Now lets understand the anomalies part.

  • Insertion Anomaly: whenever we enter a new student entry (rollno and name), we have two options:

    • We put NULL for (branch-name, hod and office-tel) whenever we do not have student branch's info while we are writing to table. This is not an appropriate information about the student's branch.

    • We make it compulsory that we have student branch's info whenever we write to this table. Having such compulsion is not good.

  • Update Anomaly: Suppose we have to update the Hod for CSE branch (Mr.X -> Mr.Y). We have to update all the rows where student belongs to CSE branch. You can see the problem there.

  • Deletion Anomaly: Suppose there is an entry of a student who belongs to ECE branch. There's only one such entry (No other student with ECE branch). Now if we remove that student from list, we also lose information about ECE branch in the process. SO bad :(

Normalization rules are divided into following normal form:

  • First Normal Form

  • Second Normal Form

  • Third Normal Form

  • BCNF

    First Normal Form

    • each column must contain atomic values (i.e no multiple things in one cell).

    • a column should contain values of same type

    • each column's name should be unique

    • order in which data is saved doesn't matter

    Second Normal Form

    • table must satisfy first-normal-form rules

    • there must not be any partial dependency of any column on primary key.

      • It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second normal form.

    • it helps in reducing Update Anomalies

      Note: primary key can be a group of columns.

    Third Normal Form

    • table must satisfy second-normal-form rules

    • it requires that every non-prime attribute of table must be dependent on primary key, or we can say that, there should not be the case that a non-prime attribute is determined by another non-prime attribute. So this transitive functional dependency should be removed from the table.

    • For example, consider a table with following fields.

      In this table Student_id is Primary key, but street, city and state depends upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.

      The advantage of removing transitive dependency is:

      • Amount of data duplication is reduced.

      • Data integrity achieved.


Last updated