# Normalization of Database

Normalization is a systematic approach of decomposing tables to

* minimize data redundancy and
* minimize undesirable characteristics like Insertion, Update and Deletion Anomalies

## Lets start with an example:

Here we can see that Student Info (his rollno and name) and Branch Info (branch-name, hod, office-tel) are unnecessarily put in same table. Wherever in a row we mention branch CSE, we have to put hod to Mr. X and office-tel to 53337 as well.

Here we can see that two independent table are unnecessarily mingled into one causing data redundancy.

## Now lets understand the anomalies part.

* **Insertion Anomaly:** whenever we enter a new student entry (rollno and name), we have two options:
  * We put NULL for (branch-name, hod and office-tel) whenever we do not have student branch's info while we are writing to table. This is not an appropriate information about the student's branch.
  * We make it compulsory that we have student branch's info whenever we write to this table. Having such compulsion is not good.
* **Update Anomaly:** Suppose we have to update the Hod for CSE branch (Mr.X -> Mr.Y). We have to update all the rows where student belongs to CSE branch. You can see the problem there.
* **Deletion Anomaly:** Suppose there is an entry of a student who belongs to ECE branch. There's only one such entry (No other student with ECE branch). Now if we remove that student from list, we also lose information about ECE branch in the process. SO bad :(

## Normalization rules are divided into following normal form:

* First Normal Form
* Second Normal Form
* Third Normal Form
* BCNF

  **First Normal Form**

  * each column must contain atomic values (i.e no multiple things in one cell).
  * a column should contain values of same type
  * each column's name should be unique
  * order in which data is saved doesn't matter

  **Second Normal Form**

  * table must satisfy first-normal-form rules
  * there must **not be any partial dependency** of any column on primary key.
    * It means that for a table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence. If any column depends only on one part of the concatenated key, then the table fails Second normal form.
  * it helps in reducing Update Anomalies

    > Note: primary key can be a group of columns.

  **Third Normal Form**

  * table must satisfy second-normal-form rules
  * it requires that every non-prime attribute of table must be dependent on primary key, or we can say that, there should not be the case that a non-prime attribute is determined by another non-prime attribute. So this transitive functional dependency should be removed from the table.
  * For example, consider a table with following fields.

    <img src="/files/-LvkghHulfrG-asvpAbp" alt="" data-size="original">

    In this table Student\_id is Primary key, but street, city and state depends upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.

    **The advantage of removing transitive dependency is:**

    * Amount of data duplication is reduced.
    * Data integrity achieved.

  **BCNF**

  <img src="/files/-LvkghHw5wsd5tkBsKRf" alt="" data-size="original">


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://blog.gomchik.com/tech/system-design/normalization-of-database.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
