Normalization is the process of putting relations in a database into a normal form. More generally, it is a design technique for structuring database relations. Its goals are:
To avoid update, insertion, and deletion anomalies.
To minimize redundancy.
Remember that these three anomalies can corrupt the contents of a database. (We will see some examples momentarily.) By reducing data duplication and eliminating these anomalies, we intend to minimize the possibility of data corruption and simplify the development, maintenance, and expandability of the database.
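As a small sketch of an update anomaly (the table and column names here are invented for illustration), consider a denormalized table that repeats a supplier's city in every row for that supplier. Updating the city in only some of those rows leaves the database contradicting itself:

```python
import sqlite3

# Hypothetical denormalized table: the supplier's city is repeated
# once per shipped part.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE shipment (supplier TEXT, city TEXT, part TEXT)")
con.executemany("INSERT INTO shipment VALUES (?, ?, ?)",
                [("S1", "Paris", "bolt"),
                 ("S1", "Paris", "nut"),
                 ("S2", "Oslo",  "screw")])

# Update anomaly: changing S1's city in only one row corrupts the data,
# because the database now records two different cities for S1.
con.execute("UPDATE shipment SET city = 'Lyon' "
            "WHERE supplier = 'S1' AND part = 'bolt'")
cities = {row[0] for row in con.execute(
    "SELECT city FROM shipment WHERE supplier = 'S1'")}
print(cities)  # two different cities for one supplier
```

Insertion and deletion anomalies arise in the same table: we cannot record a supplier's city until it ships a part, and deleting a supplier's last shipment silently discards its city as well.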
To normalize, we analyze relation schemas based upon their functional dependencies and primary keys.
Relations that do not meet normal form tests must be decomposed into multiple relations that individually satisfy these tests.
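A minimal sketch of such a decomposition (again with invented names): the functional dependency supplier → city means the city belongs in its own relation keyed by supplier, so each fact is stored exactly once and the update anomaly disappears:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical decomposition: city is stored once per supplier.
con.execute("CREATE TABLE supplier (name TEXT PRIMARY KEY, city TEXT)")
con.execute("CREATE TABLE shipment "
            "(supplier TEXT REFERENCES supplier(name), part TEXT)")
con.executemany("INSERT INTO supplier VALUES (?, ?)",
                [("S1", "Paris"), ("S2", "Oslo")])
con.executemany("INSERT INTO shipment VALUES (?, ?)",
                [("S1", "bolt"), ("S1", "nut"), ("S2", "screw")])

# An update now touches exactly one row; no contradiction is possible.
con.execute("UPDATE supplier SET city = 'Lyon' WHERE name = 'S1'")
rows = con.execute(
    "SELECT s.supplier, sup.city, s.part FROM shipment s "
    "JOIN supplier sup ON sup.name = s.supplier "
    "WHERE s.supplier = 'S1'").fetchall()
print(rows)
```

Joining the two relations back together reconstructs the original information, which is exactly what a correct decomposition must guarantee.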
It is important to remember that normalization techniques are based not only on algorithms, but on the semantics of the data in the database. If we do not understand the data domains and relationships, we cannot perform proper normalization.
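This point can be illustrated with a small helper (a sketch, not part of any standard library): a relation instance can only refute a functional dependency, never prove one, because whether an FD holds is a semantic, design-time decision about all possible data, not just the rows currently stored.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs is satisfied
    by a relation instance given as a list of dicts.

    Note: a satisfied check on one instance does not prove the FD holds
    in general; only an understanding of the data's semantics can.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault records the first value seen for this key;
        # a later, different value refutes the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

rows = [{"supplier": "S1", "city": "Paris", "part": "bolt"},
        {"supplier": "S1", "city": "Paris", "part": "nut"},
        {"supplier": "S2", "city": "Oslo",  "part": "screw"}]
print(fd_holds(rows, ["supplier"], ["city"]))  # True
print(fd_holds(rows, ["city"], ["part"]))      # False
```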
A normalized schema should have two properties: the lossless (nonadditive) join property, which guarantees that joining the decomposed relations reconstructs the original data without spurious tuples, and the dependency preservation property, which ensures that each functional dependency is represented in some individual relation.
The following normal forms are listed from least to most normalized:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Normal Forms 1 through 3 are based upon primary keys. Every relation is assumed to have a primary key made up of one or more attributes.
Reminder: A superkey is a set of attributes that uniquely identifies a tuple in a relation. A key is a minimal superkey, i.e., a superkey with superfluous attributes removed. The possible keys of a relation are called candidate keys. The key chosen from the set of candidates is the primary key.
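These definitions can be made concrete with a short sketch (helper names are invented; as with functional dependencies, an instance can only refute key-hood, and the true keys are a semantic decision): a superkey is a set of attributes on which no two tuples agree, and the candidate keys are the minimal such sets.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    # attrs is a superkey (w.r.t. this instance) if no two tuples
    # agree on all of attrs.
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    # Candidate keys are the minimal superkeys: superkeys no proper
    # subset of which is also a superkey.
    supers = [frozenset(c)
              for n in range(1, len(attributes) + 1)
              for c in combinations(attributes, n)
              if is_superkey(rows, c)]
    return [set(s) for s in supers
            if not any(t < s for t in supers)]

rows = [{"supplier": "S1", "city": "Paris", "part": "bolt"},
        {"supplier": "S1", "city": "Paris", "part": "nut"},
        {"supplier": "S2", "city": "Oslo",  "part": "screw"}]
print(candidate_keys(rows, ["supplier", "city", "part"]))  # [{'part'}]
```

In this toy instance only {part} is a candidate key; supersets such as {supplier, part} are superkeys but not keys, because they contain a superfluous attribute.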