CP363 : Second Normal Form

Applies only to primary keys with multiple attributes.

Based upon full functional dependency. A dependency is full functional if when an attribute is removed from a key, the dependency is destroyed. If the dependency remains, this implies the key is not minimal.

i.e. a relation is 2NF if every non-key attribute of a relation is fully functionally dependent on the primary key of the relation. If an attribute is dependent only on a partial key, it is not 2NF. (You cannot have a partial key if the key consists of only a single attribute.)

A prime attribute is a member of a candidate key. An attribute is non-prime if it is not a member of an candidate key. (Non-prime attributes are also referred to as non-key attributes).

2NF Violation:

ID Course Surname Forename Instructor Grade
8245 CP102 Kumar Vijay N. Pavlova B-
8245 HP202 Kumar Vijay S. Zeller C+
9995 CP363 Soult Francine D. Brown A-
9995 CP400 Soult Francine T. Yang B+
9995 CP102 Soult Francine N. Pavlova A

This relation is ripe for generating anomalies. Whenever a student is added to a course the students name as well as ID must be added to the relation. If it is added incorrectly then we have one ID with multiple names attached to it. Similar things can happen with the combination of instructor and course. When a tuple is deleted from the relation, if the last student for that course is removed then the course is entirely removed from the database unless we make a point of storing null data for the removed student. This is awkward and ugly. Common sense, and normalization techniques, tell us there must be a better way.

Note: null values for the grade before the final grade is assigned are perfectly valid.

We have defined {Student_ID,Course} as the primary key for this table, and indeed as a key it uniquely identifies each tuple. However, the tuples above violate 2nd Normal Form. The non-key attributes Last_Name and First_Name are only partially dependent on the primary key since the attribute Course can be removed from the primary key, and the dependency still holds. i.e. Last_Name and First_Name are fully functionally dependent only on {Student_ID}. Similarly, the non-key attribute Instructor can be identified by the partial key {Course}. Only the Grade attribute is fully functionally dependent on the entire primary key.

The relation’s functional dependencies are:

Surname, Forename, and Instructor are functionally dependent upon only partial keys and thus 2NF is violated.

These functional dependencies arise out of the semantics of the data in the relation. We know that students have names as well as numbers and that although a name may be repeated (David Brown is fairly popular, for example), student IDs are never repeated. If we do not understand the semantics of the data we cannot determine the function dependencies.

This violation of 2nd Normal Form can be fixed by decomposition into relations where all attributes are fully functionally determined. In this particular example, this table could be broken up into three tables:

2NF Violation - Fixed

The original relation is decomposed into:

ID Surname Forename
8245 Kumar Vijay
9995 Soult Francine

Course Instructor
CP102 N. Pavlova
HP202 S. Zeller
CP363 D. Brown
CP400 T. Yang

ID Course Grade
8245 CP102 B-
8245 HP202 C+
9995 CP363 A-
9995 CP400 B+
9995 CP102 A

Note here the huge potential for reducing data redundancy. Now each combination of course and instructor is given once rather than once for every student in the course. Similarly a student's name appears only once.

The opportunity for data corruption is also greatly reduced. Now a student's name appears in one place only. An instructor's name can still appear against many courses, but that could be fixed by giving each instructor an ID number and storing their name separately, as with students.