2020-12-16 07:12

CP363 : Normal Form Rules of Thumb

Having identified various normal forms, are there simplified rules of thumb that we follow for developing normalized database tables from various functional dependencies (FDs)? Since normalization has to do with controlling (i.e. minimizing) redundancy, we can ask questions about normalization in terms of whether we have minimized redundancy or not.

Using the following FDs as an example:

Student_ID → {Surname, Forename, Address}
Instructor_ID → {Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID}
Course_ID → Course_Name
Student_ID →→ Course_ID
Course_ID →→ Student_ID
Course_Name → Instructor_ID
Instructor_Name →→ Course_ID
Instructor_ID → Instructor_Name
Dept_ID → Dept_Name

we want to create a series of database tables and foreign key relationships that preserve these FDs and follow the various normal forms taught.

We have to understand the real-world meaning of these FDs, or we can't translate them into tables. Starting with the first FD, we must understand that it means:

Student_ID → {Surname, Forename, Address} : a Student ID has only one surname, forename, and address associated with it

This does not mean that surnames or addresses are unique, just that for any given Student_ID, there is only one value of Surname assigned to it.

We've got to start our table design somewhere, so we can arbitrarily start with this FD to give us:

«table» Student
Student_ID «PK» Surname Forename Address

We have chosen Student_ID to be our primary key because we know that in the 'real world', the student ID is unique and simple, and thus is the best candidate for a primary key for this table. Our first rule of thumb is:

Every table has a primary key, and the key is as simple as possible.

All fields listed in the FDs must end up in a table. The fields Surname, Forename, and Address appear nowhere else in the FDs, so they must be associated with the Student_ID and nothing else. Further, these fields appear only on the right side of a FD; i.e. they depend on a field, but no field depends on them. Since they depend only on Student_ID, they must be put into a table with Student_ID - there is no where else for them to go. Thus our second and third rules of thumb:

All fields in the FDs are in at least one table.

Fields that appear only on the right side of a FD should appear in only one table.

Next, we'll arbitrarily look at two of the multi-determinate FDs that both involve Student_ID. We must understand that they mean::

Student_ID →→ Course_ID : one Student make have many Courses
Course_ID →→ Student_ID : one Course may have many Students

We may be tempted to expand our student table to include the Course_ID field thusly:

«table» Student
Student_ID «PK» Surname Forename Address Course_ID

However, this isn't going to work because the student ID is a primary key and thus may appear only once, which implies a student may take only one course, which violates the FD. A solution may appear to be:

«table» Student
Student_ID «PK» Course_ID «PK» Surname Forename Address

and indeed, that way we can have many students with many courses, and each combination of student ID and course ID may appear only once.

However, there is a problem with this table. Every time we combine a student with a course by entering in a student ID and the matching course ID, we also have to enter the student's name and address. This screams "Redundancy!". We have to ask ourselves, "Does it make sense to re-enter this information?" The answer in this case is of course, "No". The fact that we've already determined that the student name is functionally dependent only on the student ID should be a big clue.

We still require the combination of student ID and course ID, but without the extra student information. The resulting table is:

«table» Student_Course
Student_ID «PK» Course_ID «PK»

which fits our next rule of thumb:

Rows in a table must not require redundant data.

Since we are looking at course ID, we look at another FD:

Course_ID → Course_Name

and a possible solution would appear to be:

«table» Student_Course
Student_ID «PK» Course_ID «PK» Course_Name

However, this implies that every time we add a student to a course, we have to re-enter the course name. Again, this screams "Redundancy!". As well, the course name is not in any way dependent upon a student ID, so we don't require the student ID to be part of the primary key. A better solution is to put the course information into its own table:

«table» Course
Course_ID «PK» Course_Name

Leaving our combination of Student and Course information as:

«table» Student_Course
Student_ID «PK» Course_ID «PK»

This is a lookup table that contains the IDs of two other tables, leading to the rule of thumb:

A many-to-many relationship requires a lookup table.

What about the FD:

Course_Name → Instructor_ID : each course name has an instructor ID

Thus we could add the instructor ID to the course table:

«table» Course
Course_ID «PK» Course_Name Instructor_ID

But what about the primary key? Should it now be base upon course name instead of course ID?

No: we have to recognize that we have a transitive FD:

Course_ID → Course_Name → Instructor_ID

and leaving the course ID as the primary key is perfectly fine. Don't let an FD that has redundant information fool you.

What to do with:

Instructor_ID → Instructor_Name : each instructor ID has an instructor name
Instructor_ID → {Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID} : every instructor ID has a department ID and department name, and every department ID and department name has a faculty ID and faculty name
Dept_ID → Dept_Name : every department ID has a department name

in that case? Should we then end up with:

«table» Course
Course_ID «PK» Course_Name Instructor_ID Instructor_Name Dept_Name Dept_ID Faculty_Name Faculty_ID

No: every time we add a new course we need to add the Instructor's name, Department information, and Faculty information again, which is clearly redundant!

We also have other transitive FDs because of:

Dept_ID → Dept_Name

we can extract the FD:

Instructor_ID → Dept_ID → {Faculty_Name, Faculty_ID}

because we can drop the Dept_Name field from the longer FD. We still need the Dept_Name field somewhere, and because it is on the right side of an FD it must appear only once, we end up with:

«table» Dept
Dept_ID «PK» Dept_Name

However, this loses the information about which Faculty the Department belongs to, so we need to add that to this table:

«table» Dept
Dept_ID «PK» Dept_Name Faculty_ID

which implies the existence of a Faculty table:

«table» Faculty
Faculty_ID «PK» Faculty_Name

Strictly speaking we are making a bit of a leap here, since there is no explicit FD giving the relationship between the Faculty ID and the Faculty Name, but given the naming we can safely make this assumption.

This leaves only the Instructor information. We know the following about the Instructor:

Course_ID → Course_Name
Course_Name → Instructor_ID
Instructor_Name →→ Course_ID
Instructor_ID → Instructor_Name

so we can rewrite these (through transitive rules) as:

Course_ID → Instructor_ID : every course has one instructor
Instructor_ID →→ Course_ID : every instructor may teach many courses
Instructor_ID → Instructor_Name : every instructor has one name

The implication here is that we need separate Course and Instructor tables since an Instructor may be associated with multiple Courses, but we don't want to repeat the Instructor name many times. We end up with our first Course table:

«table» Course
Course_ID «PK» Course_Name Instructor_ID

and a new Instructor table:

«table» Instructor
Instructor_ID «PK» Instructor_Name

We still haven't completed the FD:

Instructor_ID → {Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID}

which, as we've seen, can be re-written:

Instructor_ID → Dept_ID
Dept_ID → Dept_Name
Dept_ID → Faculty_ID
Faculty_ID → Faculty_Name

so we can fulfil these requirements, without redundancy, by simply adding the Department ID to the Instructor table:

«table» Instructor
Instructor_ID «PK» Instructor_Name Dept_ID

These tables should now cover all of the original FDs with minimal redundancy.

Final Tables

These are the final tables derived from our original FDs with foreign keys («FK») marked:

«table» Student
Student_ID «PK» Surname Forename Address

«table» Student_Course
Student_ID «PK» «FK» Course_ID «PK» «FK»

«table» Course
Course_ID «PK» Course_Name Instructor_ID «FK»

«table» Instructor
Instructor_ID «PK» Instructor_Name Dept_ID «FK»

«table» Dept
Dept_ID «PK» Dept_Name Faculty_ID «FK»

«table» Faculty
Faculty_ID «PK» Faculty_Name

Rules of Thumb

Every table has a primary key, and the key is as simple as possible.
All fields in the functional dependencies are in at least one table.
Fields that appear only on the right side of a FD should appear in only one table.
Rows in a table must not require redundant data.
A many-to-many relationship requires a lookup table.