Having identified various normal forms, are there simplified rules of thumb that we follow for developing normalized database tables from various functional dependencies (FDs)? Since normalization has to do with controlling (i.e. minimizing) redundancy, we can ask questions about normalization in terms of whether we have minimized redundancy or not.
Using the following FDs as an example:
we want to create a series of database tables and foreign key relationships that preserve these FDs and follow the various normal forms taught.
We have to understand the real-world meaning of these FDs, or we can't translate them into tables. Starting with the first FD, we must understand that it means:
This does not mean that surnames or addresses are unique, just that for any
given Student_ID
, there is only one value of Surname
assigned to it.
We've got to start our table design somewhere, so we can arbitrarily start with this FD to give us:
«table» Student |
---|
Student_ID «PK» Surname Forename Address |
We have chosen Student_ID
to be our primary key because we know
that in the 'real world', the student ID is unique and simple, and thus is
the best candidate for a primary key for this table. Our first rule of thumb
is:
All fields listed in the FDs must end up in a table. The fields Surname
,
Forename
, and Address
appear nowhere else in the
FDs, so they must be associated with the Student_ID
and nothing
else. Further, these fields appear only on the right side of a FD; i.e. they
depend on a field, but no field depends on them. Since they depend only on Student_ID
,
they must be put into a table with Student_ID
- there is no
where else for them to go. Thus our second and third rules of thumb:
Next, we'll arbitrarily look at two of the multi-determinate FDs that both
involve Student_ID
. We must understand that they mean::
We may be tempted to expand our student table to include the Course_ID
field thusly:
«table» Student |
---|
Student_ID «PK» Surname Forename Address Course_ID |
However, this isn't going to work because the student ID is a primary key and thus may appear only once, which implies a student may take only one course, which violates the FD. A solution may appear to be:
«table» Student |
---|
Student_ID «PK» Course_ID «PK» Surname Forename Address |
and indeed, that way we can have many students with many courses, and each combination of student ID and course ID may appear only once.
However, there is a problem with this table. Every time we combine a student with a course by entering in a student ID and the matching course ID, we also have to enter the student's name and address. This screams "Redundancy!". We have to ask ourselves, "Does it make sense to re-enter this information?" The answer in this case is of course, "No". The fact that we've already determined that the student name is functionally dependent only on the student ID should be a big clue.
We still require the combination of student ID and course ID, but without the extra student information. The resulting table is:
«table» Student_Course |
---|
Student_ID «PK» Course_ID «PK» |
which fits our next rule of thumb:
Since we are looking at course ID, we look at another FD:
and a possible solution would appear to be:
«table» Student_Course |
---|
Student_ID «PK» Course_ID «PK» Course_Name |
However, this implies that every time we add a student to a course, we have to re-enter the course name. Again, this screams "Redundancy!". As well, the course name is not in any way dependent upon a student ID, so we don't require the student ID to be part of the primary key. A better solution is to put the course information into its own table:
«table» Course |
---|
Course_ID «PK» Course_Name |
Leaving our combination of Student and Course information as:
«table» Student_Course |
---|
Student_ID «PK» Course_ID «PK» |
This is a lookup table that contains the IDs of two other tables, leading to the rule of thumb:
What about the FD:
Thus we could add the instructor ID to the course table:
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID |
But what about the primary key? Should it now be base upon course name instead of course ID?
No: we have to recognize that we have a transitive FD:
and leaving the course ID as the primary key is perfectly fine. Don't let an FD that has redundant information fool you.
What to do with:
in that case? Should we then end up with:
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID Instructor_Name Dept_Name Dept_ID Faculty_Name Faculty_ID |
No: every time we add a new course we need to add the Instructor's name, Department information, and Faculty information again, which is clearly redundant!
We also have other transitive FDs because of:
we can extract the FD:
because we can drop the Dept_Name
field from the longer FD. We
still need the Dept_Name
field somewhere, and because it is on
the right side of an FD it must appear only once, we end up with:
«table» Dept |
---|
Dept_ID «PK» Dept_Name |
However, this loses the information about which Faculty the Department belongs to, so we need to add that to this table:
«table» Dept |
---|
Dept_ID «PK» Dept_Name Faculty_ID |
which implies the existence of a Faculty table:
«table» Faculty |
---|
Faculty_ID «PK» Faculty_Name |
Strictly speaking we are making a bit of a leap here, since there is no explicit FD giving the relationship between the Faculty ID and the Faculty Name, but given the naming we can safely make this assumption.
This leaves only the Instructor information. We know the following about the Instructor:
so we can rewrite these (through transitive rules) as:
The implication here is that we need separate Course and Instructor tables since an Instructor may be associated with multiple Courses, but we don't want to repeat the Instructor name many times. We end up with our first Course table:
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID |
and a new Instructor table:
«table» Instructor |
---|
Instructor_ID «PK» Instructor_Name |
We still haven't completed the FD:
which, as we've seen, can be re-written:
so we can fulfil these requirements, without redundancy, by simply adding the Department ID to the Instructor table:
«table» Instructor |
---|
Instructor_ID «PK» Instructor_Name Dept_ID |
These tables should now cover all of the original FDs with minimal redundancy.
These are the final tables derived from our original FDs with foreign keys («FK») marked:
«table» Student |
---|
Student_ID «PK» Surname Forename Address |
«table» Student_Course |
---|
Student_ID «PK» «FK» Course_ID «PK» «FK» |
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID «FK» |
«table» Instructor |
---|
Instructor_ID «PK» Instructor_Name Dept_ID «FK» |
«table» Dept |
---|
Dept_ID «PK» Dept_Name Faculty_ID «FK» |
«table» Faculty |
---|
Faculty_ID «PK» Faculty_Name |