From the Midterm:
These examples were deliberately chosen since you have experience with university organization, simplified though this is.
The goals to keep in mind when turning these dependencies into tables are:
The order in which you process these dependencies is arbitrary. In this example we will start with:
Instructor_ID → {Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID}
These dependencies imply that each instructor belongs to one department, and each department belongs to one faculty. Since faculty information appears only once in all of the given dependencies, it is a good start for a simple table. Given that both the Faculty_ID and Faculty_Name should be unique - experience tells you that university faculties are unique - either attribute could be chosen as the primary key attribute. However, we typically use ID fields as primary key fields, and thus end up with:
«table» Faculty |
---|
Faculty_ID «PK» Faculty_Name |
The midterm question did not require that you provide sample data, but we will show some for later use:
Faculty_ID | Faculty_Name |
---|---|
1 | Science |
2 | Arts |
The dependency
Instructor_ID → {Dept_Name, Dept_ID}
can be
broken down into:
Instructor_ID → Dept_Name
Instructor_ID → Dept_ID
which implies that:
{Dept_Name, Dept_ID} → {Faculty_Name,
Faculty_ID}
can be broken down into:
Dept_Name → Faculty_Name
Dept_Name → Faculty_ID
Dept_ID → Faculty_Name
Dept_ID → Faculty_ID
supported by the fact that we know departments at a university are unique, and belong to a single faculty.
We could be tempted to define a Department table like:
«table» Dept |
---|
Dept_ID «PK» Dept_Name Faculty_ID Faculty_Name |
Like the Faculty table we have to choose a primary key, and like the Faculty table we chose the ID attribute, in this case Dept_ID.
The problem here lies in the potential anomalies that can arise, as in this example:
Dept_ID | Dept_Name | Faculty_ID | Faculty_Name |
---|---|---|---|
1 | Computing | 1 | Science |
5 | English | 2 | Business |
Note the anomaly in the second row, where the Faculty_ID and the Faculty_Name do not match, according to the data given in the previous Faculty table examples. We want to relate a department to a faculty, but it is sufficient to do so with a single attribute that should be a foreign key into the Faculty table. Since the primary key of the Faculty table is the Faculty_ID, it is sufficient to use only the Faculty_ID as a reference to the Faculty table. Using the Faculty_Name field as well is only setting the table up for anomalies.
Thus the better design for the Dept table is:
«table» Dept |
---|
Dept_ID «PK» Dept_Name Faculty_ID |
and the sample data becomes:
Dept_ID | Dept_Name | Faculty_ID |
---|---|---|
1 | Computing | 1 |
5 | English | 2 |
There are no possibilities for anomalies in this design, no normal form is violated, and we have reduced redundancy by not repeating the Faculty name.
We also know that:
Instructor_ID → Instructor_Name
and have
figured out that:
Instructor_ID → Dept_ID
the implication
being, as already noted, that each instructor has a unique ID and belongs
to a unique Department. Experience tells us that an instructor name may
not be unique, and is thus a poor choice for a primary key.
We could be tempted to define an Instructor table like:
«table» Instructor |
---|
Instructor_ID «PK» Instructor_Name Dept_ID Dept_Name |
but like the example poor choice for a Department table above, this can lead to anomalies, such as:
Instructor_ID | Instructor_Name | Dept_ID | Dept_Name |
---|---|---|---|
14 | David Brown | 1 | Computing |
23 | Cranston Snord | 5 | History |
We know from the proper Dept table examples that Dept_ID 5 is English, not History, thus this demonstrates how we can have anomalous data in such a table.
The better design for this table is:
«table» Instructor |
---|
Instructor_ID «PK» Instructor_Name Dept_ID |
and our sample data is:
Instructor_ID | Instructor_Name | Dept_ID |
---|---|---|
14 | David Brown | 1 |
23 | Cranston Snord | 5 |
There are no possibilities for anomalies in this design, no normal form is violated, and we have reduced redundancy by not repeating the Department name.
We know that:
Course_ID → Course_Name
Course_Name →
Instructor_ID
Instructor_Name ↠ Course_ID
which tells us that each course has a single instructor, but each instructor may teach many courses.
Keep in mind that a given dependency does not tie us to a specific use of the given attributes in a table design, for example, the dependencies:
Instructor_ID → Instructor_Name
Instructor_Name ↠ Course_ID
tells us that the following is also true:
Instructor_ID ↠ Course_ID
meaning that we are not required to use the instructor name when relating
a course to an instructor - using the instructor ID is perfectly
acceptable, and given our preference for using ID fields as foreign key
fields is a better choice.
The dependency:
Course_ID ↠ Student_ID
adds another layer, so
we could be tempted to define a Course table like:
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID Instructor_Name Student_ID |
This may lead to multiple anomalies:
Course_ID | Course_Name | Instructor_ID | Instructor_Name | Student_ID |
---|---|---|---|---|
8 | CP363 | 14 | Bob Smith | 934859483 |
8 | HI242 | 14 | David Brown | 217384732 |
22 | BW101 | 94 | Nakhat Fatima | 217384732 |
The course information must be repeated because if the course is matched against a student, the course information must be repeated for every student ID. Because of this, we have potential anomalies because of the repetition of the course name and the repetition of the instructor name. (Which Normal Forms are being violated here?)
Clearly a better design - the key, the whole key, and nothing but the
key
- is to isolate the course information and not repeat any
related data:
«table» Course |
---|
Course_ID «PK» Course_Name Instructor_ID |
with sample data:
Course_ID | Course_Name | Instructor_ID |
---|---|---|
8 | CP363 | 14 |
22 | BW101 | 94 |
With this design we have reduced redundancy and eliminated the possibility of anomalies.
Note that there are even worse designs - a Course table that includes student names as well as IDs would introduce even more possible anomalies.
The dependencies:
Student_ID ↠ Course_ID
Course_ID ↠
Student_ID
do not imply the existence of multiple tables, merely
that the relationship between a student and a course is commutative, i.e.
one student may take many courses, and one course may have many students.
Thus:
«table» Student_Course |
---|
Course_ID «PK» Student_ID «PK» |
with sample data:
Course_ID | Student_ID |
---|---|
8 | 934859483 |
8 | 217384732 |
22 | 217384732 |
This is a look-up table that relates a course to a student, and visa-versa. By having both of its fields as a primary key, there can be no possibility of a student taking a course multiple times, or of a course having a student registered in it multiple times.
Having any other fields (e.g. course name, student name, instructor information) in this table would clearly lead to unnecessary redundancy, and many possible anomalies, as in the previous examples.
This leaves us with the dependency:
Student_ID → {Surname, Forename,
Address}
which implies the table:
«table» Student |
---|
Student_ID «PK» Surname Forename Address |
No other attributes should be attached to this table because they would clearly lead to unnecessary redundancy, and many possible anomalies, as in the previous examples.
Whether you design these tables in the order shown here (Faculty to Student) or some other order (Student to Faculty), the results should be the same. Recognizing and removing the possibilities of anomalies is key to a good database design. Whether you do that by applying Normal Form rules, or just realizing the potential redundancies and errors doesn't matter, the results should be the same.