CP363: Midterm Table Extraction Example

From the Midterm:

These examples were deliberately chosen since you have experience with university organization, simplified though this is.

The goals to keep in mind when turning these dependencies into tables are:

come up with a minimal cover set of dependencies
reduce redundancy
avoid anomalies
do not violate any normal forms
the attributes should be based upon the key, the whole key, and nothing but the key
use experience and common sense

The order in which you process these dependencies is arbitrary. In this example we will start with:

Instructor_ID → {Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID}

These dependencies imply that each instructor belongs to one department, and each department belongs to one faculty. Since faculty information appears only once in all of the given dependencies, it is a good start for a simple table. Given that both the Faculty_ID and Faculty_Name should be unique - experience tells you that university faculties are unique - either attribute could be chosen as the primary key attribute. However, we typically use ID fields as primary key fields, and thus end up with:

«table» Faculty
Faculty_ID «PK» Faculty_Name

The midterm question did not require that you provide sample data, but we will show some for later use:

**Faculty** Table
Faculty_ID	Faculty_Name
1	Science
2	Arts

The dependency
Instructor_ID → {Dept_Name, Dept_ID}
can be broken down into:

Instructor_ID → Dept_Name
Instructor_ID → Dept_ID

which implies that:
{Dept_Name, Dept_ID} → {Faculty_Name, Faculty_ID}
can be broken down into:

Dept_Name → Faculty_Name
Dept_Name → Faculty_ID
Dept_ID → Faculty_Name
Dept_ID → Faculty_ID

supported by the fact that we know departments at a university are unique, and belong to a single faculty.

We could be tempted to define a Department table like:

«table» Dept
Dept_ID «PK» Dept_Name Faculty_ID Faculty_Name

Like the Faculty table we have to choose a primary key, and like the Faculty table we chose the ID attribute, in this case Dept_ID.

The problem here lies in the potential anomalies that can arise, as in this example:

**Dept** Table
Dept_ID	Dept_Name	Faculty_ID	Faculty_Name
1	Computing	1	Science
5	English	2	Business

Note the anomaly in the second row, where the Faculty_ID and the Faculty_Name do not match, according to the data given in the previous Faculty table examples. We want to relate a department to a faculty, but it is sufficient to do so with a single attribute that should be a foreign key into the Faculty table. Since the primary key of the Faculty table is the Faculty_ID, it is sufficient to use only the Faculty_ID as a reference to the Faculty table. Using the Faculty_Name field as well is only setting the table up for anomalies.

Thus the better design for the Dept table is:

«table» Dept
Dept_ID «PK» Dept_Name Faculty_ID

and the sample data becomes:

**Dept** Table
Dept_ID	Dept_Name	Faculty_ID
1	Computing	1
5	English	2

There are no possibilities for anomalies in this design, no normal form is violated, and we have reduced redundancy by not repeating the Faculty name.

We also know that:
Instructor_ID → Instructor_Name
and have figured out that:
Instructor_ID → Dept_ID
the implication being, as already noted, that each instructor has a unique ID and belongs to a unique Department. Experience tells us that an instructor name may not be unique, and is thus a poor choice for a primary key.

We could be tempted to define an Instructor table like:

«table» Instructor
Instructor_ID «PK» Instructor_Name Dept_ID Dept_Name

but like the example poor choice for a Department table above, this can lead to anomalies, such as:

**Instructor** Table
Instructor_ID	Instructor_Name	Dept_ID	Dept_Name
14	David Brown	1	Computing
23	Cranston Snord	5	History

We know from the proper Dept table examples that Dept_ID 5 is English, not History, thus this demonstrates how we can have anomalous data in such a table.

The better design for this table is:

«table» Instructor
Instructor_ID «PK» Instructor_Name Dept_ID

and our sample data is:

**Instructor** Table
Instructor_ID	Instructor_Name	Dept_ID
14	David Brown	1
23	Cranston Snord	5

There are no possibilities for anomalies in this design, no normal form is violated, and we have reduced redundancy by not repeating the Department name.

We know that:
Course_ID → Course_Name
Course_Name → Instructor_ID
Instructor_Name ↠ Course_ID

which tells us that each course has a single instructor, but each instructor may teach many courses.

Keep in mind that a given dependency does not tie us to a specific use of the given attributes in a table design, for example, the dependencies:

Instructor_ID → Instructor_Name
Instructor_Name ↠ Course_ID

tells us that the following is also true:
Instructor_ID ↠ Course_ID
meaning that we are not required to use the instructor name when relating a course to an instructor - using the instructor ID is perfectly acceptable, and given our preference for using ID fields as foreign key fields is a better choice.

The dependency:
Course_ID ↠ Student_ID
adds another layer, so we could be tempted to define a Course table like:

«table» Course
Course_ID «PK» Course_Name Instructor_ID Instructor_Name Student_ID

This may lead to multiple anomalies:

**Course** Table
Course_ID	Course_Name	Instructor_ID	Instructor_Name	Student_ID
8	CP363	14	Bob Smith	934859483
8	HI242	14	David Brown	217384732
22	BW101	94	Nakhat Fatima	217384732

The course information must be repeated because if the course is matched against a student, the course information must be repeated for every student ID. Because of this, we have potential anomalies because of the repetition of the course name and the repetition of the instructor name. (Which Normal Forms are being violated here?)

Clearly a better design - the key, the whole key, and nothing but the key - is to isolate the course information and not repeat any related data:

«table» Course
Course_ID «PK» Course_Name Instructor_ID

with sample data:

**Course** Table
Course_ID	Course_Name	Instructor_ID
8	CP363	14
22	BW101	94

With this design we have reduced redundancy and eliminated the possibility of anomalies.

Note that there are even worse designs - a Course table that includes student names as well as IDs would introduce even more possible anomalies.

The dependencies:
Student_ID ↠ Course_ID
Course_ID ↠ Student_ID
do not imply the existence of multiple tables, merely that the relationship between a student and a course is commutative, i.e. one student may take many courses, and one course may have many students. Thus:

«table» Student_Course
Course_ID «PK» Student_ID «PK»

with sample data:

**Student_Course** Table
Course_ID	Student_ID
8	934859483
8	217384732
22	217384732

This is a look-up table that relates a course to a student, and visa-versa. By having both of its fields as a primary key, there can be no possibility of a student taking a course multiple times, or of a course having a student registered in it multiple times.

Having any other fields (e.g. course name, student name, instructor information) in this table would clearly lead to unnecessary redundancy, and many possible anomalies, as in the previous examples.

This leaves us with the dependency:
Student_ID → {Surname, Forename, Address}
which implies the table:

«table» Student
Student_ID «PK» Surname Forename Address

No other attributes should be attached to this table because they would clearly lead to unnecessary redundancy, and many possible anomalies, as in the previous examples.

Whether you design these tables in the order shown here (Faculty to Student) or some other order (Student to Faculty), the results should be the same. Recognizing and removing the possibilities of anomalies is key to a good database design. Whether you do that by applying Normal Form rules, or just realizing the potential redundancies and errors doesn't matter, the results should be the same.