Given the following table, we can have problems when inserting new tuples. (Assume that the primary key is { Student_ID, Course } ).
Student_ID | Course | Last_Name | First_Name | Instructor | Grade |
---|---|---|---|---|---|
999568440 | CP363 | Snord | Cranston | D. Brown | F |
999568440 | CP400 | Snord | Cranston | T. Yang | A- |
999568440 | CP102 | Snord | Cranston | D. Brown | C |
987859400 | PC466 | Zzap | Zachary | B. Pavlova | D |
987859400 | HP202 | Zzap | Zachary | S. Zeller | D |
987859400 | CP102 | Zzap | Zachary | D. Brown | B+ |
new tuples | |||||
999568440 | PC100 | Snord | Bob | H. Bezner | F |
985768920 | NULL | Aardvark | Anthony | NULL | NULL |
NULL | EN228 | NULL | NULL | G. Boire | NULL |
The first new tuple (Bob Snord) shows the problem of keeping consistency. (Mr. Snord's first name is wrong in the new tuple). Because we are using redundant data we increase the possibility of error creeping in.
The next two tuples show the difficulties that arise when we do not yet
have all of the data to hand. In one case we add a new student who is not yet
taking a course, in the other a course that does not yet have any students.
The empty fields must be left as NULL
. One of the problems that
arises is that both Student_ID and Course
are part of the primary key, and primary keys do not allow
NULL
s. Thus the last two tuples could not be inserted into the
table.
If we delete the last student from a course, we lose the course
information. For example, if we delete Zachary Zzap
from
HP202
, the course disappears.
Because data is repeated, we must do more work to avoid generating inconsistencies. For example: if the course instructor for CP102 changes, we must change this information in more than one tuple. If we miss a tuple then the data is no longer valid.
If a join loses information (not necessarily tuples - in fact, tuples may be added through a join), it is an invalid join. Valid joins are called nonadditive (lossless) joins.
The following is an example of a join that produces spurious tuples:
Student_ID | Last_Name | First_Name | Instructor | Grade |
---|---|---|---|---|
999568440 | Snord | Cranston | D. Brown | F |
999568440 | Snord | Cranston | T. Yang | A- |
999568440 | Snord | Cranston | D. Brown | C |
987859400 | Zzap | Zachary | B. Pavlova | D |
987859400 | Zzap | Zachary | S. Zeller | D |
987859400 | Zzap | Zachary | D. Brown | B+ |
is joined to the following table by their common field Instructor.
Course | Instructor |
---|---|
CP363 | D. Brown |
CP400 | T. Yang |
CP102 | D. Brown |
CP466 | B. Pavlova |
HP202 | S. Zeller |
by the SQL statement
SELECT * FROM Student_Instructor AS si INNER JOIN Course_Instructor AS ci ON si.Instructor = ci.Instructor
resulting in:
Student_ID | Course | Last_Name | First_Name | Instructor | Grade | |
---|---|---|---|---|---|---|
999568440 | CP363 | Snord | Cranston | D. Brown | F | |
999568440 | CP102 | Snord | Cranston | D. Brown | F | * |
999568440 | CP400 | Snord | Cranston | T. Yang | A- | |
999568440 | CP363 | Snord | Cranston | D. Brown | C | * |
999568440 | CP102 | Snord | Cranston | D. Brown | C | |
987859400 | CP466 | Zzap | Zachary | B. Pavlova | D | |
987859400 | HP202 | Zzap | Zachary | S. Zeller | D | |
987859400 | CP102 | Zzap | Zachary | D. Brown | B+ | |
987859400 | CP363 | Zzap | Zachary | D. Brown | B+ | * |
The tuples marked with '*' are spurious tuples, i.e. they contain invalid information that arises out of their joining together tuples that do not belong together.
One fix is to join tables on attributes that are primary keys or foreign keys. In this case the organization of the first table causes the error. The course a student takes should be identified by the Course attribute, not the Instructor attribute. (The example given would work if instructors never taught more than one course).