CP363 : Join Anomalies

Insertion Anomaly

Given the following table, we can have problems when inserting new tuples. (Assume that the primary key is { Student_ID, Course } ).

Student_ID Course Last_Name First_Name Instructor Grade
999568440 CP363 Snord Cranston D. Brown F
999568440 CP400 Snord Cranston T. Yang A-
999568440 CP102 Snord Cranston D. Brown C
987859400 PC466 Zzap Zachary B. Pavlova D
987859400 HP202 Zzap Zachary S. Zeller D
987859400 CP102 Zzap Zachary D. Brown B+
new tuples
999568440 PC100 Snord Bob H. Bezner F
985768920 NULL Aardvark Anthony NULL NULL
NULL EN228 NULL NULL G. Boire NULL

The first new tuple (Bob Snord) shows the problem of keeping consistency. (Mr. Snord's first name is wrong in the new tuple). Because we are using redundant data we increase the possibility of error creeping in.

The next two tuples show the difficulties that arise when we do not yet have all of the data to hand. In one case we add a new student who is not yet taking a course, in the other a course that does not yet have any students. The empty fields must be left as NULL. One of the problems that arises is that both Student_ID and Course are part of the primary key, and primary keys do not allow NULLs. Thus the last two tuples could not be inserted into the table.


Deletion Anomaly

If we delete the last student from a course, we lose the course information. For example, if we delete Zachary Zzap from HP202, the course disappears.


Modification Anomaly

Because data is repeated, we must do more work to avoid generating inconsistencies. For example: if the course instructor for CP102 changes, we must change this information in more than one tuple. If we miss a tuple then the data is no longer valid.


Spurious Tuples

If a join loses information (not necessarily tuples - in fact, tuples may be added through a join), it is an invalid join. Valid joins are called nonadditive (lossless) joins.

The following is an example of a join that produces spurious tuples:

Student_ID Last_Name First_Name Instructor Grade
999568440 Snord Cranston D. Brown F
999568440 Snord Cranston T. Yang A-
999568440 Snord Cranston D. Brown C
987859400 Zzap Zachary B. Pavlova D
987859400 Zzap Zachary S. Zeller D
987859400 Zzap Zachary D. Brown B+

is joined to the following table by their common field Instructor.

Course Instructor
CP363 D. Brown
CP400 T. Yang
CP102 D. Brown
CP466 B. Pavlova
HP202 S. Zeller

by the SQL statement

SELECT * FROM Student_Instructor AS si
  INNER JOIN Course_Instructor AS ci
  ON si.Instructor = ci.Instructor

resulting in:

Student_ID Course Last_Name First_Name Instructor Grade  
999568440 CP363 Snord Cranston D. Brown F  
999568440 CP102 Snord Cranston D. Brown F *
999568440 CP400 Snord Cranston T. Yang A-  
999568440 CP363 Snord Cranston D. Brown C *
999568440 CP102 Snord Cranston D. Brown C  
987859400 CP466 Zzap Zachary B. Pavlova D  
987859400 HP202 Zzap Zachary S. Zeller D  
987859400 CP102 Zzap Zachary D. Brown B+  
987859400 CP363 Zzap Zachary D. Brown B+ *

The tuples marked with '*' are spurious tuples, i.e. they contain invalid information that arises out of their joining together tuples that do not belong together.

One fix is to join tables on attributes that are primary keys or foreign keys. In this case the organization of the first table causes the error. The course a student takes should be identified by the Course attribute, not the Instructor attribute. (The example given would work if instructors never taught more than one course).