2020-12-16 07:12

CP363 : Join Anomalies

Insertion Anomaly

Given the following table, we can have problems when inserting new tuples. (Assume that the primary key is { Student_ID, Course } ).

Student_ID	Course	Last_Name	First_Name	Instructor	Grade
999568440	CP363	Snord	Cranston	D. Brown	F
999568440	CP400	Snord	Cranston	T. Yang	A-
999568440	CP102	Snord	Cranston	D. Brown	C
987859400	PC466	Zzap	Zachary	B. Pavlova	D
987859400	HP202	Zzap	Zachary	S. Zeller	D
987859400	CP102	Zzap	Zachary	D. Brown	B+
new tuples
999568440	PC100	Snord	Bob	H. Bezner	F
985768920	NULL	Aardvark	Anthony	NULL	NULL
NULL	EN228	NULL	NULL	G. Boire	NULL

The first new tuple (Bob Snord) shows the problem of keeping consistency. (Mr. Snord's first name is wrong in the new tuple). Because we are using redundant data we increase the possibility of error creeping in.

The next two tuples show the difficulties that arise when we do not yet have all of the data to hand. In one case we add a new student who is not yet taking a course, in the other a course that does not yet have any students. The empty fields must be left as NULL. One of the problems that arises is that both Student_ID and Course are part of the primary key, and primary keys do not allow NULLs. Thus the last two tuples could not be inserted into the table.

Deletion Anomaly

If we delete the last student from a course, we lose the course information. For example, if we delete Zachary Zzap from HP202, the course disappears.

Modification Anomaly

Because data is repeated, we must do more work to avoid generating inconsistencies. For example: if the course instructor for CP102 changes, we must change this information in more than one tuple. If we miss a tuple then the data is no longer valid.

Spurious Tuples

If a join loses information (not necessarily tuples - in fact, tuples may be added through a join), it is an invalid join. Valid joins are called nonadditive (lossless) joins.

The following is an example of a join that produces spurious tuples:

Student_ID	Last_Name	First_Name	Instructor	Grade
999568440	Snord	Cranston	D. Brown	F
999568440	Snord	Cranston	T. Yang	A-
999568440	Snord	Cranston	D. Brown	C
987859400	Zzap	Zachary	B. Pavlova	D
987859400	Zzap	Zachary	S. Zeller	D
987859400	Zzap	Zachary	D. Brown	B+

is joined to the following table by their common field Instructor.

Course	Instructor
CP363	D. Brown
CP400	T. Yang
CP102	D. Brown
CP466	B. Pavlova
HP202	S. Zeller

by the SQL statement

SELECT * FROM Student_Instructor AS si
  INNER JOIN Course_Instructor AS ci
  ON si.Instructor = ci.Instructor

resulting in:

Student_ID	Course	Last_Name	First_Name	Instructor	Grade
999568440	CP363	Snord	Cranston	D. Brown	F
999568440	CP102	Snord	Cranston	D. Brown	F	*
999568440	CP400	Snord	Cranston	T. Yang	A-
999568440	CP363	Snord	Cranston	D. Brown	C	*
999568440	CP102	Snord	Cranston	D. Brown	C
987859400	CP466	Zzap	Zachary	B. Pavlova	D
987859400	HP202	Zzap	Zachary	S. Zeller	D
987859400	CP102	Zzap	Zachary	D. Brown	B+
987859400	CP363	Zzap	Zachary	D. Brown	B+	*

The tuples marked with '*' are spurious tuples, i.e. they contain invalid information that arises out of their joining together tuples that do not belong together.

One fix is to join tables on attributes that are primary keys or foreign keys. In this case the organization of the first table causes the error. The course a student takes should be identified by the Course attribute, not the Instructor attribute. (The example given would work if instructors never taught more than one course).