Introduction

Proper table design is a fundamental part of any database implementation. Improperly designed tables and joins are a source of anomalies - incorrect or misleading data. The following are examples of some of these anomalies.

Insertion Anomaly

Given the following table, we can have problems when inserting new tuples. (Assume that the primary key is {studentId, course}).

**student** Table
studentId	course	dept	surname	forename	instructor	grade
999568440	CP363	Computing	Snord	Cranston	D. Brown	F
999568440	CP400	Computing	Snord	Cranston	T. Yang	A-
999568440	CP102	Computing	Snord	Cranston	D. Brown	C
987859400	PC466	Physics	Zzap	Zachary	B. Pavlova	D
987859400	HP202	History	Zzap	Zachary	S. Zeller	D
987859400	CP102	Computing	Zzap	Zachary	D. Brown	B+
005689250	CP102	Computing	Snord	Lillibelle	D. Brown	A+
tuple with incorrect data
999568440	PC100	NULL	Snord	Bob	H. Bezner	F
invalid tuples
985768920	NULL	NULL	Aardvark	Anthony	NULL	NULL
NULL	EN228	NULL	NULL	NULL	G. Boire	NULL

The tuple noted as having incorrect data (Bob Snord) shows the problem of keeping consistency: Mr. Snord's first name is wrong in the new tuple. Because we are using redundant data we increase the possibility of error creeping in. We are also not entering the department information, and all courses should be related to a department.

The two tuples noted as invalid (Anthony Aardvark and NULL NULL) show the difficulties that arise when we do not yet have all of the data to hand, but the table design requires it. In the first case we add a new student who is not yet taking a course, in the second a course that does not yet have any students. The empty fields must be left as NULL. The problem that arises is that both studentId and course are part of the primary key, and primary keys do not allow NULLs. Thus the last two tuples could not actually be inserted into the table.

The implication of this anomaly is that the table is poorly designed. Student and course data are associated but independent data. Students exist without courses, courses exist without students. Assigning them to the same table is poor design.

Deletion Anomaly

Given the following table, we can have problems when deleting new tuples. (Assume that the primary key is {studentId, course}).

**student** Table
studentId	course	dept	surname	forename	instructor	grade
999568440	CP363	Computing	Snord	Cranston	D. Brown	F
999568440	CP400	Computing	Snord	Cranston	T. Yang	A-
999568440	CP102	Computing	Snord	Cranston	D. Brown	C
987859400	PC466	Physics	Zzap	Zachary	B. Pavlova	D
987859400	HP202	History	Zzap	Zachary	S. Zeller	D
987859400	CP102	Computing	Zzap	Zachary	D. Brown	B+
005689250	CP102	Computing	Snord	Lillibelle	D. Brown	A+

If we delete the last student from a course, we lose the course information. For example, if we delete Zachary Zzap from HP202, the course disappears.

As noted in the Insertion Anomaly, this problem arises due to having associated but independent data in the same table.

Modification Anomaly

When data is repeated, we must implement proper design to avoid generating inconsistencies. For example: if the course instructor for CP102 changes, we must change this information in more than one tuple. If we miss a tuple then the data is no longer valid.

**student** Table
studentId	course	dept	surname	forename	instructor	grade
999568440	CP363	Computing	Snord	Cranston	D. Brown	F
999568440	CP400	Computing	Snord	Cranston	T. Yang	A-
999568440	CP102	Computing	Snord	Cranston	X. Sharma	C
987859400	PC466	Physics	Zzap	Zachary	B. Pavlova	D
987859400	CP102	Computing	Zzap	Zachary	D. Brown	B+
005689250	CP102	Computing	Snord	Lillibelle	X. Sharma	A+

As noted in the Insertion and Deletion Anomalies, this problem arises due to having associated but independent data in the same table.

Spurious Tuples

If a join loses information (not necessarily tuples - in fact, invalid tuples may be added through a join), it is an invalid join. Valid joins are called non-additive (lossless) joins.

The following is an example of a join that produces spurious tuples:

**studentInstructor** Table
studentId	surname	forename	instructor	grade
999568440	Snord	Cranston	D. Brown	F
999568440	Snord	Cranston	T. Yang	A-
999568440	Snord	Cranston	D. Brown	C
987859400	Zzap	Zachary	B. Pavlova	D
987859400	Zzap	Zachary	S. Zeller	D
987859400	Zzap	Zachary	D. Brown	B+

is joined to the following table by their common field instructor.

**courseInstructor** Table
course	instructor
CP363	D. Brown
CP400	T. Yang
CP102	D. Brown
CP466	B. Pavlova
HP202	S. Zeller

by the SQL statement

Join studentInstructor and courseInstructor

              
SELECT * FROM studentInstructor AS si
  INNER JOIN courseInstructor AS ci
  ON si.instructor = ci.instructor

resulting in:

*Result Set*
studentId	course	surname	forename	instructor	grade
999568440	CP363	Snord	Cranston	D. Brown	F
999568440	CP102	Snord	Cranston	D. Brown	F
999568440	CP400	Snord	Cranston	T. Yang	A-
999568440	CP363	Snord	Cranston	D. Brown	C
999568440	CP102	Snord	Cranston	D. Brown	C
987859400	CP466	Zzap	Zachary	B. Pavlova	D
987859400	HP202	Zzap	Zachary	S. Zeller	D
987859400	CP102	Zzap	Zachary	D. Brown	B+
987859400	CP363	Zzap	Zachary	D. Brown	B+

The light-green tuples are spurious tuples, i.e. they contain invalid information that arises out of joining together tuples that do not belong together.

The spurious tuples arise out of this join because of the poor design of the source tables. Students take courses and should be associated with courses, not with instructors. Since instructors are also associated with courses - and potentially with multiple courses - the join between these tables produce tuples that have no existence in reality. (If each instructor only taught one course then this problem wouldn't arise, but that's not the reality.)

The fix is to redesign the tables to properly reflect the real-world association amongst the data. Students should be associated with courses, not instructors, and a studentCourse table joined with the courseInstructor table would not have these spurious tuples. Of course this fix would work only if each course has only one instructor. Courses that have multiple sections and instructors would require another redesign, this time with tables that relate courses, sections, students, and instructors. Keeping track of course sections offered in different terms would require yet another redesign that tracks the terms as well. Complex data requires complex database design.

Lesson 7: Data Anomalies

Introduction

Insertion Anomaly

Deletion Anomaly

Modification Anomaly

Spurious Tuples