CP363 : First Normal Form

Fixed by decomposition into a series of single-valued relations. Removes redundancy, and is general, placing no limit on number of values.

It is the nature of a relational database management system to allow only atomic values in an attribute - if an attribute is defined as integer, then only a single integer could be stored in such a field. Thus it is difficult, though possible, to violate 1NF with such a database system. Typically it can be accomplished by misusing strings.

Example not in 1NF.

Student_ID Course
999568440 { CP363, CP400, CP102 }
987859400 {CP466, HP202, CP102 }

This data clearly demonstrates its semantics - one student may take many courses and each course may have many students.

You would have to deliberately set aside enough space to store multiple course names. Setting aside only 5 characters would force you to store only one course name at a time.

Imagine that you wish to determine the total enrolment in all computing courses, i.e. courses that start with 'CP':

select count(ID) from ... where Courses like '%CP%'

This returns a value of 2 rather than a correct value of 4 because each tuple is counted only once. A proper counting would require a much more complex select statement.

1NF Violation – Not Fixed

ID Course1 Course2 Course3
8245 CP102 HP202 null
9995 CP363 CP400 CP102

The count from the previous example would now be:

select count(ID) from ... where Course1 like 'CP%' or Course2 like 'CP%'
or Course3 like 'CP%' 

and this would still not give you the right answer. Updating would mean finding the first free attribute of the three - and what happens when a student takes four courses?

1NF Violation - Fixed:

Student_ID Course
999568440 CP363
999568440 CP400
999568440 CP102
987859400 CP466
987859400 HP202
987859400 CP102

This seems to have introduced some redundancy, but it now means that our enrolment count statement works properly. We'll see how to reduce redundancy with other normal forms.

Note that the semantics of the data are preserved: one student may still take many courses and one course may have many students. If the data semantics are violated by a normalization procedure then we have done something wrong.