2020-12-16 07:12

CP363 : First Normal Form

The domain of an attribute must include only atomic values.
The value of any attribute in a relation must be a single value from that attribute’s domain.

Fixed by decomposition into a series of single-valued relations. Removes redundancy, and is general, placing no limit on number of values.

It is the nature of a relational database management system to allow only atomic values in an attribute - if an attribute is defined as integer, then only a single integer could be stored in such a field. Thus it is difficult, though possible, to violate 1NF with such a database system. Typically it can be accomplished by misusing strings.

Example not in 1NF.

Student_ID	Course
999568440	{ CP363, CP400, CP102 }
987859400	{CP466, HP202, CP102 }

This data clearly demonstrates its semantics - one student may take many courses and each course may have many students.

You would have to deliberately set aside enough space to store multiple course names. Setting aside only 5 characters would force you to store only one course name at a time.

Imagine that you wish to determine the total enrolment in all computing courses, i.e. courses that start with 'CP':

select count(ID) from ... where Courses like '%CP%'

This returns a value of 2 rather than a correct value of 4 because each tuple is counted only once. A proper counting would require a much more complex select statement.

1NF Violation – Not Fixed

Although each attribute contains an atomic value, each attribute represents the same ‘fact’.
This is also very difficult to query or update.

ID	Course1	Course2	Course3
8245	CP102	HP202	null
9995	CP363	CP400	CP102

The count from the previous example would now be:

select count(ID) from ... where Course1 like 'CP%' or Course2 like 'CP%'
or Course3 like 'CP%'

and this would still not give you the right answer. Updating would mean finding the first free attribute of the three - and what happens when a student takes four courses?

1NF Violation - Fixed:

Student_ID	Course
999568440	CP363
999568440	CP400
999568440	CP102
987859400	CP466
987859400	HP202
987859400	CP102

This seems to have introduced some redundancy, but it now means that our enrolment count statement works properly. We'll see how to reduce redundancy with other normal forms.

Note that the semantics of the data are preserved: one student may still take many courses and one course may have many students. If the data semantics are violated by a normalization procedure then we have done something wrong.