2020-12-16 07:12

CP363 : Boyce-Codd Normal Form

Raymond F. Boyce and Edgar F. Codd developed this form in 1974. Codd developed the first three normal forms in the 1960s and published his seminal paper in 1970.

BCNF is a more restrictive version of 3NF.

The table must be in 3NF.
Every non-trivial functional dependency must be a dependency on a superkey. (A trivial functional dependency would be A→A.)

Every relation in BCNF is also in 3NF, but the reverse is not necessarily true. 3NF allows attributes to be part of a candidate key that is not the primary key; BCNF does not.

This means that relations in 3NF are often in BCNF, but not always. The difference is subtle, but important, as in the next example.

Most relations that are in third normal form are also in Boyce-Codd normal form (BCNF), but not all. It might be easier to say what is not in BCNF: A relation is not in BCNF if

the candidate keys in the relation are composite keys (that is, they are not single attribute keys)
there is more than one candidate key in the relation
the keys overlap, that is, some attributes in the keys are common.

BCNF Violation

The following is in 3NF but not in BCNF:

ID	TutorID	TutorSIN
8245	1254	000 737 999
9995	1567	000 656 999
5643	1254	000 737 999
6179	1742	000 651 999

Note that in the semantics for this data each student may have only one tutor, but each tutor may have many students.

This table is subject to insertion anomalies as both the Tutor ID and SIN must be entered whenever a tutor-student pair is entered. Here again, common sense tells us that this table is not well designed, even if we don't immediately recognize that it violates BCNF.

Candidate keys are: {ID, TutorID} and {ID, TutorSIN}
TutorID → TutorSIN and TutorSIN → TutorID, but because both TutorID and TutorSIN are prime attributes these FDs do not violate 3NF.
Neither TutorID nor TutorSIN alone are superkeys, and thus BCNF is violated.

Both TutorID and TutorSIN are prime attributes because they are part of candidate keys. The use of both attributes in this table violates BCNF. The fix is to use only one of these two attributes. As noted before, BCNF is a stronger version of 3NF because it restricts both prime and non-prime attributes, while 3NF only restricts non-prime attributes.

BCNF Violation - Fixed

The original relation is decomposed into: (although the functional dependencies show that another decomposition is also valid)

ID	TutorID
8245	1254
9995	1567
5643	1254
6179	1742

TutorID	TutorSIN
1254	000 737 999
1567	000 656 999
1742	000 651 999

Why this decomposition over the other, where student IDs are matched against tutor SINs? Here outside considerations help in our decision. SINs are treated specially and should be provided extra protection and privacy. The best way to do this is to minimize their use. Given a choice, then, between the two decompositions, the one that minimizes the use of SINs is the best.

Again, update and insertion anomalies have been removed.

Normalization Algorithm

Suppose that R is not in BCNF. Let X be a subset of R, A a single attribute in R, and X→A be a non-trivial FD that violates BCNF. Decompose R into R-A and XA.
If either R-A or XA are not in BCNF, repeat step 1 with the violating relation(s).

This simple recursive algorithm produces BCNF compliant relations every time. Of course, databases development doesn't normally start out with one huge relation of all the attributes in the database. The semantics of the data point the way to how relations should be organized. When semantics are unclear or a developer is simply unsure of where to go next, this algorithm can help the process.

Relation Decomposition

How do you determine which decomposition to use?

“Each attribute must describe the key, the whole key, and nothing but the key.” – Kent (1983) William Kent, A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM 26(2), Feb. 1983.
Often common sense or external rules can provide a guide to decomposition.

Kent is saying that a relation should be centred around one 'fact', and that all attributes in a relation should be connected to that fact. The key provides the foundation for this fact. That a student takes a course is a fact (and a key). The grade a student gets in that course is part of that 'fact'.

I do not wish to imply that you must process the normal forms in order from 1NF to BCNF. The algorithm given earlier is concerned only with BCNF. Focus on that, and the other forms follow.