CP363 : Boyce-Codd Normal Form

Raymond F. Boyce and Edgar F. Codd developed this form in 1974. Codd developed the first three normal forms in the 1960s and published his seminal paper in 1970.

BCNF is a more restrictive version of 3NF.

Every relation in BCNF is also in 3NF, but the reverse is not necessarily true. 3NF allows attributes to be part of a candidate key that is not the primary key; BCNF does not.

This means that relations in 3NF are often in BCNF, but not always. The difference is subtle, but important, as in the next example.

Most relations that are in third normal form are also in Boyce-Codd normal form (BCNF), but not all. It might be easier to say what is not in BCNF: A relation is not in BCNF if

  1. the candidate keys in the relation are composite keys (that is, they are not single attribute keys)
  2. there is more than one candidate key in the relation
  3. the keys overlap, that is, some attributes in the keys are common.

BCNF Violation

The following is in 3NF but not in BCNF:

ID TutorID TutorSIN
8245 1254 000 737 999
9995 1567 000 656 999
5643 1254 000 737 999
6179 1742 000 651 999

Note that in the semantics for this data each student may have only one tutor, but each tutor may have many students.

This table is subject to insertion anomalies as both the Tutor ID and SIN must be entered whenever a tutor-student pair is entered. Here again, common sense tells us that this table is not well designed, even if we don't immediately recognize that it violates BCNF.

Both TutorID and TutorSIN are prime attributes because they are part of candidate keys. The use of both attributes in this table violates BCNF. The fix is to use only one of these two attributes. As noted before, BCNF is a stronger version of 3NF because it restricts both prime and non-prime attributes, while 3NF only restricts non-prime attributes.


BCNF Violation - Fixed

The original relation is decomposed into: (although the functional dependencies show that another decomposition is also valid)

ID TutorID
8245 1254
9995 1567
5643 1254
6179 1742

TutorID TutorSIN
1254 000 737 999
1567 000 656 999
1742 000 651 999

Why this decomposition over the other, where student IDs are matched against tutor SINs? Here outside considerations help in our decision. SINs are treated specially and should be provided extra protection and privacy. The best way to do this is to minimize their use. Given a choice, then, between the two decompositions, the one that minimizes the use of SINs is the best.

Again, update and insertion anomalies have been removed.


Normalization Algorithm

  1. Suppose that R is not in BCNF. Let X be a subset of R, A a single attribute in R, and X→A be a non-trivial FD that violates BCNF. Decompose R into R-A and XA.

  2. If either R-A or XA are not in BCNF, repeat step 1 with the violating relation(s).

This simple recursive algorithm produces BCNF compliant relations every time. Of course, databases development doesn't normally start out with one huge relation of all the attributes in the database. The semantics of the data point the way to how relations should be organized. When semantics are unclear or a developer is simply unsure of where to go next, this algorithm can help the process.


Relation Decomposition

How do you determine which decomposition to use?

Kent is saying that a relation should be centred around one 'fact', and that all attributes in a relation should be connected to that fact. The key provides the foundation for this fact. That a student takes a course is a fact (and a key). The grade a student gets in that course is part of that 'fact'.

I do not wish to imply that you must process the normal forms in order from 1NF to BCNF. The algorithm given earlier is concerned only with BCNF. Focus on that, and the other forms follow.