Raymond F. Boyce and Edgar F. Codd developed this form in 1974. Codd developed the first three normal forms in the 1960s and published his seminal paper in 1970.
BCNF is a more restrictive version of 3NF.
The table must be in 3NF.
Every non-trivial functional dependency must be a dependency on a superkey. (A trivial functional dependency would be A→A.)
Every relation in BCNF is also in 3NF, but the reverse is not necessarily true. 3NF allows attributes to be part of a candidate key that is not the primary key; BCNF does not.
This means that relations in 3NF are often in BCNF, but not always. The difference is subtle, but important, as in the next example.
Most relations that are in third normal form are also in Boyce-Codd normal form (BCNF), but not all. It might be easier to say what is not in BCNF: A relation is not in BCNF if
The following is in 3NF but not in BCNF:
ID | TutorID | TutorSIN |
---|---|---|
8245 | 1254 | 000 737 999 |
9995 | 1567 | 000 656 999 |
5643 | 1254 | 000 737 999 |
6179 | 1742 | 000 651 999 |
Note that in the semantics for this data each student may have only one tutor, but each tutor may have many students.
This table is subject to insertion anomalies as both the Tutor ID and SIN must be entered whenever a tutor-student pair is entered. Here again, common sense tells us that this table is not well designed, even if we don't immediately recognize that it violates BCNF.
Candidate keys are: {ID, TutorID}
and {ID, TutorSIN}
TutorID → TutorSIN
and TutorSIN → TutorID
, but
because both TutorID
and TutorSIN
are prime
attributes these FDs do not violate 3NF.
Neither TutorID
nor TutorSIN
alone are
superkeys, and thus BCNF is violated.
Both TutorID
and TutorSIN
are prime attributes
because they are part of candidate keys. The use of both attributes in this
table violates BCNF. The fix is to use only one of these two attributes. As
noted before, BCNF is a stronger version of 3NF because it restricts both
prime and non-prime attributes, while 3NF only restricts non-prime
attributes.
The original relation is decomposed into: (although the functional dependencies show that another decomposition is also valid)
ID | TutorID |
---|---|
8245 | 1254 |
9995 | 1567 |
5643 | 1254 |
6179 | 1742 |
TutorID | TutorSIN |
---|---|
1254 | 000 737 999 |
1567 | 000 656 999 |
1742 | 000 651 999 |
Why this decomposition over the other, where student IDs are matched against tutor SINs? Here outside considerations help in our decision. SINs are treated specially and should be provided extra protection and privacy. The best way to do this is to minimize their use. Given a choice, then, between the two decompositions, the one that minimizes the use of SINs is the best.
Again, update and insertion anomalies have been removed.
Suppose that R is not in BCNF. Let X be a subset of R, A a single attribute in R, and X→A be a non-trivial FD that violates BCNF. Decompose R into R-A and XA.
If either R-A or XA are not in BCNF, repeat step 1 with the violating relation(s).
This simple recursive algorithm produces BCNF compliant relations every time. Of course, databases development doesn't normally start out with one huge relation of all the attributes in the database. The semantics of the data point the way to how relations should be organized. When semantics are unclear or a developer is simply unsure of where to go next, this algorithm can help the process.
How do you determine which decomposition to use?
“Each attribute must describe the key, the whole key, and nothing but the key.” – Kent (1983) William Kent, A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM 26(2), Feb. 1983.
Often common sense or external rules can provide a guide to decomposition.
Kent is saying that a relation should be centred around one 'fact', and that all attributes in a relation should be connected to that fact. The key provides the foundation for this fact. That a student takes a course is a fact (and a key). The grade a student gets in that course is part of that 'fact'.
I do not wish to imply that you must process the normal forms in order from 1NF to BCNF. The algorithm given earlier is concerned only with BCNF. Focus on that, and the other forms follow.