Introduction

There are two basic design methodologies for designing a database:

Bottom-Up Design Methodology: Starts with all possible individual attributes for a database and builds relationships from that. Rarely used.
Top-Down Design Methodology: Starts with tables and relations that have already been defined conceptually, and further refines the database design. The concepts of functional dependency and normalization are used with this methodology as criteria for proper database design.

What is a "good" relational database design?

Independent data in separate tables, i.e.: Each table should consist of a primary key and a set of mutually independent attributes

Functional dependency and normalization are tools for good relational database design.

Functional Dependency

A set of attributes \(Y\) is functionally dependent on a set of attributes \(X(X→Y)\) when \(Y\) can be uniquely determined by \(X\). This can be read as "X functionally determines Y", or as "Y is functionally determined by X". Note that the converse, \(Y→X\), is not necessarily true.

This is not to say that because \(X\) is unique \(Y\) is also unique. For example, although {studentId,course}→grade, (i.e. a student in a given course may have only one grade, for example an 'A'), other students may also have an 'A' in the same course. It simply means that a particular student in a particular course has one and only one grade. As well, clearly grade→{studentId,course} is not true.

A functional dependency is a type of constraint on attributes that arises out of the meaning of those attributes. In other words, in any given tuple the value of one set of attributes depends on the value of another set of attributes. These depend on the semantics, or rules, underlying the relation in question. The previous example, {studentId,course}→grade, could turn out to be false if a student is allowed to take the same course in a different term. Then you could infer that the functional dependency should be {studentId,course,Term}→grade instead. It is not always easy to infer a functional dependency, although it takes only a single violation of an FD to falsify it, i.e. show that it is false.

Multi-valued Dependency

A multi-valued dependency implies that a value or set of values may be associated with many values, and is represented with a ↠ symbol, as in \(X↠Y\) ("X multi-determines Y"). \(Y\) cannot be necessarily be uniquely determined by \(X\).

For example, studentId↠course in that one student may take many courses. memberId↠pubId in that one member may have many publications.

In terms of database table design, a multi-valued dependency implies redundancy, and therefore must be decomposed into a series of functional dependencies, following normalization rules.

Functional Dependency Rules

Reflexive Rule: If \(X\) is a superset of \(Y\), then \(X→Y\)
Augmentation Rule: If \(X→Y\), then \(XZ→YZ\)
Transitive Rule: If \(X→Y\) and \(Y→Z\) then \(X→Z\)
Decomposition Rule: If \(X→YZ\) then \(X→Y\) and \(X→Z\)
Union Rule: If \(X→Y\) and \(X→Z\) then \(X→Y\)Z
Pseudotransitive Rule: If \(X→Y\) and \(WY→Z\) then \(WX→Z\)

A minimal set of dependencies is a set of dependencies that has:

a standard form where every FD has a single attribute on the right-hand side
no redundancies in that there are no redundant attributes on the left side of any FD in a set
no redundancies in that no FD in a set is superfluous, i.e. can be created from any of the FD rules stated above

There may not be a unique set of FDs that can be used to describe a set of relations in a database. However, a minimal cover set of FDs is a set of the smallest number FDs that meets the above requirements.

Why are functional dependencies useful? An FD represents an integrity constraint - i.e. semantics can be rendered into database constraints. Further they can aid in the normalization of a set of relations. Properly defining a minimal set of FDs can help remove anomalies from table and relationship designs. For example, does the sample table on this page represent a minimal set of functional dependencies for the data used?

Dependency Extraction

**student** Table
studentId	course	dept	surname	forename	instructor	grade
999568440	CP363	Computing	Snord	Cranston	D. Brown	F
999568440	CP400	Computing	Snord	Cranston	T. Yang	A-
999568440	CP102	Computing	Snord	Cranston	D. Brown	C
987859400	PC466	Physics	Zzap	Zachary	B. Pavlova	D
987859400	HP202	History	Zzap	Zachary	S. Zeller	D
987859400	CP102	Computing	Zzap	Zachary	D. Brown	B+
005689250	CP102	Computing	Snord	Lillibelle	D. Brown	A+

Based upon the data we can see, we can claim the following:

{surname,course}→grade: is false, as there are multiple occurrences of a combination of the same last name and course name.
studentId→{surname,forename}: is true, and there is no reason to doubt that the semantics behind this matches our observation. It follows from this that both studentId→surname and studentId→forename are true by the decomposition rule.
instructor→course: is false, as clearly one instructor is involved in many courses.
course→instructor: may be true, as each value of course determines only one value of instructor. However, if the semantics of the situation claimed that a course may have multiple instructors then this FD would be false, despite the fact that the data we can see supports it. Without knowing the semantics we cannot be sure of an FD.
instructor→dept: may be true, if the semantics are that an instructor may belong to only one department.
course→dept: is true, as a course is offered by only one department. It could also be claimed that if both course→instructor and instructor→dept are true then course→dept must be true by the transitive properties of functional dependencies.
studentId↠{course,dept,instructor,grade}: a student may take many courses in many departments with many instructors and get many grades.
dept↠instructor: one department may have many instructors. It is possible that an instructor may belong to many departments, but there is no evidence of that from the data presented.

Table Design Example

The following is a list of functional and multi-valued dependencies for a company:

Employee_ID→Supervisor_ID
Supervisor_ID↠Employee_ID
Employee_ID↠Child
Employee_ID↠{Salary, Year}
Employee_ID→Spouse
{Employee_ID, Year}→Salary
Year↠Salary
Employee_ID→{Name, Address}

How can these dependencies help us design a schema?

First, extract the functional dependencies based upon a single attribute and see what they tell us:

Employee_ID→Supervisor_ID
Employee_ID→Spouse
Employee_ID→{Name, Address}

These three dependencies imply that Employee_ID could be the primary key in a table that also contains Spouse, Name, Address, and Supervisor_ID, since these attributes depend only upon a single Employee_ID. The {Name, Address} group can be split into two separate attributes since neither appear anywhere else in the dependencies:

«table» Employees
Spouse Name Address Supervisor_ID «PK» Employee_ID

Now, extract the multi-valued dependencies based upon a single attribute:

Supervisor_ID↠Employee_ID
Employee_ID↠Child
Employee_ID↠{Salary, Year}
Year↠Salary

The dependency Employee_ID↠Child implies that one employee may have many children. However, there is no dependency from Child to any other attribute. This leaves us with a table that relates only Employee_ID and Child. Since an Employee_ID may repeat, the tables primary key cannot consist only of the Employee_ID. Since there is only one other attribute in the table, i.e. Child, the primary key must consist of {Employee_ID, Child}:

«table» Children
«PK» Employee_ID Child

The dependency Employee_ID↠{Salary, Year} implies that an employee's salary may vary from year to year. The dependency Year↠Salary implies that there may be many different salaries in any given year. Together these dependencies imply that any given employee may have only one salary in any given year. This then implies the existance of a table that contains these three attributes, and that the table's primary key must consist of {Employee_ID, Year}:

«table» Salaries
Salary «PK» Employee_ID Year

The only remaining relationship is Supervisor_ID↠Employee_ID, implying that one supervisor oversees many employees. We are informed that a supervisor is also an employee and has an employee ID, and is therefore also represented in the Employees table. This information is already stored in the Employees table through the functional dependency Employee_ID→Supervisor_ID.

These tables should then have their various foreign keys and relationships defined for their final schema design:

Schema Design

Here is some simple sample data that shows these dependencies:

employees
Employee_ID	Name	Address	Spouse	Supervisor_ID
1	Escobar, Dr. Hortenza	Waterloo, ON	NULL	NULL
97	Brown, David	Waterloo, ON	Lori	1
64	Gartner, Manfred	Kitchener, ON	Elaine	1

Note that this table allows a NULL value for the Supervisor_ID since no employee can be their own supervisor, and employee 1 is the highest ranking employee and thus has no supervisor.

Children
Employee_ID	Child
97	Tasmin
97	Tristan
97	Deborah

Salaries
Employee_ID	Year	Salary
1	2010	9,000.00
97	1986	400.00
97	1987	400.00
97	2010	2,300,000.00