Lesson 8: Functional Dependencies

Introduction

There are two basic design methodologies for designing a database:

Bottom-Up Design Methodology
Starts with all possible individual attributes for a database and builds relationships from that. Rarely used.
Top-Down Design Methodology
Starts with tables and relations that have already been defined conceptually, and further refines the database design. The concepts of functional dependency and normalization are used with this methodology as criteria for proper database design.

What is a "good" relational database design?

Independent data in separate tables, i.e.: Each table should consist of a primary key and a set of mutually independent attributes

Functional dependency and normalization are tools for good relational database design.

Functional Dependency

A set of attributes \(Y\) is functionally dependent on a set of attributes \(X(X→Y)\) when \(Y\) can be uniquely determined by \(X\). This can be read as "X functionally determines Y", or as "Y is functionally determined by X". Note that the converse, \(Y→X\), is not necessarily true.

This is not to say that because \(X\) is unique \(Y\) is also unique. For example, although {studentId,course}→grade, (i.e. a student in a given course may have only one grade, for example an 'A'), other students may also have an 'A' in the same course. It simply means that a particular student in a particular course has one and only one grade. As well, clearly grade→{studentId,course} is not true.

A functional dependency is a type of constraint on attributes that arises out of the meaning of those attributes. In other words, in any given tuple the value of one set of attributes depends on the value of another set of attributes. These depend on the semantics, or rules, underlying the relation in question. The previous example, {studentId,course}→grade, could turn out to be false if a student is allowed to take the same course in a different term. Then you could infer that the functional dependency should be {studentId,course,Term}→grade instead. It is not always easy to infer a functional dependency, although it takes only a single violation of an FD to falsify it, i.e. show that it is false.

Multi-valued Dependency

A multi-valued dependency implies that a value or set of values may be associated with many values, and is represented with a symbol, as in \(X↠Y\) ("X multi-determines Y"). \(Y\) cannot be necessarily be uniquely determined by \(X\).

For example, studentId↠course in that one student may take many courses. memberId↠pubId in that one member may have many publications.

In terms of database table design, a multi-valued dependency implies redundancy, and therefore must be decomposed into a series of functional dependencies, following normalization rules.

Functional Dependency Rules

A minimal set of dependencies is a set of dependencies that has:

There may not be a unique set of FDs that can be used to describe a set of relations in a database. However, a minimal cover set of FDs is a set of the smallest number FDs that meets the above requirements.

Why are functional dependencies useful? An FD represents an integrity constraint - i.e. semantics can be rendered into database constraints. Further they can aid in the normalization of a set of relations. Properly defining a minimal set of FDs can help remove anomalies from table and relationship designs. For example, does the sample table on this page represent a minimal set of functional dependencies for the data used?

Dependency Extraction

student Table
studentId course dept surname forename instructor grade
999568440 CP363 Computing Snord Cranston D. Brown F
999568440 CP400 Computing Snord Cranston T. Yang A-
999568440 CP102 Computing Snord Cranston D. Brown C
987859400 PC466 Physics Zzap Zachary B. Pavlova D
987859400 HP202 History Zzap Zachary S. Zeller D
987859400 CP102 Computing Zzap Zachary D. Brown B+
005689250 CP102 Computing Snord Lillibelle D. Brown A+

Based upon the data we can see, we can claim the following:

Table Design Example

The following is a list of functional and multi-valued dependencies for a company:

Employee_ID→Supervisor_ID
Supervisor_ID↠Employee_ID
Employee_ID↠Child
Employee_ID↠{Salary, Year}
Employee_ID→Spouse
{Employee_ID, Year}→Salary
Year↠Salary
Employee_ID→{Name, Address}

How can these dependencies help us design a schema?

First, extract the functional dependencies based upon a single attribute and see what they tell us:

Employee_ID→Supervisor_ID
Employee_ID→Spouse
Employee_ID→{Name, Address}

These three dependencies imply that Employee_ID could be the primary key in a table that also contains Spouse, Name, Address, and Supervisor_ID, since these attributes depend only upon a single Employee_ID. The {Name, Address} group can be split into two separate attributes since neither appear anywhere else in the dependencies:

«table»
Employees
Spouse
Name
Address
Supervisor_ID
«PK»
Employee_ID

Now, extract the multi-valued dependencies based upon a single attribute:

Supervisor_ID↠Employee_ID
Employee_ID↠Child
Employee_ID↠{Salary, Year}
Year↠Salary

The dependency Employee_ID↠Child implies that one employee may have many children. However, there is no dependency from Child to any other attribute. This leaves us with a table that relates only Employee_ID and Child. Since an Employee_ID may repeat, the tables primary key cannot consist only of the Employee_ID. Since there is only one other attribute in the table, i.e. Child, the primary key must consist of {Employee_ID, Child}:

«table»
Children
«PK»
Employee_ID
Child

The dependency Employee_ID↠{Salary, Year} implies that an employee's salary may vary from year to year. The dependency Year↠Salary implies that there may be many different salaries in any given year. Together these dependencies imply that any given employee may have only one salary in any given year. This then implies the existance of a table that contains these three attributes, and that the table's primary key must consist of {Employee_ID, Year}:

«table»
Salaries
Salary
«PK»
Employee_ID
Year

The only remaining relationship is Supervisor_ID↠Employee_ID, implying that one supervisor oversees many employees. We are informed that a supervisor is also an employee and has an employee ID, and is therefore also represented in the Employees table. This information is already stored in the Employees table through the functional dependency Employee_ID→Supervisor_ID.

These tables should then have their various foreign keys and relationships defined for their final schema design:

Schema Design

Here is some simple sample data that shows these dependencies:

employees
Employee_ID Name Address Spouse Supervisor_ID
1 Escobar, Dr. Hortenza Waterloo, ON NULL NULL
97 Brown, David Waterloo, ON Lori 1
64 Gartner, Manfred Kitchener, ON Elaine 1

Note that this table allows a NULL value for the Supervisor_ID since no employee can be their own supervisor, and employee 1 is the highest ranking employee and thus has no supervisor.

Children
Employee_ID Child
97 Tasmin
97 Tristan
97 Deborah

Salaries
Employee_ID Year Salary
1 2010 9,000.00
97 1986 400.00
97 1987 400.00
97 2010 2,300,000.00

These tables now meet the requirements outlined in all of the functional dependencies given, and unless there are other real-world constraints or rules not covered by the functional dependencies, this is an adequate database design.