CP363 : Joins (cont'd)

Inner Joins

On an inner join tuples appear in the result set only if both tables contain at least one row that satisfies the join condition.

The following tables illustrate the results of an inner join:

Student table :

Student_ID Last_Name First_Name
815686230 Brown David
999568440 Snord Cranston
987859400 Zzap Zachary

Student_Course table :

Student_ID Course
999568440 CP363
999568440 CP102
987859400 CP363
987859400 HP202

Assuming that these two tables have a foreign key relationship based upon the Student_ID attribute, the following SELECT statement joins the two tables:

SELECT Last_Name, First_Name, s.Student_ID, Course
  FROM Student AS s INNER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID

produces the following result set:

Last_Name First_Name Student_ID Course
Snord Cranston 999568440 CP363
Snord Cranston 999568440 CP102
Zzap Zachary 987859400 CP363
Zzap Zachary 987859400 HP202

Note that none of the data from the Student table pertaining to David Brown appears in this result set. The Student_ID attribute is common to both the Student and Student_Course table, but David Brown's ID 815686230 does not appear anywhere in the Student_Course table. Because on an inner join tuples appear in the result set only if both tables contain at least one row that satisfies the join condition, there can be no join based on the attribute value 815686230.

The same principle applies across more than two tables. In a multi-table join, all tables must have at least one tuple that satisfies the result condition in order for a matching tuple to appear in the result set. If a third table is added to the previous join:

Course_Lab table :

Course Lab
CP102 Tu, Th 4:00 - 5:00

with the join:

SELECT Last_Name, First_Name, s.Student_ID, cl.Course, Lab
  FROM Student AS s
  INNER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  INNER JOIN Course_Lab AS cl
  ON sc.Course = cl.Course

The result would be:

Last_Name First_Name Student_ID Course Lab
Snord Cranston 999568440 CP102 Tu, Th 4:00 - 5:00

There is only one tuple in the final result set because only CP102 has tuples in all three tables, i.e. both a student and a lab.


Outer Joins

An outer join contains tuples whether or not a tuple exists in the opposite table to satisfy the join condition. The LEFT and RIGHT keywords identify the table that is to appear in its entirety. A LEFT OUTER JOIN contains every tuple in the left-hand table, while a RIGHT OUTER JOIN contains every tuple in the right-hand table. A FULL OUTER JOIN contains every tuple in both the left- and right-hand tables. Attributes in the query result set that do not have values from tuples in the left or right tables contain NULL. Using the previous examples, altered to be an outer join:

SELECT Last_Name, First_Name, s.Student_ID, Course
  FROM Student AS s LEFT OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID

produces the following result set:

Last_Name First_Name Student_ID Course
Brown David 815686230 NULL
Snord Cranston 999568440 CP363
Snord Cranston 999568440 CP102
Zzap Zachary 987859400 CP363
Zzap Zachary 987859400 HP202

David Brown now appears in the result set, but the matching Course attribute is NULL, as there were no matching values in the Student_Course table.

If the LEFT OUTER JOIN phrase was replaced by a RIGHT OUTER JOIN phrase, the result set would be identical to the result set for the inner join example - the table Student_Course does not contain any values in the Student_ID attribute that do not have matching values in the Student table. Thus, none of the attributes in the result set will have a NULL value.

Outer joins can be especially useful when grouping and counting occurrences of values, but its use poses subtle dangers. Assume you wish to find out how many courses each student is taking. The inner join in this query:

SELECT Last_Name, First_Name, COUNT(*) AS Course_Count
  FROM Student AS s INNER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  GROUP BY Last_Name, First_Name
  ORDER BY Last_Name, First_Name

produces the following result set:

Last_Name First_Name Course_Count
Snord Cranston 2
Zzap Zachary 2

Unfortunately, David Brown is left out entirely from this result set. This is expected, because as described in the inner join example there is no matching tuple for David Brown in the Student_Course table. In order to include David Brown in the result set you must use an outer join. The query:

SELECT Last_Name, First_Name, COUNT(*) AS Course_Count
  FROM Student AS s LEFT OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  GROUP BY Last_Name, First_Name
  ORDER BY Last_Name, First_Name

produces the result set:

Last_Name First_Name Course_Count
Brown David 1
Snord Cranston 2
Zzap Zachary 2

This is not quite what was desired, as it implies that David Brown is taking one course when he is actually taking none. The problem lies not in the OUTER JOIN phrase, but in the COUNT(*) phrase. The COUNT() function counts all non-NULL values in a result set. Using '*' as its parameter is simply asking COUNT() to return the number of tuples that contain any non-NULL attributes. Since the Last_Name and First_Name attributes are non-NULL, the resulting single tuple for David Brown provides a count of 1.

In order to produce the desired result the first line of the query must be replaced by:

SELECT Last_Name, First_Name, COUNT( Course ) as Course_Count
  ...

producing the result set:

Last_Name First_Name Course_Count
Brown David 0
Snord Cranston 2
Zzap Zachary 2

The phrase COUNT(Course) asks the COUNT() function to return the number of non-NULL occurrences of the attribute Course in its result set. The first outer join example showed that the only resulting tuple for David Brown contains a NULL value for Course. Thus COUNT(Course) returns 0 for this tuple.


Multiple Table Joins

You may join more than two tables in a SELECT statement and mix different types of joins in that SELECT statement. These joined tables are normally listed sequentially, and can be thought of as going from left to right in the order they are listed, in the same way that outer join statements are considered to be either left or right. (See the example of the Course_Lab table under Inner Joins.)

In addition to the Student and Student_Course tables of the previous examples, assume that a Course table also exists:

Course table :

Course Name
CP363 Database I
CP102 Intro to Microcomputers
HP202 Science in the Modern World
BW101 Intro to Basket Weaving

In order to generate a list of student names and the names of the courses they are taking, the following query suffices:

SELECT Last_Name, First_Name, Name
  FROM Student AS s
  INNER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  INNER JOIN Course AS c
  ON sc.Course = c.Course
  ORDER BY Last_Name, First_Name, Name

which produces the following result set:

Last_Name First_Name Name
Snord Cranston Database I
Snord Cranston Intro to Microcomputers
Zzap Zachary Database I
Zzap Zachary Science in the Modern World

(In this example the table Student_Course acts as a look-up table between Student and Course.

You can mix two different outer joins, but the query design needs careful thought:

SELECT Last_Name, First_Name, Name
  FROM Student AS s
  LEFT OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  RIGHT OUTER JOIN Course AS c
  ON sc.Course = c.Course
  ORDER BY Last_Name, First_Name, Name

produces the result set:

Last_Name First_Name Name
NULL NULL Intro to Basket Weaving
Snord Cranston Database I
Snord Cranston Intro to Microcomputers
Zzap Zachary Database I
Zzap Zachary Science in the Modern World

Note that David Brown is still missing. What is important here is the order of the joins. When Student and Student_Course are related by a left outer join, David Brown appears. When the right outer join between the results of the previous left outer join and Course is performed, the tuple containing David Brown is dropped, as it is part of the left 'table' (the left result set) and not the right table. To get the tuple containing David Brown without any course information, it is necessary to rewrite the previous example (note the parentheses - they change the order in which the joins are executed):

SELECT Last_Name, First_Name, Name
  FROM Student AS s
  LEFT OUTER JOIN ( Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  RIGHT OUTER JOIN Course AS c )
  ON sc.Course = c.Course
  ORDER BY Last_Name, First_Name, Name

produces the result set:

Last_Name First_Name Name
Brown David NULL
Snord Cranston Database I
Snord Cranston Intro to Microcomputers
Zzap Zachary Database I
Zzap Zachary Science in the Modern World

The same result could have been obtained by starting with the Course table and moving 'left' to the Student table.

Note, however, that the tuple containing Intro to Basket Weaving is now missing, for the same reasons (reversed) that David Brown was missing in the previous result set. In order to get both David Brown and Intro to Basket Weaving into the final result set needs a full outer join:

SELECT Last_Name, First_Name, Name
  FROM Student AS s
  FULL OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  FULL OUTER JOIN Course AS c
  ON sc.Course = c.Course
  ORDER BY Last_Name, First_Name, Name

This produces:

Last_Name First_Name Name
Brown David NULL
Snord Cranston Database I
Snord Cranston Intro to Microcomputers
Zzap Zachary Database I
Zzap Zachary Science in the Modern World
NULL NULL Intro to Basket Weaving

One last note of caution about outer joins - the attributes you choose for projection are extremely important. The following query:

SELECT sc.Student_ID, sc.Name
  FROM Student AS s
  LEFT OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  RIGHT OUTER JOIN Course AS c
  ON sc.Name = c.Name

produces the result set:

Student_ID Name
NULL NULL
999568440 CP363
999568440 CP102
987859400 CP363
987859400 HP202

This differs from the query:

SELECT s.Student_ID, c.Name
  FROM Student AS s
  LEFT OUTER JOIN Student_Course AS sc
  ON s.Student_ID = sc.Student_ID
  RIGHT OUTER JOIN Course AS c
  ON sc.Name = c.Name

which produces the result set:

Student_ID Name
NULL BW101
999568440 CP363
999568440 CP102
987859400 CP363
987859400 HP202

The first query chooses its attributes from the look-up table Student_Course (as sc), which has NULL values for some of the tuples in the outer joins. The second query uses the attributes of the tables on the 'outside' of the joins, and at least one of these attributes has values in the result set.


Star Joins

A Star Join is a complex join where a series of tables center around a single table. A star join cannot be created as a series of sequential joins as in the previous examples. Imagine the following table relationships:

The table Student_Course_Term is at the center of a star relationship amongst the tables Student, Course, and Term. It is a look-up table for these three other tables, and contains information about the courses a given student takes during any term. Assume that there are appropriate foreign key relationships between Student_Course_Term and the other three tables.

In order to generate a list of courses taken by all students during all terms, the query looks like this:

SELECT s.Last_Name, s.First_Name, c.Name, t.Name
  FROM ( ( Student_Course_Term AS sct
  INNER JOIN Student AS s
  ON sct.Student_ID = s.Student_ID )
  INNER JOIN Course AS c
  ON sct.Name = c.Name )
  INNER JOIN Term AS t
  ON sct.Term_Code = t.Term_Code
  ORDER BY s.Last_Name, s.First_Name, c.Name, t.Name

Note the use of parentheses to group the joins together and the fact that the middle table (Student_Course_Term) is referenced only once.

Star joins may use both inner joins and outer joins.


Virtual Table Joins

Joins can be performed against virtual, or derived tables, as in the following example:

SELECT s.Last_Name, s.First_Name, Course, Repeats
  FROM Student AS s
  INNER JOIN
  ( SELECT Student_ID, Course, COUNT(*)
  FROM Student_Course
  GROUP BY Student_ID, Course )
  AS Course_Repeats ( Student_ID, Course, Repeats )
  ON s.Student_ID = Course_Repeats.Student_ID
  WHERE Repeats > 1

This query returns a list of all students who have taken a course more than once. The subquery (the query in parentheses) creates a virtual table Course_Repeats from Student_Course that is joined against Student based upon their common Student_ID values. The subquery attributes (in this case Course and Repeats) can be displayed as part of the result set.


Join Performance

Describing the details of how various SQL engines implement joins is beyond the scope of this course. It is worth noting, however, that providing indices on the columns to be joined can aid performance immensely. In the previous examples creating individual indices on the attributes of the various look-up tables would help performance. (ex: on Student_ID and Course in Student_Course. These attributes, as primary keys in their 'owning' tables, are already indexed).