On an inner join tuples appear in the result set only if both tables contain at least one row that satisfies the join condition.
The following tables illustrate the results of an inner join:
Student table :
Student_ID | Last_Name | First_Name |
---|---|---|
815686230 | Brown | David |
999568440 | Snord | Cranston |
987859400 | Zzap | Zachary |
Student_Course table :
Student_ID | Course |
---|---|
999568440 | CP363 |
999568440 | CP102 |
987859400 | CP363 |
987859400 | HP202 |
Assuming that these two tables have a foreign key relationship based upon
the Student_ID attribute, the following SELECT
statement joins the two tables:
SELECT Last_Name, First_Name, s.Student_ID, Course FROM Student AS s INNER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID
produces the following result set:
Last_Name | First_Name | Student_ID | Course |
---|---|---|---|
Snord | Cranston | 999568440 | CP363 |
Snord | Cranston | 999568440 | CP102 |
Zzap | Zachary | 987859400 | CP363 |
Zzap | Zachary | 987859400 | HP202 |
Note that none of the data from the Student table
pertaining to David Brown appears in this result set. The
Student_ID attribute is common to both the
Student and Student_Course table, but David
Brown's ID 815686230 does not appear anywhere in the
Student_Course table. Because on an inner join tuples appear
in the result set only if both tables contain at least one row that satisfies
the join condition, there can be no join based on the attribute value
815686230
.
The same principle applies across more than two tables. In a multi-table join, all tables must have at least one tuple that satisfies the result condition in order for a matching tuple to appear in the result set. If a third table is added to the previous join:
Course_Lab table :
Course | Lab |
---|---|
CP102 | Tu, Th 4:00 - 5:00 |
with the join:
SELECT Last_Name, First_Name, s.Student_ID, cl.Course, Lab FROM Student AS s INNER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID INNER JOIN Course_Lab AS cl ON sc.Course = cl.Course
The result would be:
Last_Name | First_Name | Student_ID | Course | Lab |
---|---|---|---|---|
Snord | Cranston | 999568440 | CP102 | Tu, Th 4:00 - 5:00 |
There is only one tuple in the final result set because only
CP102
has tuples in all three tables, i.e. both a student and a
lab.
An outer join contains tuples whether or not a tuple exists in
the opposite table to satisfy the join condition. The LEFT
and
RIGHT
keywords identify the table that is to appear in its
entirety. A LEFT OUTER JOIN
contains every tuple in the
left-hand table, while a RIGHT OUTER JOIN
contains
every tuple in the right-hand table. A FULL OUTER JOIN
contains every tuple in both the left- and right-hand tables. Attributes in
the query result set that do not have values from tuples in the left or right
tables contain NULL
. Using the previous examples, altered to be
an outer join:
SELECT Last_Name, First_Name, s.Student_ID, Course FROM Student AS s LEFT OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID
produces the following result set:
Last_Name | First_Name | Student_ID | Course |
---|---|---|---|
Brown | David | 815686230 | NULL |
Snord | Cranston | 999568440 | CP363 |
Snord | Cranston | 999568440 | CP102 |
Zzap | Zachary | 987859400 | CP363 |
Zzap | Zachary | 987859400 | HP202 |
David Brown now appears in the result set, but the matching
Course attribute is NULL
, as there were no
matching values in the Student_Course table.
If the LEFT OUTER JOIN
phrase was replaced by a RIGHT
OUTER JOIN
phrase, the result set would be identical to the result set
for the inner join example - the table Student_Course does
not contain any values in the Student_ID attribute that do
not have matching values in the Student table. Thus, none of
the attributes in the result set will have a NULL
value.
Outer joins can be especially useful when grouping and counting occurrences of values, but its use poses subtle dangers. Assume you wish to find out how many courses each student is taking. The inner join in this query:
SELECT Last_Name, First_Name, COUNT(*) AS Course_Count FROM Student AS s INNER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID GROUP BY Last_Name, First_Name ORDER BY Last_Name, First_Name
produces the following result set:
Last_Name | First_Name | Course_Count |
---|---|---|
Snord | Cranston | 2 |
Zzap | Zachary | 2 |
Unfortunately, David Brown is left out entirely from this result set. This is expected, because as described in the inner join example there is no matching tuple for David Brown in the Student_Course table. In order to include David Brown in the result set you must use an outer join. The query:
SELECT Last_Name, First_Name, COUNT(*) AS Course_Count FROM Student AS s LEFT OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID GROUP BY Last_Name, First_Name ORDER BY Last_Name, First_Name
produces the result set:
Last_Name | First_Name | Course_Count |
---|---|---|
Brown | David | 1 |
Snord | Cranston | 2 |
Zzap | Zachary | 2 |
This is not quite what was desired, as it implies that David Brown is
taking one course when he is actually taking none. The problem lies not in
the OUTER JOIN
phrase, but in the COUNT(*)
phrase.
The COUNT()
function counts all non-NULL
values in
a result set. Using '*
' as its parameter is simply asking
COUNT()
to return the number of tuples that contain any
non-NULL attributes. Since the Last_Name and
First_Name attributes are non-NULL
, the
resulting single tuple for David Brown provides a count of 1.
In order to produce the desired result the first line of the query must be replaced by:
SELECT Last_Name, First_Name, COUNT( Course ) as Course_Count ...
producing the result set:
Last_Name | First_Name | Course_Count |
---|---|---|
Brown | David | 0 |
Snord | Cranston | 2 |
Zzap | Zachary | 2 |
The phrase COUNT(Course)
asks the COUNT()
function to return the number of non-NULL
occurrences of the
attribute Course in its result set. The first outer join
example showed that the only resulting tuple for David Brown contains a
NULL
value for Course. Thus
COUNT(Course)
returns 0 for this tuple.
You may join more than two tables in a SELECT
statement and
mix different types of joins in that SELECT
statement. These
joined tables are normally listed sequentially, and can be thought of as
going from left to right in the order they are listed, in the same way that
outer join statements are considered to be either left or right. (See the
example of the Course_Lab table under Inner
Joins.)
In addition to the Student and
Student_Course
tables of the previous examples, assume that a
Course table also exists:
Course table :
Course | Name |
---|---|
CP363 | Database I |
CP102 | Intro to Microcomputers |
HP202 | Science in the Modern World |
BW101 | Intro to Basket Weaving |
In order to generate a list of student names and the names of the courses they are taking, the following query suffices:
SELECT Last_Name, First_Name, Name FROM Student AS s INNER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID INNER JOIN Course AS c ON sc.Course = c.Course ORDER BY Last_Name, First_Name, Name
which produces the following result set:
Last_Name | First_Name | Name |
---|---|---|
Snord | Cranston | Database I |
Snord | Cranston | Intro to Microcomputers |
Zzap | Zachary | Database I |
Zzap | Zachary | Science in the Modern World |
(In this example the table Student_Course acts as a look-up table between Student and Course.
You can mix two different outer joins, but the query design needs careful thought:
SELECT Last_Name, First_Name, Name FROM Student AS s LEFT OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID RIGHT OUTER JOIN Course AS c ON sc.Course = c.Course ORDER BY Last_Name, First_Name, Name
produces the result set:
Last_Name | First_Name | Name |
---|---|---|
NULL | NULL | Intro to Basket Weaving |
Snord | Cranston | Database I |
Snord | Cranston | Intro to Microcomputers |
Zzap | Zachary | Database I |
Zzap | Zachary | Science in the Modern World |
Note that David Brown is still missing. What is important here is the order of the joins. When Student and Student_Course are related by a left outer join, David Brown appears. When the right outer join between the results of the previous left outer join and Course is performed, the tuple containing David Brown is dropped, as it is part of the left 'table' (the left result set) and not the right table. To get the tuple containing David Brown without any course information, it is necessary to rewrite the previous example (note the parentheses - they change the order in which the joins are executed):
SELECT Last_Name, First_Name, Name FROM Student AS s LEFT OUTER JOIN ( Student_Course AS sc ON s.Student_ID = sc.Student_ID RIGHT OUTER JOIN Course AS c ) ON sc.Course = c.Course ORDER BY Last_Name, First_Name, Name
produces the result set:
Last_Name | First_Name | Name |
---|---|---|
Brown | David | NULL |
Snord | Cranston | Database I |
Snord | Cranston | Intro to Microcomputers |
Zzap | Zachary | Database I |
Zzap | Zachary | Science in the Modern World |
The same result could have been obtained by starting with the Course table and moving 'left' to the Student table.
Note, however, that the tuple containing Intro to Basket
Weaving
is now missing, for the same reasons (reversed) that
David Brown
was missing in the previous result set. In order to
get both David Brown
and Intro to Basket Weaving
into the final result set needs a full outer join:
SELECT Last_Name, First_Name, Name FROM Student AS s FULL OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID FULL OUTER JOIN Course AS c ON sc.Course = c.Course ORDER BY Last_Name, First_Name, Name
This produces:
Last_Name | First_Name | Name |
---|---|---|
Brown | David | NULL |
Snord | Cranston | Database I |
Snord | Cranston | Intro to Microcomputers |
Zzap | Zachary | Database I |
Zzap | Zachary | Science in the Modern World |
NULL | NULL | Intro to Basket Weaving |
One last note of caution about outer joins - the attributes you choose for projection are extremely important. The following query:
SELECT sc.Student_ID, sc.Name FROM Student AS s LEFT OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID RIGHT OUTER JOIN Course AS c ON sc.Name = c.Name
produces the result set:
Student_ID | Name |
---|---|
NULL | NULL |
999568440 | CP363 |
999568440 | CP102 |
987859400 | CP363 |
987859400 | HP202 |
This differs from the query:
SELECT s.Student_ID, c.Name FROM Student AS s LEFT OUTER JOIN Student_Course AS sc ON s.Student_ID = sc.Student_ID RIGHT OUTER JOIN Course AS c ON sc.Name = c.Name
which produces the result set:
Student_ID | Name |
---|---|
NULL | BW101 |
999568440 | CP363 |
999568440 | CP102 |
987859400 | CP363 |
987859400 | HP202 |
The first query chooses its attributes from the look-up table
Student_Course (as sc), which has
NULL
values for some of the tuples in the outer joins. The
second query uses the attributes of the tables on the 'outside' of the joins,
and at least one of these attributes has values in the result set.
A Star Join is a complex join where a series of tables center around a single table. A star join cannot be created as a series of sequential joins as in the previous examples. Imagine the following table relationships:
The table Student_Course_Term is at the center of a star relationship amongst the tables Student, Course, and Term. It is a look-up table for these three other tables, and contains information about the courses a given student takes during any term. Assume that there are appropriate foreign key relationships between Student_Course_Term and the other three tables.
In order to generate a list of courses taken by all students during all terms, the query looks like this:
SELECT s.Last_Name, s.First_Name, c.Name, t.Name FROM ( ( Student_Course_Term AS sct INNER JOIN Student AS s ON sct.Student_ID = s.Student_ID ) INNER JOIN Course AS c ON sct.Name = c.Name ) INNER JOIN Term AS t ON sct.Term_Code = t.Term_Code ORDER BY s.Last_Name, s.First_Name, c.Name, t.Name
Note the use of parentheses to group the joins together and the fact that the middle table (Student_Course_Term) is referenced only once.
Star joins may use both inner joins and outer joins.
Joins can be performed against virtual, or derived tables, as in the following example:
SELECT s.Last_Name, s.First_Name, Course, Repeats FROM Student AS s INNER JOIN ( SELECT Student_ID, Course, COUNT(*) FROM Student_Course GROUP BY Student_ID, Course ) AS Course_Repeats ( Student_ID, Course, Repeats ) ON s.Student_ID = Course_Repeats.Student_ID WHERE Repeats > 1
This query returns a list of all students who have taken a course more than once. The subquery (the query in parentheses) creates a virtual table Course_Repeats from Student_Course that is joined against Student based upon their common Student_ID values. The subquery attributes (in this case Course and Repeats) can be displayed as part of the result set.
Describing the details of how various SQL engines implement joins is beyond the scope of this course. It is worth noting, however, that providing indices on the columns to be joined can aid performance immensely. In the previous examples creating individual indices on the attributes of the various look-up tables would help performance. (ex: on Student_ID and Course in Student_Course. These attributes, as primary keys in their 'owning' tables, are already indexed).