A subquery is simply a query within a query. The inner query is executed first, then the outer query can use the results of the inner query in its processing. The inner query can be thought of as generating a temporary virtual table.
The use of subqueries allows for the creation of very complex and powerful selection statements.
Subqueries can generate temporary tables that can then have SELECT
statements applied to them.
SELECT s.studentId, surname, forename, courseId, repeats
FROM student AS s INNER JOIN
(SELECT studentId, courseId, COUNT(*) AS repeats
FROM studentCourseTerm
GROUP BY studentId, courseId)
AS courseRepeats
ON s.studentId = courseRepeats.studentId
WHERE repeats > 1
The subquery:
(SELECT studentId, courseId, COUNT(*) AS repeats
FROM studentCourseTerm
GROUP BY studentId, courseId)
AS courseRepeats
creates a temporary table that lists students and the number of times
they have taken any course. This temporary table is given an alias, courseRepeats,
so that it can be joined against the student table in
order to get the student names as well as the course name and count.
The WHERE
clause can be applied against any of the
resulting columns, and in this case is applied to the repeats
column.
The IN
clause is part of a WHERE
clause that
restricts the result of a SELECT
to tuples that contain
values in a list. NOT IN
restricts the result set to tuples
that do not have values in a list.
The IN
clause works either with a subquery that returns a
result set consisting of the values of a single column, or with a list
of comma-delimited literals.
SELECT memberSurname, memberForename, memberTitle
FROM member
WHERE memberId NOT IN
(SELECT pubMemberId FROM pub)
Here the subquery gets a list of all pubMemberIds
from pub, i.e. a list of members who have actually
published something. The outer query then compares all memberIds
in member to the contents of this list and returns
the names of those members who are not IN
this
list.
The opposite result (i.e. the names of members who have publications)
could be generated with the same subquery, but without the NOT
:
SELECT memberSurname, memberForename, memberTitle
FROM member
WHERE memberId IN
(SELECT pubMemberId FROM pub)
However, the same thing could be accomplished with a simple join:
SELECT memberSurname, memberForename, memberTitle
FROM member INNER JOIN pub
ON memberId = pubMemberId
Is the requirement to generate a negative result set (members who have not published anything) that makes the use of a subquery necessary. A join can provide positive results, not negative ones.
SELECT memberSurname, memberForename, memberTitle
FROM member
WHERE memberId in (9, 12, 35)
SELECT memberId
FROM member
WHERE memberSurname IN ('Kilgour', 'Brown')
The target of an IN
clause can be a list of literal
values. Generally, however, such a list is generated by an application
and passed as a parameter to a function or method that generates the
appropriate SQL code to be executed. Applying literals by hand as in
these examples are usually done through the command line or with an
application such as HeidiSQL.
An IN
clause with a literal list is not a
subquery - only if the IN
refers to the result of a SELECT
is it using a subquery.
The use of the EXISTS
clause is similar to the use of IN
,
except that the subquery need not restrict itself to returning a list of
values of a single attribute. So long as any result other than NULL
is returned by the subquery, the EXISTS
condition becomes
true. Like IN
, EXISTS
may be used with a NOT
.
SELECT memberSurname, memberForename, memberTitle
FROM member
WHERE NOT EXISTS
(SELECT * FROM pub WHERE pubMemberId = memberId)
This subquery returns all publications belonging to the member
selected in the outer SELECT
statement. Should the
subquery return a NULL
, the outer query returns a tuple
from the member table, i.e. it lists only members who
do not have publications. This is equivalent to the first example of
the use of the IN
clause.
SELECT DISTINCT memberSurname, memberForename, memberTitle
FROM member
INNER JOIN memberBroad
ON memberId = memberBroadMemberId
WHERE NOT EXISTS
(SELECT * FROM pub
WHERE memberId = pubMemberId)
AND memberBroadBroadId IN (8, 12)
This selects all members who do not have publications but do have expertise in "Strategic Studies" (8) or "Military History" (12).
The following selection statement produces the same result:
SELECT DISTINCT memberSurname, memberForename, memberTitle
FROM member
WHERE memberId NOT IN
(SELECT pubMemberId FROM pub)
AND EXISTS
(SELECT * FROM memberBroad
WHERE memberBroadBroadId IN (8, 12)
AND memberId = memberBroadMemberId)
The order in which the various clauses are defined, and which clauses
are defined by IN
or EXISTS
doesn't matter
in terms of getting the correct result, so long as the resulting logic
is the same. There may be significant differences in performance,
however, and we will examine optimizing selections later.
SELECT memberSurname, memberForename, memberTitle
FROM member
WHERE memberId IN
(SELECT memberBroadMemberId
FROM memberBroad
WHERE memberBroadBroadId = 12)
AND memberId NOT IN
(SELECT memberBroadMemberId
FROM memberBroad
WHERE memberBroadBroadId = 19)
This example lists all members who have expertise in "Military
History" (12), but who do not have expertise in "Military Alliances"
(19). Multiple subqueries can be chained with the use of AND
and OR
.
Subqueries can generate columns similar to how they generate a temporary table. The subquery must generate a single value rather than a table or list of values, as in this example:
SELECT studentId, surname, forename,
(SELECT COUNT(courseId)
FROM studentCourseTerm AS sct
WHERE s.studentId = sct.studentId) AS totalCourses
FROM student AS s
where the subquery:
(SELECT COUNT(courseId)
FROM studentCourseTerm AS sct
WHERE s.studentId = sct.studentId) AS totalCourses
counts how many courses an individual student has taken. This generates a single number for each student, and this number can be used, with its alias, as a column in the surrounding query.