SQL Subqueries

Introduction

A subquery is simply a query within a query. The inner query is executed first, then the outer query can use the results of the inner query in its processing. The inner query can be thought of as generating a temporary virtual table.

The use of subqueries allows for the creation of very complex and powerful selection statements.

Temporary Tables

Subqueries can generate temporary tables that can then have SELECT statements applied to them.

List Students with Repeated courses
          
SELECT s.studentId, surname, forename, courseId, repeats
FROM student AS s INNER JOIN
  (SELECT studentId, courseId, COUNT(*) AS repeats
  FROM studentCourseTerm
  GROUP BY studentId, courseId)
  AS courseRepeats
ON s.studentId = courseRepeats.studentId
WHERE repeats > 1
          
          

The subquery:

          
  (SELECT studentId, courseId, COUNT(*) AS repeats
  FROM studentCourseTerm
  GROUP BY studentId, courseId)
  AS courseRepeats
 
        

creates a temporary table that lists students and the number of times they have taken any course. This temporary table is given an alias, courseRepeats, so that it can be joined against the student table in order to get the student names as well as the course name and count. The WHERE clause can be applied against any of the resulting columns, and in this case is applied to the repeats column.

IN

The IN clause is part of a WHERE clause that restricts the result of a SELECT to tuples that contain values in a list. NOT IN restricts the result set to tuples that do not have values in a list.

The IN clause works either with a subquery that returns a result set consisting of the values of a single column, or with a list of comma-delimited literals.

List Members with No Publications
          
SELECT memberSurname, memberForename, memberTitle
  FROM member
  WHERE memberId NOT IN
    (SELECT pubMemberId FROM pub)
          
          

Here the subquery gets a list of all pubMemberIds from pub, i.e. a list of members who have actually published something. The outer query then compares all memberIds in member to the contents of this list and returns the names of those members who are not IN this list.

The opposite result (i.e. the names of members who have publications) could be generated with the same subquery, but without the NOT:

          
SELECT memberSurname, memberForename, memberTitle
  FROM member
  WHERE memberId IN
    (SELECT pubMemberId FROM pub)

However, the same thing could be accomplished with a simple join:

          
SELECT memberSurname, memberForename, memberTitle
  FROM member INNER JOIN pub
  ON memberId = pubMemberId

        

Is the requirement to generate a negative result set (members who have not published anything) that makes the use of a subquery necessary. A join can provide positive results, not negative ones.

List Members by ID or Name
          
SELECT memberSurname, memberForename, memberTitle
  FROM member
  WHERE memberId in (9, 12, 35)
          
          
          
SELECT memberId
  FROM member
  WHERE memberSurname IN ('Kilgour', 'Brown')
          
          

The target of an IN clause can be a list of literal values. Generally, however, such a list is generated by an application and passed as a parameter to a function or method that generates the appropriate SQL code to be executed. Applying literals by hand as in these examples are usually done through the command line or with an application such as HeidiSQL.

An IN clause with a literal list is not a subquery - only if the IN refers to the result of a SELECT is it using a subquery.

EXISTS

The use of the EXISTS clause is similar to the use of IN, except that the subquery need not restrict itself to returning a list of values of a single attribute. So long as any result other than NULL is returned by the subquery, the EXISTS condition becomes true. Like IN, EXISTS may be used with a NOT.

List Members Without Publications
          
SELECT memberSurname, memberForename, memberTitle
  FROM member
  WHERE NOT EXISTS
    (SELECT * FROM pub WHERE pubMemberId = memberId)
          
          

This subquery returns all publications belonging to the member selected in the outer SELECT statement. Should the subquery return a NULL, the outer query returns a tuple from the member table, i.e. it lists only members who do not have publications. This is equivalent to the first example of the use of the IN clause.

List Non-Publishing Members by Expertise
          
SELECT DISTINCT memberSurname, memberForename, memberTitle
  FROM member
  INNER JOIN memberBroad
  ON memberId = memberBroadMemberId
  WHERE NOT EXISTS
    (SELECT * FROM pub
      WHERE memberId = pubMemberId)
  AND memberBroadBroadId IN (8, 12)
          
          

This selects all members who do not have publications but do have expertise in "Strategic Studies" (8) or "Military History" (12).

The following selection statement produces the same result:

          
SELECT DISTINCT memberSurname, memberForename, memberTitle
  FROM member
  WHERE memberId NOT IN
    (SELECT pubMemberId FROM pub)
  AND EXISTS
    (SELECT * FROM memberBroad
      WHERE memberBroadBroadId IN (8, 12)
      AND memberId = memberBroadMemberId)
  
        

The order in which the various clauses are defined, and which clauses are defined by IN or EXISTS doesn't matter in terms of getting the correct result, so long as the resulting logic is the same. There may be significant differences in performance, however, and we will examine optimizing selections later.

List Members by Expertise and Non-Expertise
          
SELECT memberSurname, memberForename, memberTitle
  FROM member
  WHERE memberId IN
    (SELECT memberBroadMemberId
      FROM memberBroad
      WHERE memberBroadBroadId = 12)
  AND memberId NOT IN
    (SELECT memberBroadMemberId
      FROM memberBroad
      WHERE memberBroadBroadId = 19)
          
          

This example lists all members who have expertise in "Military History" (12), but who do not have expertise in "Military Alliances" (19). Multiple subqueries can be chained with the use of AND and OR.

Generated Columns

Subqueries can generate columns similar to how they generate a temporary table. The subquery must generate a single value rather than a table or list of values, as in this example:

Create a totalCourses Column
          
SELECT studentId, surname, forename,
  (SELECT COUNT(courseId)
  FROM studentCourseTerm AS sct
  WHERE s.studentId = sct.studentId) AS totalCourses
FROM student AS s
          
          

where the subquery:

          
  (SELECT COUNT(courseId)
  FROM studentCourseTerm AS sct
  WHERE s.studentId = sct.studentId) AS totalCourses
          
        

counts how many courses an individual student has taken. This generates a single number for each student, and this number can be used, with its alias, as a column in the surrounding query.