CP363 : Joins

Joins create new virtual (or composite) tables that include information from multiple tables. In SQL you specify which tables contain the information you need using the FROM clause. You must combine the correct attributes of each table. You control how attributes are matched in the composite table either by specifying a particular type of join operation or by using the ON clause.

In a relational database management system relationships among data values are left unstated in the database definition. They become explicit when you query the data, not when you create it. You can ask any question that comes to mind about the data stored in the database, regardless of what your intentions were when the database was set up. (The database organization may have some effect on how you frame such questions - certain questions may be easier or harder to state in SQL depending on the database organization). Joining different tables allows you to explore relationships among diverse data.

Joins normally appear within SELECT statements. The desired attributes appear after the SELECT, the tables and aliases appear in the FROM clause, and the joining conditions can be given in the ON clause. If two tables share attribute names, the attribute names must be qualified by the appropriate table name or alias, as in this example:

SELECT last_name, first_name, p_title
  FROM member AS m INNER JOIN pub AS p
  ON m.member_id = p.member_id

Here, member and pub are joined by their only common attribute, member_id. This join results in the listing of all members' names and the titles of their publications. members without publications will not be listed. (INNER has a special meaning that will be dealt with later.)

This query could be written without the JOIN ON clause by using a WHERE clause instead:

SELECT last_name, first_name, p_title
  FROM member AS m, pub AS p
  WHERE m.member_id = p.member_id

Although this statement is identical to the previous one using the JOIN ON clause, the ON keyword allows you to isolate the join constraints and can make your join statement easier to read. For example, consider:

SELECT last_name, first_name, p_title
  FROM member AS m INNER JOIN pub AS p
  ON m.member_id = p.member_id
  WHERE p.Date > 1995

versus

SELECT last_name, first_name, p_title
  FROM member AS m, pub AS p
  WHERE m.member_id = p.member_id
  AND p.Date > 1995

Although both statements produce the same result, the first explicitly indicates that two tables are being joined and that a condition is being applied to the result of that join. The second throws the join and the condition together as part of the WHERE clause.

Atttributes being joined must have the same or compatible data types. (In the previous example the member_id attributes of both tables are clearly of the same type because they have a primary key - foreign key relationship.) Data types that do not match can be converted to the same type using an appropriate SQL conversion function.


Types of Joins

Key Joins

Key joins take advantage of the foreign key relationship between two tables. Such queries are optimized by the DBMS engine. (Although query optimization is not a topic of this course, it is useful to know as a rule of thumb that joins made between tables related by primary and foreign keys will be more efficient, and thus execute faster, than joins made between tables that do not have such a relationship.) For example:

SELECT p_title, publisher
  FROM pub AS p INNER JOIN pub_type AS pt
  ON p.pub_type_id = pt.pub_type_id

This would work as both a natural join and a key join, but because the tables have a foreign key relationship based upon the attribute pub_type_id this query will execute faster than if the relationship had not been made explicit.


Self Joins

A table that has a reflexive association (i.e. is joined to itself) may be used in a self join. Assume an Employee table contains both an employee ID and a manager ID, where a manager is also part of the employee table. The employee table can be joined to itself only by given it two distinctive correlation names, as in this example:

SELECT Report.Emp_Name, Manager.Emp_name
  FROM Employee AS Report INNER JOIN Employee AS Manager
  ON Report.Manager_id = Manager.Emp_id

Report and Manager are simply two different aliases of the same table, Employee.


The condition for joining the values in two attributes does not need to be equality (=). You may use any of the other comparison operators: not equal (<>), greater than (>), less than (<), greater than or equal to (>=), and less than or equal to (<=). Use these conditions with caution, however. Take this example:

SELECT last_name, first_name, title
  FROM member AS m INNER JOIN pub AS p
  ON m.member_id >= p.member_id

This SQL statement returns a list of members and publications such that the member with member_id = 1 would be listed with all their own publications. But the member with member_id = 2 would be listed with all their publications as well as those of member 1. member 3 would be listed with their publications as well as those of members 1 and 2, and so on. It is not that such conditions are useless, it is just that they must be used with extreme caution.