Joins create new virtual (or composite) tables that include
information from multiple tables. In SQL you specify which tables contain the
information you need using the FROM
clause. You must combine the
correct attributes of each table. You control how attributes are matched in
the composite table either by specifying a particular type of join operation
or by using the ON
clause.
In a relational database management system relationships among data values are left unstated in the database definition. They become explicit when you query the data, not when you create it. You can ask any question that comes to mind about the data stored in the database, regardless of what your intentions were when the database was set up. (The database organization may have some effect on how you frame such questions - certain questions may be easier or harder to state in SQL depending on the database organization). Joining different tables allows you to explore relationships among diverse data.
Joins normally appear within SELECT
statements. The desired
attributes appear after the SELECT
, the tables and aliases
appear in the FROM
clause, and the joining conditions can be
given in the ON
clause. If two tables share attribute names, the
attribute names must be qualified by the appropriate table name or alias, as
in this example:
SELECT last_name, first_name, p_title FROM member AS m INNER JOIN pub AS p ON m.member_id = p.member_id
Here, member and pub are joined
by their only common attribute, member_id. This join results
in the listing of all members' names and the titles of their publications.
members without publications will not be listed. (INNER
has a
special meaning that will be dealt with later.)
This query could be written without the JOIN ON
clause by
using a WHERE
clause instead:
SELECT last_name, first_name, p_title FROM member AS m, pub AS p WHERE m.member_id = p.member_id
Although this statement is identical to the previous one using the
JOIN ON
clause, the ON
keyword allows you to
isolate the join constraints and can make your join statement easier to read.
For example, consider:
SELECT last_name, first_name, p_title FROM member AS m INNER JOIN pub AS p ON m.member_id = p.member_id WHERE p.Date > 1995
versus
SELECT last_name, first_name, p_title FROM member AS m, pub AS p WHERE m.member_id = p.member_id AND p.Date > 1995
Although both statements produce the same result, the first explicitly
indicates that two tables are being joined and that a condition is being
applied to the result of that join. The second throws the join and the
condition together as part of the WHERE
clause.
Atttributes being joined must have the same or compatible data types. (In the previous example the member_id attributes of both tables are clearly of the same type because they have a primary key - foreign key relationship.) Data types that do not match can be converted to the same type using an appropriate SQL conversion function.
Key Joins
Key joins take advantage of the foreign key relationship between two tables. Such queries are optimized by the DBMS engine. (Although query optimization is not a topic of this course, it is useful to know as a rule of thumb that joins made between tables related by primary and foreign keys will be more efficient, and thus execute faster, than joins made between tables that do not have such a relationship.) For example:
SELECT p_title, publisher FROM pub AS p INNER JOIN pub_type AS pt ON p.pub_type_id = pt.pub_type_id
This would work as both a natural join and a key join, but because the tables have a foreign key relationship based upon the attribute pub_type_id this query will execute faster than if the relationship had not been made explicit.
Self Joins
A table that has a reflexive association (i.e. is joined to itself) may be used in a self join. Assume an Employee table contains both an employee ID and a manager ID, where a manager is also part of the employee table. The employee table can be joined to itself only by given it two distinctive correlation names, as in this example:
SELECT Report.Emp_Name, Manager.Emp_name FROM Employee AS Report INNER JOIN Employee AS Manager ON Report.Manager_id = Manager.Emp_id
Report and Manager are simply two different aliases of the same table, Employee.
The condition for joining the values in two attributes does not need to be
equality (=
). You may use any of the other comparison operators:
not equal (<>
), greater than (>
), less
than (<
), greater than or equal to (>=
), and
less than or equal to (<=
). Use these conditions with
caution, however. Take this example:
SELECT last_name, first_name, title FROM member AS m INNER JOIN pub AS p ON m.member_id >= p.member_id
This SQL statement returns a list of members and publications such that the member with member_id = 1 would be listed with all their own publications. But the member with member_id = 2 would be listed with all their publications as well as those of member 1. member 3 would be listed with their publications as well as those of members 1 and 2, and so on. It is not that such conditions are useless, it is just that they must be used with extreme caution.