Most DBMSs have tools that can help you determine the efficiency of your
data design and queries. MySQL, for example, provides the EXPLAIN
command to help you understand how your queries are executed. Using EXPLAIN
is very simple, just put it in front of any query. It is automatically
used in the SQL Query Practice page.
The following example shows the results of applying an EXPLAIN
to a SQL statement:
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12
which returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | const | PRIMARY | PRIMARY | 1 | const | 1 | 100.00 |
The result consist of:
SELECT
identifier. Main queries are always numbered
1, the first subquery 2, and so on.
SELECT
type.
(For complete details of these fields, see: 8.8.2 EXPLAIN Output Format.)
These results tells us that the query:
(The other items can be ignored for now.)
Thus we can see that querying the broad
table for a
particular ID number is very efficient due to the table's primary key.
Imagine now that we have a database named dcris_bad
that has the same tables as dcris, but without
primary keys, foreign keys, or indices. The following EXPLAIN
:
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | ALL | 19 | 10.00 | Using where |
This tells us that the query:
Clearly, the use of the primary key in the dcris query is significantly more efficient than a keyless query.
The EXPLAIN
:
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | const | inBroadDesc | inBroadDesc | 262 | const | 1 | 100.00 | Using index |
This tells us that the query:
broadDesc
column (key:
index_broad_desc)
Issuing the same EXPLAIN
against the 'bad' dcris
database:
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | ALL | 19 | 10.00 | Using where |
which is neither better nor worse than the previous statement on the 'bad' dcris database, and clearly less efficient than the indexed version.
LIKE
This SELECT
uses a string with a trailing
wildcard character '%'
in its LIKE
clause:
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE 'Military History%'
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | index | inBroadDesc | inBroadDesc | 262 | 1 | 100.0 | Using where; Using index |
A filtered result of 100% tells us that the selection made use of the index on the description attribute.
This SELECT
uses a string with a both a beginning and
trailing wildcard character '%'
in its LIKE
clause:
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE '%Military History%'
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | range | inBroadDesc | 262 | 19 | 11.11 | Using index |
Clearly looking for values with the equivalent of the Python startswith()
is more efficient that looking for values within the body of a string.
type: range indicates that only rows whose values are
within a certain range are chosen, and filtered: 11.11
tells us the the index was not much help. This would also apply to
comparisons that use the <>
, >
, >=
,
<
, <=
, IS NULL
, <=>
,
BETWEEN
, or IN()
operators.
(What would the results look like for '%Military History'
?
Try it.)
Testing either of these previous SELECT
s against the
'bad' dcris database returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | ALL | 19 | 11.11 | Using where |
Clearly the use and placement of the '%'
wildcard
character doesn't matter if no indexes are defined.
Issuing the EXPLAIN
:
/* dcris */
EXPLAIN
SELECT * FROM member
INNER JOIN memberBroad ON (memberId = memberBroadMemberId)
INNER JOIN broad ON (broadId = memberBroadBroadId)
WHERE broadId = 12
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | const | PRIMARY | PRIMARY | 1 | const | 1 | 100.00 | ||
1 | SIMPLE | memberBroad | ref | PRIMARY,fkMemberBroadBroad | fkMemberBroadBroad | 1 | const | 21 | 100.00 | Using index | |
1 | SIMPLE | member | eq_ref | PRIMARY | PRIMARY | 2 | memberBroadMemberId | 1 | 100.00 |
telling us that:
Executing the same EXPLAIN
against the 'bad' dcris
database returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | broad | ALL | 19 | 10.00 | Using where | |||||
1 | SIMPLE | memberBroad | ALL | 307 | 10.00 | Using where; Using join buffer (hash join) | |||||
1 | SIMPLE | member | ALL | 82 | 10.00 | Using where; Using join buffer (hash join) |
telling us that the query has to process 478,306 records to extract the required data (rows = 19 × 307 × 82).
The EXPLAIN
:
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member INNER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | member | index | inFullName | inFullName | 205 | 82 | 100.00 | Using temporary; Using filesort | ||
1 | SIMPLE | pub | ref | inPubMember | inPubMember | 2 | dcris.member.memberId | 2 | 100.00 | Using index |
telling us that the query uses the foreign key definitions between member and pub to perform the join, and 164 records (rows = 82 × 2) rows are examined. Note that the rows count doesn't refer to the actual number of rows returned (which in this case would be 79, as three members have no publications), but rather an estimate of the number of records to be processed.
Issuing the same EXPLAIN
against the 'bad' dcris database
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | member | ALL | 82 | 100.00 | Using temporary; Using filesort | |||||
1 | SIMPLE | pub | ALL | 229 | 100.00 | Using where; Using join buffer (hash join) |
telling us that the query examines 18,788 records (rows = 82 × 229).
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member LEFT OUTER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename
which returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | member | index | inFullName | inFullName | 205 | 82 | 100.00 | Using temporary; Using filesort | ||
1 | SIMPLE | pub | ref | inPubMember | inPubMember | 2 | member.memberId | 2 | 100.00 | Using index |
telling us that the use of INNER
versus OUTER
makes no difference in these particular queries.
Issuing the same statements against the 'bad' dcris database also
makes no difference in the EXPLAIN
results.
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, (SELECT COUNT(*) FROM pub
WHERE memberId = pubMemberId) AS pubCount
FROM member
ORDER BY memberSurname, memberForename
returns:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | member | index | inFullName | 205 | 82 | 100.00 | Using index | |||
2 | DEPENDENT SUBQUERY | pub | ref | inPubMember | inPubMember | 2 | memberId | 2 | 100.00 | Using index |
tells us that the query is using a subquery (select_type: DEPENDENT SUBQUERY), but the query is still making use of the foreign key relationship between member and pub.
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | m | ALL | 82 | 100.00 | Using filesort | |||||
2 | DEPENDENT SUBQUERY | p | ALL | 229 | 10.00 | Using where |
tells us that the subquery is recognized, but no efficiencies are derived from that.
The overall lesson is obvious: a well designed database with appropriate primary, unique, and foreign keys is significantly more efficient that a database without.