Lesson 15: Query Optimization

Introduction

Most DBMSs have tools that can help you determine the efficiency of your data design and queries. MySQL, for example, provides the EXPLAIN command to help you understand how your queries are executed. Using EXPLAIN is very simple, just put it in front of any query. It is automatically used in the SQL Query Practice page.

The following example shows the results of applying an EXPLAIN to a SQL statement:

Using EXPLAIN
      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12

    

which returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad const PRIMARY PRIMARY 1 const 1 100.00

The result consist of:

id
The SELECT identifier. Main queries are always numbered 1, the first subquery 2, and so on.
select_type
The SELECT type.
table
The table queried.
partitions
The matching partitions
type
The type of join used
possible_keys
The indexes to choose from
key
The index actually used
key_len
Length of the key chosen
ref
The indexed columns
rows
Estimate of the number of rows to be examined
filtered
Percentage of result rows filtered by the condition. 100% is best.
Extra
Additional information

(For complete details of these fields, see: 8.8.2 EXPLAIN Output Format.)

These results tells us that the query:

  • did not use any unions or subqueries (select_type: SIMPLE)
  • join (although there was no actual join) produced only one row (type: const)
  • used the table's primary key (key: PRIMARY)
  • is estimated to have to search through only one row to find the requested data (rows: 1)
  • 100% of the query was filtered

(The other items can be ignored for now.)

Thus we can see that querying the broad table for a particular ID number is very efficient due to the table's primary key.

EXPLAIN With No Keys

Imagine now that we have a database named dcris_bad that has the same tables as dcris, but without primary keys, foreign keys, or indices. The following EXPLAIN:

      
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad ALL 19 10.00 Using where

This tells us that the query:

  • did not use any unions or subqueries (select_type: SIMPLE)
  • join forces a scan of the entire table (type: ALL)
  • does not use any keys (key is empty)
  • is estimated to have to search through 19 rows to find the requested data (rows: 19)
  • Only 10% of the rows were filtered

Clearly, the use of the primary key in the dcris query is significantly more efficient than a keyless query.

EXPLAIN On a Simple Projection

The EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad const inBroadDesc inBroadDesc 262 const 1 100.00 Using index

This tells us that the query:

  • did not use any unions or subqueries (select_type: SIMPLE)
  • join (although there was no actual join) produced only one row (type: const)
  • used the table's unique key on the broadDesc column (key: index_broad_desc)
  • is estimated to have to search through only one row to find the requested data (rows: 1)
  • is using an index (Extra: Using index)

EXPLAIN With Simple Project and No Keys

Issuing the same EXPLAIN against the 'bad' dcris database:

      
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad ALL 19 10.00 Using where

which is neither better nor worse than the previous statement on the 'bad' dcris database, and clearly less efficient than the indexed version.

Using LIKE

EXPLAIN With Simple LIKE

This SELECT uses a string with a trailing wildcard character '%' in its LIKE clause:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE 'Military History%'

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad index inBroadDesc inBroadDesc 262 1 100.0 Using where; Using index

A filtered result of 100% tells us that the selection made use of the index on the description attribute.

EXPLAIN With LIKE and Prefixed '%'

This SELECT uses a string with a both a beginning and trailing wildcard character '%' in its LIKE clause:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE '%Military History%'

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad range inBroadDesc 262 19 11.11 Using index

Clearly looking for values with the equivalent of the Python startswith() is more efficient that looking for values within the body of a string. type: range indicates that only rows whose values are within a certain range are chosen, and filtered: 11.11 tells us the the index was not much help. This would also apply to comparisons that use the <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators.

(What would the results look like for '%Military History'? Try it.)

EXPLAIN With Simple LIKE and No Keys

Testing either of these previous SELECTs against the 'bad' dcris database returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad ALL 19 11.11 Using where

Clearly the use and placement of the '%' wildcard character doesn't matter if no indexes are defined.

Joins

EXPLAIN With Key Join

Issuing the EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT * FROM member
INNER JOIN memberBroad ON (memberId = memberBroadMemberId)
INNER JOIN broad ON (broadId = memberBroadBroadId)
WHERE broadId = 12

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad const PRIMARY PRIMARY 1 const 1 100.00
1 SIMPLE memberBroad ref PRIMARY,fkMemberBroadBroad fkMemberBroadBroad 1 const 21 100.00 Using index
1 SIMPLE member eq_ref PRIMARY PRIMARY 2 memberBroadMemberId 1 100.00

telling us that:

  • the joins use the foreign key definitions between member, memberBroad, and broad
  • a total of 21 records are examined and returned (rows= 1 × 21 × 1)
  • One row is read from the member table for each combination of rows from the other tables (type: eq_ref)

EXPLAIN With Non-Key Join

Executing the same EXPLAIN against the 'bad' dcris database returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE broad ALL 19 10.00 Using where
1 SIMPLE memberBroad ALL 307 10.00 Using where; Using join buffer (hash join)
1 SIMPLE member ALL 82 10.00 Using where; Using join buffer (hash join)

telling us that the query has to process 478,306 records to extract the required data (rows = 19 × 307 × 82).

EXPLAIN With INNER Key Join

The EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member INNER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE member index inFullName inFullName 205 82 100.00 Using temporary; Using filesort
1 SIMPLE pub ref inPubMember inPubMember 2 dcris.member.memberId 2 100.00 Using index

telling us that the query uses the foreign key definitions between member and pub to perform the join, and 164 records (rows = 82 × 2) rows are examined. Note that the rows count doesn't refer to the actual number of rows returned (which in this case would be 79, as three members have no publications), but rather an estimate of the number of records to be processed.

EXPLAIN With Non-Key Join

Issuing the same EXPLAIN against the 'bad' dcris database returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE member ALL 82 100.00 Using temporary; Using filesort
1 SIMPLE pub ALL 229 100.00 Using where; Using join buffer (hash join)

telling us that the query examines 18,788 records (rows = 82 × 229).

EXPLAIN With LEFT OUTER Key Join
      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member LEFT OUTER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename

    

which returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE member index inFullName inFullName 205 82 100.00 Using temporary; Using filesort
1 SIMPLE pub ref inPubMember inPubMember 2 member.memberId 2 100.00 Using index

telling us that the use of INNER versus OUTER makes no difference in these particular queries.

Issuing the same statements against the 'bad' dcris database also makes no difference in the EXPLAIN results.

Subqueries

EXPLAIN With Subquery
      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, (SELECT COUNT(*) FROM pub
WHERE memberId = pubMemberId) AS pubCount
FROM member
ORDER BY memberSurname, memberForename

    

returns:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY member index inFullName 205 82 100.00 Using index
2 DEPENDENT SUBQUERY pub ref inPubMember inPubMember 2 memberId 2 100.00 Using index

tells us that the query is using a subquery (select_type: DEPENDENT SUBQUERY), but the query is still making use of the foreign key relationship between member and pub.

EXPLAIN With Subquery and No Key
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY m ALL 82 100.00 Using filesort
2 DEPENDENT SUBQUERY p ALL 229 10.00 Using where

tells us that the subquery is recognized, but no efficiencies are derived from that.

The overall lesson is obvious: a well designed database with appropriate primary, unique, and foreign keys is significantly more efficient that a database without.