Introduction

Most DBMSs have tools that can help you determine the efficiency of your data design and queries. MySQL, for example, provides the EXPLAIN command to help you understand how your queries are executed. Using EXPLAIN is very simple, just put it in front of any query. It is automatically used in the SQL Query Practice page.

The following example shows the results of applying an EXPLAIN to a SQL statement:

Using EXPLAIN

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12

which returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		const	PRIMARY	PRIMARY	1	const	1	100.00

The result consist of:

id: The SELECT identifier. Main queries are always numbered 1, the first subquery 2, and so on.
select_type: The SELECT type.
table: The table queried.
partitions: The matching partitions
type: The type of join used
possible_keys: The indexes to choose from
key: The index actually used
key_len: Length of the key chosen
ref: The indexed columns
rows: Estimate of the number of rows to be examined
filtered: Percentage of result rows filtered by the condition. 100% is best.
Extra: Additional information

(For complete details of these fields, see: 8.8.2 EXPLAIN Output Format.)

These results tells us that the query:

did not use any unions or subqueries (select_type: SIMPLE)
join (although there was no actual join) produced only one row (type: const)
used the table's primary key (key: PRIMARY)
is estimated to have to search through only one row to find the requested data (rows: 1)
100% of the query was filtered

(The other items can be ignored for now.)

Thus we can see that querying the broad table for a particular ID number is very efficient due to the table's primary key.

EXPLAIN With No Keys

Imagine now that we have a database named dcris_bad that has the same tables as dcris, but without primary keys, foreign keys, or indices. The following EXPLAIN:

      
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadId = 12

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		ALL					19	10.00	Using where

This tells us that the query:

did not use any unions or subqueries (select_type: SIMPLE)
join forces a scan of the entire table (type: ALL)
does not use any keys (key is empty)
is estimated to have to search through 19 rows to find the requested data (rows: 19)
Only 10% of the rows were filtered

Clearly, the use of the primary key in the dcris query is significantly more efficient than a keyless query.

EXPLAIN On a Simple Projection

The EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		const	inBroadDesc	inBroadDesc	262	const	1	100.00	Using index

This tells us that the query:

did not use any unions or subqueries (select_type: SIMPLE)
join (although there was no actual join) produced only one row (type: const)
used the table's unique key on the broadDesc column (key: index_broad_desc)
is estimated to have to search through only one row to find the requested data (rows: 1)
is using an index (Extra: Using index)

EXPLAIN With Simple Project and No Keys

Issuing the same EXPLAIN against the 'bad' dcris database:

      
/* dcris_bad */
EXPLAIN
SELECT * FROM broad WHERE broadDesc = 'Military History'

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		ALL					19	10.00	Using where

which is neither better nor worse than the previous statement on the 'bad' dcris database, and clearly less efficient than the indexed version.

Using `LIKE`

EXPLAIN With Simple LIKE

This SELECT uses a string with a trailing wildcard character '%' in its LIKE clause:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE 'Military History%'

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		index	inBroadDesc	inBroadDesc	262		1	100.0	Using where; Using index

A filtered result of 100% tells us that the selection made use of the index on the description attribute.

EXPLAIN With LIKE and Prefixed '%'

This SELECT uses a string with a both a beginning and trailing wildcard character '%' in its LIKE clause:

      
/* dcris */
EXPLAIN
SELECT * FROM broad WHERE broadDesc LIKE '%Military History%'

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		range		inBroadDesc	262		19	11.11	Using index

Clearly looking for values with the equivalent of the Python startswith() is more efficient that looking for values within the body of a string. type: range indicates that only rows whose values are within a certain range are chosen, and filtered: 11.11 tells us the the index was not much help. This would also apply to comparisons that use the <>, >, >=, <, <=, IS NULL, <=>, BETWEEN, or IN() operators.

(What would the results look like for '%Military History'? Try it.)

EXPLAIN With Simple LIKE and No Keys

Testing either of these previous SELECTs against the 'bad' dcris database returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad		ALL					19	11.11	Using where

Clearly the use and placement of the '%' wildcard character doesn't matter if no indexes are defined.

Joins

EXPLAIN With Key Join

Issuing the EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT * FROM member
INNER JOIN memberBroad ON (memberId = memberBroadMemberId)
INNER JOIN broad ON (broadId = memberBroadBroadId)
WHERE broadId = 12

returns:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	broad	const	PRIMARY	PRIMARY	1	const	1	100.00
1	SIMPLE	memberBroad	ref	PRIMARY,fkMemberBroadBroad	fkMemberBroadBroad	1	const	21	100.00	Using index
1	SIMPLE	member	eq_ref	PRIMARY	PRIMARY	2	memberBroadMemberId	1	100.00

telling us that:

the joins use the foreign key definitions between member, memberBroad, and broad
a total of 21 records are examined and returned (rows= 1 × 21 × 1)
One row is read from the member table for each combination of rows from the other tables (type: eq_ref)

EXPLAIN With Non-Key Join

Executing the same EXPLAIN against the 'bad' dcris database returns:

id	select_type	table	type	rows	filtered	Extra
1	SIMPLE	broad	ALL	19	10.00	Using where
1	SIMPLE	memberBroad	ALL	307	10.00	Using where; Using join buffer (hash join)
1	SIMPLE	member	ALL	82	10.00	Using where; Using join buffer (hash join)

telling us that the query has to process 478,306 records to extract the required data (rows = 19 × 307 × 82).

EXPLAIN With INNER Key Join

The EXPLAIN:

      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member INNER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	member		index	inFullName	inFullName	205		82	100.00	Using temporary; Using filesort
1	SIMPLE	pub		ref	inPubMember	inPubMember	2	dcris.member.memberId	2	100.00	Using index

telling us that the query uses the foreign key definitions between member and pub to perform the join, and 164 records (rows = 82 × 2) rows are examined. Note that the rows count doesn't refer to the actual number of rows returned (which in this case would be 79, as three members have no publications), but rather an estimate of the number of records to be processed.

EXPLAIN With Non-Key Join

Issuing the same EXPLAIN against the 'bad' dcris database returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	member		ALL					82	100.00	Using temporary; Using filesort
1	SIMPLE	pub		ALL					229	100.00	Using where; Using join buffer (hash join)

telling us that the query examines 18,788 records (rows = 82 × 229).

EXPLAIN With LEFT OUTER Key Join

      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, COUNT(pubId) AS pubCount
FROM member LEFT OUTER JOIN pub ON (memberId = pubMemberId)
GROUP BY memberSurname, memberForename
ORDER BY memberSurname, memberForename

which returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	member		index	inFullName	inFullName	205		82	100.00	Using temporary; Using filesort
1	SIMPLE	pub		ref	inPubMember	inPubMember	2	member.memberId	2	100.00	Using index

telling us that the use of INNER versus OUTER makes no difference in these particular queries.

Issuing the same statements against the 'bad' dcris database also makes no difference in the EXPLAIN results.

Subqueries

EXPLAIN With Subquery

      
/* dcris */
EXPLAIN
SELECT memberSurname, memberForename, (SELECT COUNT(*) FROM pub
WHERE memberId = pubMemberId) AS pubCount
FROM member
ORDER BY memberSurname, memberForename

returns:

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	member		index		inFullName	205		82	100.00	Using index
2	DEPENDENT SUBQUERY	pub		ref	inPubMember	inPubMember	2	memberId	2	100.00	Using index

tells us that the query is using a subquery (select_type: DEPENDENT SUBQUERY), but the query is still making use of the foreign key relationship between member and pub.

EXPLAIN With Subquery and No Key

id	select_type	table	partitions	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	m		ALL					82	100.00	Using filesort
2	DEPENDENT SUBQUERY	p		ALL					229	10.00	Using where

tells us that the subquery is recognized, but no efficiencies are derived from that.

The overall lesson is obvious: a well designed database with appropriate primary, unique, and foreign keys is significantly more efficient that a database without.

Lesson 15: Query Optimization

Introduction

Using LIKE

Joins

Subqueries

Using `LIKE`