Steve Meyers has 15 experience as a MySQL DBA, web software architect, and system administrator, with an emphasis on scalability

How to identify an unnecessary index

+ No comment yet
Let's look at the index from the wp_posts table in a standard WordPress installation.

SHOW KEYS FROM wp_posts;
+----------+------------+------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wp_posts |          0 | PRIMARY          |            1 | ID           | A         |        1772 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | post_name        |            1 | post_name    | A         |        1772 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | type_status_date |            1 | post_type    | A         |           6 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | type_status_date |            2 | post_status  | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | type_status_date |            3 | post_date    | A         |        1772 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | type_status_date |            4 | ID           | A         |        1772 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | post_parent      |            1 | post_parent  | A         |         147 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | post_author      |            1 | post_author  | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| wp_posts |          1 | yarpp_title      |            1 | post_title   | NULL      |         590 |     NULL | NULL   |      | FULLTEXT   |         |               |
| wp_posts |          1 | yarpp_content    |            1 | post_content | NULL      |        1772 |     NULL | NULL   |      | FULLTEXT   |         |               |
+----------+------------+------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
10 rows in set (0.00 sec)

How can we tell which indexes are good, and which are bad?

To begin with, let's look at the cardinality.  Cardinality is a measure of how many unique values there are for that column.  A column with very low cardinality is not very useful as an index, since it does not narrow down queries very much.

The post_type and post_status columns have a fairly low cardinality, so we'd ordinarily be concerned about them, but they're part of a multi-column index.  Since there are four columns to the index, their own low cardinality will probably be masked by the other columns.  As always, you know your own data better than I do, so you should know whether that applies in your circumstances.  What we're looking at is the cardinality of the combination of columns, so if the values of the columns are related to each other, you could still end up with low cardinality.

On the other hand, post_author has very low cardinality, and is a single-column index.  This column is potentially useful for blogs with many authors.  For most blogs, it ends up being more of a hindrance than a help, as it negatively affects write performance.

Going back to the type_status_date index, we noticed that the first two columns have low cardinality, but you'll notice that the third column's cardinality is equal to the primary key's cardinality.  This means that once we use the third column of the key (having necessarily already used the first two columns), we have narrowed it down to a single row.  This means that the fourth column in the index probably does not add any useful information.

We'll demonstrate this with a few queries.


EXPLAIN SELECT * FROM wp_posts WHERE post_type IN ('post', 'page') AND post_status='publish' AND post_date > '2013-01-01 00:00:00' ORDER BY ID;
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
| id   | select_type | table    | type  | possible_keys    | key              | key_len | ref  | rows | Extra                                 |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | wp_posts | range | type_status_date | type_status_date | 132     | NULL |   34 | Using index condition; Using filesort |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
1 row in set (0.01 sec)



EXPLAIN SELECT * FROM wp_posts WHERE post_type='post' AND post_status='publish' AND post_date > '2013-01-01 00:00:00' ORDER BY ID ;
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
| id   | select_type | table    | type  | possible_keys    | key              | key_len | ref  | rows | Extra                                 |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
|    1 | SIMPLE      | wp_posts | range | type_status_date | type_status_date | 132     | NULL |   33 | Using index condition; Using filesort |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+---------------------------------------+
1 row in set (0.00 sec)

We're specifying the post_type, post_status, and post_date in both queries.  You'll notice that the "rows" doesn't change much when we just select posts instead of posts and pages.  Even though the cardinality of post_type is a whopping 6, even that is a bit misleading, as almost all of the rows have type "post", making the real-world cardinality even lower than that.

If we specify an exact date and time, rather than a range, then we narrow it down to a single row.

EXPLAIN SELECT * FROM wp_posts WHERE post_type='post' AND post_status='publish' AND post_date = '2013-02-05 10:53:57' ORDER BY ID ;
+------+-------------+----------+------+------------------+------------------+---------+-------------------+------+------------------------------------+
| id   | select_type | table    | type | possible_keys    | key              | key_len | ref               | rows | Extra                              |
+------+-------------+----------+------+------------------+------------------+---------+-------------------+------+------------------------------------+
|    1 | SIMPLE      | wp_posts | ref  | type_status_date | type_status_date | 132     | const,const,const |    1 | Using index condition; Using where |
+------+-------------+----------+------+------------------+------------------+---------+-------------------+------+------------------------------------+
1 row in set (0.00 sec)

How often are we going to query the database for an exact date and time, though?  Generally, we'll either be specifying a date/time range, or ordering by the date/time.  In either case, we cannot use the portion of the index after post_date.

So does that mean the ID portion of the key is useless?  In all likelihood, yes.  I can't imagine any scenario where it would be useful.

As an aside, when using the InnoDB storage engine, secondary indexes merely point back to the primary key, so the ID would already be referenced in the index.

Interesting MySQL optimizer case

+ No comment yet
Jaime at the MySQL Performance Blog had an interesting post regarding some unexpected behavior from the MySQL optimizer.  Although this particular case probably doesn't affect most people, it does give some insight into how the optimizer works, and how subtle changes to a query can change performance.

Asynchronous MySQL queries in PHP

+ No comment yet
I was made aware today by a post on the MySQL Performance Blog that the mysqlnd driver for PHP has support for asynchronous PHP queries.  At the Midwest PHP Conference, I gave a talk on database optimization for web developers, and in a follow-up discussion with an attendee, I apparently misinformed him.

Midwest PHP Presentation Notes

+ No comment yet
Thanks for attending my presentation at the Midwest PHP conference!

Slides as PDF
Leave feedback on joind.in

Useful links: