Workaround on InnoDB "SELECT COUNT(*)" Problem

MySQL Tutorials - Herong's Tutorial Examples

∟InnoDB Storage Engine - Primary and Secondary Indexes

∟Workaround on InnoDB "SELECT COUNT(*)" Problem

This section provides a tutorial example showing workarounds of the 'SELECT COUNT(*)' performance problem. count(fid) is much faster, if 'fid' is defined as a secondary index.

After learning how secondary index works on InnoDB tables, now we can look at some workarounds on the "SELECT COUNT(*)" performance problem.

1. Review 'Body' pages cached in the buffer.

mysql> select index_name, data_size/number_records as record_size,
    -> count(*) as pages, sum(number_records) as records
    -> from information_schema.innodb_buffer_page
    -> where table_name rlike '`Body`'
    -> group by index_name, data_size/number_records
    -> order by index_name;

  | index_name | record_size | pages | records |
  +------------+-------------+-------+---------+
  | PRIMARY    |     14.0000 |     1 |     801 |
  | PRIMARY    |   1812.0000 |   801 |    6400 |
  | SECOND     |     14.0000 |     6 |    6400 |
  | SECOND     |     18.0000 |     1 |       6 |
  | THIRD      |     14.0000 |     6 |    6400 |
  | THIRD      |     18.0000 |     2 |      12 |

2. Run "SELECT COUNT(*)" with the primary index.

mysql> select count(*) from Body;
  |     6400 |

mysql> call GetReads();
  | @read | @last    |
  +-------+----------+
  |  2405 | 71090813 |

mysql> select count(*) from Body use index (PRIMARY);
  |     6400 |

mysql> call GetReads();
  | @read | @last    |
  +-------+----------+
  |  2405 | 71059434 |

As expected, InnoDB engine performs "SELECT COUNT(*)" statement on an InnoDB table with a full table scan, reading all pages from the table. It actually reads all pages 3 times: 2405 > 3*801!

3. Compare "SELECT COUNT(*)" performance with no index, and different indexes. I see no differences.

mysql> select count(*) from Body USE INDEX ();
  @read = 2405

mysql> select count(*) from Body use index (PRIMARY);
  @read = 2405

mysql> select count(*) from Body use index (SECOND);
  @read = 2405

mysql> select count(*) from Body use index (THIRD);
  @read = 2405

mysql> select count(*) from Body;
  @read = 2405

mysql> explain select count(*) from Body;
+-------------+-------+---------------+-------+----------+-------------+
| select_type | type  | possible_keys | key   | filtered | Extra       |
+-------------+-------+---------------+-------+----------+-------------+
| SIMPLE      | index | NULL          | THIRD |   100.00 | Using index |
+-------------+-------+---------------+-------+----------+-------------+

4. Compare "SELECT COUNT(*)", "SELECT COUNT(fid)" and "SELECT COUNT(fid)" statements.

mysql> select count(*) from Body;
  @read = 2405

mysql> select count(id) from Body;
  @read = 2405

mysql> select count(id) from Body use index ();
  @read = 2405

mysql> select count(fid) from Body;
  @read = 72

mysql> select count(fid) from Body use index (SECOND);
  @read = 72

mysql> select count(fid) from Body use index ();
  @read = 867

What a surprise! InnoDB is able take advantages of a secondary index, if we count unique values of "fid" column, which is equivalent to the total number of rows.

In the above test, "SELECT COUNT(fid)" took only 72-page reading, comparing 2405-page reading for "SELECT COUNT(*)". That is a 97% reduction of execution time.

In theory, InnoDB could perform "SELECT COUNT(fid)" on the secondary index only, without using the primary index at all. It could do a full table scan of all 6 data pages of the secondary index and get the job done.

5. Add a secondary index as a shadow of the primary index.

mysql> create UNIQUE index SHADOW on Body (id);

mysql> show index from Body;
  | Non_unique | Key_name | Column_name | Null | Index_type |
  +------------+----------+-------------+------+------------+
  |          0 | PRIMARY  | id          |      | BTREE      |
  |          0 | SHADOW   | id          |      | BTREE      |
  |          1 | SECOND   | fid         | YES  | BTREE      |
  |          0 | THIRD    | fid         | YES  | BTREE      |

mysql> select index_name, data_size/number_records as record_size,
    -> count(*) as pages, sum(number_records) as records
    -> from information_schema.innodb_buffer_page
    -> where table_name rlike '`Body`'
    -> group by index_name, data_size/number_records
    -> order by index_name;

  | index_name | record_size | pages | records |
  +------------+-------------+-------+---------+
  | PRIMARY    |     14.0000 |     1 |     801 |
  | PRIMARY    |   1812.0000 |   801 |    6400 |

  | SHADOW     |      9.0000 |     4 |    6400 |
  | SHADOW     |     13.0000 |     2 |       8 |

  | SECOND     |     14.0000 |     6 |    6400 |
  | SECOND     |     18.0000 |     1 |       6 |

  | THIRD      |     14.0000 |     6 |    6400 |
  | THIRD      |     18.0000 |     2 |      12 |

5. Run "SELECT COUNT(id)" statement with SHADOW index. I see no improvement.

mysql> select count(id) from Body use index (SHADOW);
  @read = 2405

mysql> select count(id) from Body use index (PRIMARY);
  @read = 2405

What we have learned so far:

"COUNT(*)" on an InnoDB table is very flow. It will read all data pages 3 times.
"COUNT(id)" on an InnoDB table is the same as "COUNT(*)", if "id" is the primary key.
"COUNT(fid)" on an InnoDB table is much faster than "COUNT(*)", if "fid" is defined as a secondary index.