复盘MySQL分页查询优化方案

2020 年 3 月 10 日
筆記

一、前言

MySQL分页查询作为Java面试的一道高频面试题，这里有必要实践一下，毕竟实践出真知。
很多同学在做测试时苦于没有海量数据，官方其实是有一套测试库的。

二、模拟数据

这里模拟数据分2种情况导入，如果只是需要数据测试下，那么推荐官方数据。如果官方数据满足不了需求的话，那么我们自己模拟数据。

1. 导入官方测试库

下载官方数据库文件或者在 github 上下载。

该测试库含有6个表。

首先进入 employees_db, 执行导入数据指令

mysql -uroot -proot -t < employees.sql

有些环境可能会报错

ERROR 1193 (HY000) at line 38: Unknown system variable 'storage_engine'

连接mysql查看默认引擎，发现不是本地环境的问题。

mysql> show variables like '%engine%';  +----------------------------------+--------+  | Variable_name                    | Value  |  +----------------------------------+--------+  | default_storage_engine           | InnoDB |  | default_tmp_storage_engine       | InnoDB |  | disabled_storage_engines         |        |  | internal_tmp_disk_storage_engine | InnoDB |  +----------------------------------+--------+  4 rows in set (0.01 sec)

修改 employees.sql 脚本

   set default_storage_engine = InnoDB;  -- set storage_engine = MyISAM;  -- set storage_engine = Falcon;  -- set storage_engine = PBXT;  -- set storage_engine = Maria;    select CONCAT('storage engine: ', @@default_storage_engine) as INFO;

再次执行发现导入成功

➜  employees_db mysql -uroot -proot -t < employees.sql  mysql: [Warning] Using a password on the command line interface can be insecure.  +-----------------------------+  | INFO                        |  +-----------------------------+  | CREATING DATABASE STRUCTURE |  +-----------------------------+  +------------------------+  | INFO                   |  +------------------------+  | storage engine: InnoDB |  +------------------------+  +---------------------+  | INFO                |  +---------------------+  | LOADING departments |  +---------------------+  +-------------------+  | INFO              |  +-------------------+  | LOADING employees |  +-------------------+  +------------------+  | INFO             |  +------------------+  | LOADING dept_emp |  +------------------+  +----------------------+  | INFO                 |  +----------------------+  | LOADING dept_manager |  +----------------------+  +----------------+  | INFO           |  +----------------+  | LOADING titles |  +----------------+  +------------------+  | INFO             |  +------------------+  | LOADING salaries |  +------------------+

验证结果(配置修改同上)

➜  employees_db mysql -uroot -proot -t < test_employees_sha.sql  mysql: [Warning] Using a password on the command line interface can be insecure.  +----------------------+  | INFO                 |  +----------------------+  | TESTING INSTALLATION |  +----------------------+  +--------------+------------------+------------------------------------------+  | table_name   | expected_records | expected_crc                             |  +--------------+------------------+------------------------------------------+  | departments  |                9 | 4b315afa0e35ca6649df897b958345bcb3d2b764 |  | dept_emp     |           331603 | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |  | dept_manager |               24 | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |  | employees    |           300024 | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |  | salaries     |          2844047 | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |  | titles       |           443308 | d12d5f746b88f07e69b9e36675b6067abb01b60e |  +--------------+------------------+------------------------------------------+

我们可以看到emp大概有33万条数据。

2. 存储过程导入模拟数据

这里我们可以选择存储过程批量导入。

首先创建一张表

drop table if exists `user`;  create table `user`(    `id` int unsigned auto_increment,    `username` varchar(64) not null default '',    `score` int(11) not null default 0,      primary key(`id`)  )ENGINE = InnoDB;

创建存储过程

DROP PROCEDURE IF EXISTS batchInsert;  delimiter $$  -- 声明存储过程结束符号  create procedure batchInsert() -- 创建存储过程  begin   -- 存储过程主体开始      declare num int; -- 声明变量      set num=1; -- 初始值      while num<=3000000 do -- 循环条件          insert into user(`username`,`score`) values(concat('user-', num),num); -- 执行语句          set num=num+1; -- 循环变量自增      end while; -- 结束循环  end$$ -- 存储过程主体结束  delimiter ; #恢复;表示结束    CALL batchInsert; -- 执行存储过程

可以看到测试300W条数据大概1046s插入完成。好吧，本来计划导入1000w的结果时间太长了。

三、常用的MySQL分页查询问题复现及优化。

我们拿现有的表 user 进行测试，该表有 300w 条数据。

1. 前置检查

首先查看下该表结构以及目前存在哪些索引

mysql> desc user;  +----------+------------------+------+-----+---------+----------------+  | Field    | Type             | Null | Key | Default | Extra          |  +----------+------------------+------+-----+---------+----------------+  | id       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |  | username | varchar(30)      | NO   |     |         |                |  | score    | int(11)          | NO   |     | 0       |                |  +----------+------------------+------+-----+---------+----------------+  3 rows in set (0.00 sec)    mysql> show index from user;  +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |  +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  | user  |          0 | PRIMARY  |            1 | id          | A         |     2991886 |     NULL | NULL   |      | BTREE      |         |               |  +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+  1 row in set (0.00 sec)

可以看到只有 id 主键索引。

其次查看是否开启 缓存 (避免查询缓存对执行效率产生影响)

mysql> show variables like '%query_cache%';  +------------------------------+---------+  | Variable_name                | Value   |  +------------------------------+---------+  | have_query_cache             | YES     |  | query_cache_limit            | 1048576 |  | query_cache_min_res_unit     | 4096    |  | query_cache_size             | 1048576 |  | query_cache_type             | OFF     |  | query_cache_wlock_invalidate | OFF     |  +------------------------------+---------+  6 rows in set (0.00 sec)    mysql> show profiles;  Empty set, 1 warning (0.00 sec)

have_query_cache 和 query_cache_type 说明支持缓存但并未开启。
show profiles 显示为空，说明profiles功能是关闭的。

开启 profiles

mysql> SET profiling = 1;  Query OK, 0 rows affected, 1 warning (0.00 sec)    mysql> show profiles;  +----------+------------+-------------------+  | Query_ID | Duration   | Query             |  +----------+------------+-------------------+  |        1 | 0.00012300 | SET profiling = 1 |  +----------+------------+-------------------+  1 row in set, 1 warning (0.00 sec)

2. 无索引分页查询

一般我们最常用的分页查询的方式为 order by + limit m,n 的方式, 现在我们测试下分页性能

select * from user order by score limit 0,10; -- 10 rows in set (0.65 sec)  select * from user order by score limit 10000,10; -- 10 rows in set (0.83 sec)  select * from user order by score limit 100000,10; -- 10 rows in set (1.03 sec)  select * from user order by score limit 1000000,10; -- 10 rows in set (1.14 sec)

这里我们确认下是否用到了索引

mysql> explain select * from user order by score limit 1000000,10;  +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+  | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra          |  +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+  |  1 | SIMPLE      | user  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 2991995 |   100.00 | Using filesort |  +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------+  1 row in set, 1 warning (0.00 sec)

可以看到确实没有用到索引，全表扫描100W数据分页大概需要1.14s的时间。

3. 有索引分页查询

select * from user order by id limit 10000,10; -- 10 rows in set (0.01 sec)  select * from user order by id limit 1000000,10; -- 10 rows in set (0.18 sec)  select * from user order by id limit 2000000,10; -- 10 rows in set (0.35 sec)

该查询用到了主键索引，所以查询效率比较高。
可以看到，当数据量变大时，查询效率明显下降。

这里我们确认下是否使用到了索引

mysql> explain select * from user order by id limit 2000000,10;  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+  | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+  |  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000010 |   100.00 | NULL  |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------+  1 row in set, 1 warning (0.00 sec)

可以看到用了全索引扫描，共查询了2000010行数据。

4. 优化

我们根据MYSQL自带的一种query诊断分析工具查看下sql语句执行各个操作的耗时详情。可以看到查询获取到的2000010条记录都返回给客户端了，耗时主要集中在Sending data阶段。但是客户端只需要10条数据，我们能否只给客户端返回10条数据呢？

mysql> show profiles;  +----------+------------+---------------------------------------------------------+  | Query_ID | Duration   | Query                                                   |  +----------+------------+---------------------------------------------------------+  |        1 | 0.00012300 | SET profiling = 1                                       |  |        2 | 0.00009200 | SET profiling = 1                                       |  |        3 | 0.35689500 | select * from user order by id limit 2000000,10         |  |        4 | 0.00023900 | explain select * from user order by id limit 2000000,10 |  +----------+------------+---------------------------------------------------------+  4 rows in set, 1 warning (0.00 sec)    mysql> show profile for query 3;  +----------------------+----------+  | Status               | Duration |  +----------------------+----------+  | starting             | 0.000071 |  | checking permissions | 0.000007 |  | Opening tables       | 0.000012 |  | init                 | 0.000017 |  | System lock          | 0.000008 |  | optimizing           | 0.000005 |  | statistics           | 0.000024 |  | preparing            | 0.000016 |  | Sorting result       | 0.000004 |  | executing            | 0.000003 |  | Sending data         | 0.356653 |  | end                  | 0.000013 |  | query end            | 0.000005 |  | closing tables       | 0.000008 |  | freeing items        | 0.000019 |  | cleaning up          | 0.000030 |  +----------------------+----------+  16 rows in set, 1 warning (0.00 sec)

网上的优化方案: 子查询 + 覆盖索引

mysql> select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;  +---------+--------------+---------+  | id      | username     | score   |  +---------+--------------+---------+  | 2000002 | user-2000002 | 2000002 |  | 2000003 | user-2000003 | 2000003 |  | 2000004 | user-2000004 | 2000004 |  | 2000005 | user-2000005 | 2000005 |  | 2000006 | user-2000006 | 2000006 |  | 2000007 | user-2000007 | 2000007 |  | 2000008 | user-2000008 | 2000008 |  | 2000009 | user-2000009 | 2000009 |  | 2000010 | user-2000010 | 2000010 |  | 2000011 | user-2000011 | 2000011 |  +---------+--------------+---------+  10 rows in set (0.29 sec)    mysql> explain select * from user where id > (select id from user order by id limit 2000000, 1) limit 10;  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  |  1 | PRIMARY     | user  | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 1495997 |   100.00 | Using where |  |  2 | SUBQUERY    | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  2 rows in set, 1 warning (0.30 sec)

然而并没有提升查询性能。没看到问题出在哪里呢？从执行计划可以看出，索引和我们期望是一致的。rows这里检索了很多行。单独看下子查询

mysql> select id from user order by id limit 2000000, 1;  +---------+  | id      |  +---------+  | 2000001 |  +---------+  1 row in set (0.29 sec)    mysql> explain select id from user order by id limit 2000000, 1;  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  | id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  |  1 | SIMPLE      | user  | NULL       | index | NULL          | PRIMARY | 4       | NULL | 2000001 |   100.00 | Using index |  +----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+  1 row in set, 1 warning (0.00 sec)

这里可以看出子查询即使走了覆盖索引，依旧消耗3s左右，我觉得这就是正常的索引IO花费的时间。没找到官方测试数据做对比，以及MySQL一次IO查询花费的时间来做对比。

理论上int主键一页可以存储1000个键,根常驻内存,那么B+Tree第二层大概100W个键,测试数据在200W的分页，理论上需要2次IO可以找到数据。2次IO花费的时间是3s的话，1次应该在1.5s左右, 我们查询下99W左右的分页看是否符合假想。

mysql> select id from user order by id limit 990000,1;  +--------+  | id     |  +--------+  | 990001 |  +--------+  1 row in set (0.15 sec)

所以这里笔者大胆的猜想结果是正常开销

四、最后

本来想复盘网上的分页优化方案是否可靠，但是预期结果还是有区别。希望聪明的读者有不同见解的不吝赐教。公众号里有笔者的微信二维码。

复盘MySQL分页查询优化方案

一、前言

二、模拟数据

1. 导入官方测试库

2. 存储过程导入模拟数据

三、常用的MySQL分页查询问题复现及优化。

1. 前置检查

2. 无索引分页查询

3. 有索引分页查询

4. 优化

四、最后

VirMach 便宜 VPS

QNews

复盘MySQL分页查询优化方案

一、前言

二、模拟数据

1. 导入官方测试库

2. 存储过程导入模拟数据

三、常用的MySQL分页查询问题复现及优化。

1. 前置检查

2. 无索引分页查询

3. 有索引分页查询

4. 优化

四、最后

分享此文：

Related Posts

《痞子衡嵌入式半月刊》 第 23 期

小程序IntersectionObserver用法

使用PHP语言制作具有加减乘除取余功能的简单计算器

研究开源源码之Myrmec

VirMach 便宜 VPS

QNews

熱門搜尋

《痞子衡嵌入式半月刊》第 23 期