使用pg_repack实现在线vacuum
- 2019 年 12 月 18 日
- 笔记
Postgresql通过数据多版本实现mvcc,删除数据并不会真正删除数据,而是修改标识,更新是通过删除+插入的方式进行,所以在频繁更新的系统,数据膨胀是个头疼的问题,如果不进行处理,数据膨胀倍数可能达到十几倍。
为了处理膨胀问题,pg提供了vacuum工具,vacuum分为普通vacuum和vacuum full,普通vacuum会清理死元组,但是不会进行空间重组,磁盘上的空间不会释放,但是会释放死元组的空间,后续插入的元组会根据空闲空间管理fsm优先插入空闲空间。Vacuum full清理会释放磁盘空间,但是会获取八级锁,因为vacuum full的原理是新建一个表数据文件,然后从老表中拷贝数据到新文件中,这个过程会阻塞select。
因为影响业务,pg社区开发了pg_repack工具,老版本叫pg_reorg。Pg_repack以extension的方式存在,用户可以自己安装该插件。本文简单介绍一下pg_repack的使用。
Pg_repack的源码在github或者pgxn上都可以下载,这里编译安装、创建插件的过程就不再赘述。
在安装完后,就可以在操作系统命令行使用pg_repack命令了,下面列举了pg_repack的一些命令用法:
pg_repack --no-order --table test_1 test pg_repack --wait-timeout 3600 --jobs 10 --no-order -d test pg_repack --wait-timeout 3600 --jobs 10 --no-order --schema=test -d test pg_repack --wait-timeout 3600 --jobs 10 --only-indexes --table test.test_1 --no-order -d test pg_repack --wait-timeout 3600 --jobs 10 --index test.idx1 --no-order -d test
经过测试,在执行pg_repack的同时对表进行并发查询,性能下降大概只有10%到20%,读取操作可以正常进行。并且表的oid没有发生变化,repack执行完成后,通过pg_relation_filepath()函数查询发现表的数据文件发生了改变,同时会删除原来的数据文件,其实执行vacuum full数据文件也会发生改变。
test=# select pg_relation_filepath(16475); pg_relation_filepath ---------------------- base/16387/16580 (1 row) test=# select pg_relation_filepath(16475); pg_relation_filepath ---------------------- base/16387/16601 (1 row)
下面聊聊repack的原理吧,原理其实和vacuum full是类似的,都是新建一个文件,然后将老文件数据拷贝过来,然后进行文件切换,它不阻塞读写的秘诀就是新建文件和拷贝的过程是在线做的,在没有完成拷贝之前,原来的文件还是可以读写的,只有在切表那一瞬间可能会有影响。
那么它是怎么做到在线拷贝的呢?源库的数据文件一直在变,所以表文件其实分为两部分,一部分是基础数据,一部分是增量数据,基础数据的拷贝就是正常的拷贝,增量数据是通过创建触发器来捕获在该表上的读写操作来实现的,待基础数据拷贝完后再将trigger捕获的增量sql进行应用,达到最终结果。
我们其实可以发现,pg_repack会在库里创建名为repack的schema,里面有两张表:primary_keys和tables。Primarys分为两列,第一列indrelid代表表的oid,第二列indexrelid代表主键或唯一索引的oid。Tables表记录了创建trigger以及捕获的相关语句,语句按一条条的record进行记录。如下所示:
test=# select * from tables; -[ RECORD 1 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- relname | public.products relid | 16388 reltoastrelid | 16391 reltoastidxid | 16393 schemaname | public pkid | ckid | create_pktype | create_log | CREATE TABLE repack.log_16388 (id bigserial PRIMARY KEY, pk repack.pk_16388, row public.products) create_trigger | enable_trigger | ALTER TABLE public.products ENABLE ALWAYS TRIGGER repack_trigger create_table_1 | CREATE TABLE repack.table_16388 WITH (oids = false) TABLESPACE tablespace_orig | pg_default create_table_2 | AS SELECT product_no,name,price FROM ONLY public.products copy_data | INSERT INTO repack.table_16388 SELECT product_no,name,price FROM ONLY public.products alter_col_storage | drop_columns | delete_log | DELETE FROM repack.log_16388 lock_table | LOCK TABLE public.products IN ACCESS EXCLUSIVE MODE ckey | sql_peek | SELECT * FROM repack.log_16388 ORDER BY id LIMIT $1 sql_insert | INSERT INTO repack.table_16388 VALUES ($1.*) sql_delete | sql_update | sql_pop | DELETE FROM repack.log_16388 WHERE id IN ( -[ RECORD 2 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- relname | public.test relid | 16400 reltoastrelid | 0 reltoastidxid | 0 schemaname | public pkid | ckid | create_pktype | create_log | CREATE TABLE repack.log_16400 (id bigserial PRIMARY KEY, pk repack.pk_16400, row public.test) create_trigger | enable_trigger | ALTER TABLE public.test ENABLE ALWAYS TRIGGER repack_trigger create_table_1 | CREATE TABLE repack.table_16400 WITH (oids = false) TABLESPACE tablespace_orig | pg_default create_table_2 | AS SELECT id FROM ONLY public.test copy_data | INSERT INTO repack.table_16400 SELECT id FROM ONLY public.test alter_col_storage | drop_columns | delete_log | DELETE FROM repack.log_16400 lock_table | LOCK TABLE public.test IN ACCESS EXCLUSIVE MODE ckey | sql_peek | SELECT * FROM repack.log_16400 ORDER BY id LIMIT $1 sql_insert | INSERT INTO repack.table_16400 VALUES ($1.*) sql_delete | sql_update | sql_pop | DELETE FROM repack.log_16400 WHERE id IN ( -[ RECORD 3 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- relname | public.fruits relid | 16394 reltoastrelid | 16397 reltoastidxid | 16399 schemaname | public pkid | ckid | create_pktype | create_log | CREATE TABLE repack.log_16394 (id bigserial PRIMARY KEY, pk repack.pk_16394, row public.fruits) create_trigger | enable_trigger | ALTER TABLE public.fruits ENABLE ALWAYS TRIGGER repack_trigger create_table_1 | CREATE TABLE repack.table_16394 WITH (oids = false) TABLESPACE tablespace_orig | pg_default create_table_2 | AS SELECT number,name,price FROM ONLY public.fruits copy_data | INSERT INTO repack.table_16394 SELECT number,name,price FROM ONLY public.fruits alter_col_storage | drop_columns | delete_log | DELETE FROM repack.log_16394 lock_table | LOCK TABLE public.fruits IN ACCESS EXCLUSIVE MODE ckey | sql_peek | SELECT * FROM repack.log_16394 ORDER BY id LIMIT $1 sql_insert | INSERT INTO repack.table_16394 VALUES ($1.*) sql_delete | sql_update | sql_pop | DELETE FROM repack.log_16394 WHERE id IN ( -[ RECORD 4 ]-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- relname | public.test_1 relid | 16475 reltoastrelid | 0 reltoastidxid | 0 schemaname | public pkid | 16493 ckid | create_pktype | CREATE TYPE repack.pk_16475 AS (c1 integer) create_log | CREATE TABLE repack.log_16475 (id bigserial PRIMARY KEY, pk repack.pk_16475, row public.test_1) create_trigger | CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.test_1 FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_16475(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.c1)::repack.pk_16475) END, $2)') enable_trigger | ALTER TABLE public.test_1 ENABLE ALWAYS TRIGGER repack_trigger create_table_1 | CREATE TABLE repack.table_16475 WITH (oids = false) TABLESPACE tablespace_orig | pg_default create_table_2 | AS SELECT c1,c2,c3,c4,c5 FROM ONLY public.test_1 copy_data | INSERT INTO repack.table_16475 SELECT c1,c2,c3,c4,c5 FROM ONLY public.test_1 alter_col_storage | drop_columns | delete_log | DELETE FROM repack.log_16475 lock_table | LOCK TABLE public.test_1 IN ACCESS EXCLUSIVE MODE ckey | sql_peek | SELECT * FROM repack.log_16475 ORDER BY id LIMIT $1 sql_insert | INSERT INTO repack.table_16475 VALUES ($1.*) sql_delete | DELETE FROM repack.table_16475 WHERE (c1) = ($1.c1) sql_update | UPDATE repack.table_16475 SET (c1, c2, c3, c4, c5) = ($2.c1, $2.c2, $2.c3, $2.c4, $2.c5) WHERE (c1) = ($1.c1) sql_pop | DELETE FROM repack.log_16475 WHERE id IN (