gh-ost 學習筆記

2019 年 10 月 4 日
筆記

gh-ost 學習筆記

參考資料：

https://m.aliyun.com/yunqi/articles/62928

https://m.aliyun.com/yunqi/articles/115565?spm=5176.100239.0.0.bIL3Yr

https://yq.aliyun.com/articles/62928 【推薦】

https://github.com/github/gh-ost/tree/master/doc 【推薦】

https://github.com/wing324/helloworld_zh/blob/master/MySQL/gh-ost/GitHub%E5%BC%80%E6%BA%90MySQL%20Online%20DDL%E5%B7%A5%E5%85%B7gh-ost%E5%8F%82%E6%95%B0%E8%A7%A3%E6%9E%90.md

http://blog.csdn.net/qq_36281740/article/details/52711284

下面是一個添加字段的 general_log 截圖：

grant ALL on *.* to ptosc@'10.0.20.%' identified by 'ptosc'; — 添加gh-ost操作的授權

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -debug -table="coc" -alter "add column cl111 char(20) not null" -assume-rbr -execute –initially-drop-old-table –ok-to-drop-table

部分表說明：

coc 原始表

_coc_del 原先的原始表，alter操作完成時，會自動將這個表重命名為_coc_del 【安全起見，默認gh-ost不會刪除這個文件】

_coc_gho 影子表，alter操作完成時，會自動將這個表重命名為coc

_coc_ghc 記錄操作進度等信息，其表中的內容類似下圖：

過程 https://github.com/github/gh-ost ：

連接到主庫的大致的操作過程如下：

1、gh-ost程序連接到主庫，檢查賬號權限，mysqld版本，估算需要操作的表的行數。

2、初始化stream連接，監聽binlog，並記錄gh-ost操作前的binlog位置。初始化Applier連接。

3、刪除_xxx_ghc表(如果之前存在)，重建這個ghc表，用於記錄gh-ost的操作changelog。

3、創建_xxx_gho表(最終表)，對gho表進行ALTER操作，然後複製原表數據到gho表。（這ALTER期間，gh-ost模擬成slave，將操作期間的DML產生的binlog event獲取到，並應用到gho表上）

4、【cut-over新舊錶切換階段】重命名xxx表為_coc_del表，重命名_xxx_gho表為coc表【rename表的瞬間是鎖表的】

5、刪除_xxx_ghc表

6、刪除_xxx_del表 (默認不會刪除原始表，需要加-ok-to-drop-table參數)

遷移和切換的細節實現：

關於gh-ost的實現，這裡只挑了rowcopy和binlog apply的順序問題和rename過程做了詳細解析。

數據遷移過程

在數據遷移的過程中，數據變量有三個，暫且分為，A:來自原表的rowcopy，B:binlog的apply，C:對原表的dml操作。

C操作會記錄binglog從而觸發B操作，所以B操作一定在C操作的後面，因此一般情況下，會有ACB,CBA兩種組合，同時特殊情況如binlog apply延遲，則會有CAB這種組合。

分析三種組合之前要先了解gh-ost在sql改寫方面是如何映射的：

RowCopy	原表操作	新表操作
	select	insert ignore into

BinlogApply	原表操作	新表操作
	insert	replace into
	update	update 新表(全行更新)
	delete	delete

在上述原則的基礎上，我們再來逐個分析不同順序組合的影響:

1.insert 操作

binlog是最權威的,gh-ost的原則是以binlog優先,所以無論任何順序下,數據都是和binlog保持一致,如果rowcopy在後,會insert ignore,如果binlog apply在後會replace into掉。

2.update/delete 操作

一般情況下：

ACB組合,即對已經rowcopy過的數據，出現對原表的update/delete操作。這時候會全部通過binlog apply執行，注意binlog apply的update是對某一條記錄的全部列覆蓋更新，所以不會有累加的問題。

CBA組合,即對尚未遷移的數據，出現對原表的update/delete操作。這時候對新表的binlog apply會是空操作，具體數據由rowcopy遷移。

特殊情況下:

CAB組合,即先對原表更新完以後，rowcopy在binlog apply之前把數據遷移了過去，而在binlog event過來以後，會再次應用，這裡有問題？其實結合gh-ost的binlog aplly的sql映射規則,insert操作會被replace重新替換掉,update 會更新對應記錄全部行,delete 會是空操作。最終數據還是一致的狀態。

cut-over過程：

在pt-osc或者online ddl中,最後的rename操作一般是耗時比較短,但如果表結構變更過程中,有大查詢進來,那麼在rename操作的時候,會觸發MDL鎖的等待,如果在高峰期,這就是個嚴重的問題。所以gh-ost是怎麼做的呢？

gh-ost利用了MySQL的一個特性，就是原子性的rename請求，在所有被blocked的請求中，優先級永遠是最高的。gh-ost基於此設計了該方案：一個連接對原表加鎖，另啟一個連接嘗試rename操作，此時會被阻塞住，當釋放lock的時候，rename會首先被執行，其他被阻塞的請求會繼續應用到新表。

migrator.go:iterateChunks() 函數來確定何時開始 cut－over

具體切換流程如下：

START

會話A
1. CREATE table tbl_old

防止rename過早執行

LOCK TABLES tbl WRITE, tbl_old WRITE

通過lock_wait_timeout設置為2s控制超時,超時失敗會重試次數為配置default-retries,默認60次

新的請求進來，關於原表的請求被blocked
RENAME TABLE tbl TO tbl_old, ghost TO tbl , 同樣被blocked
新的請求進來，關於原表的請求被blocked
檢查是否有blocked 的RENAME請求，通過show processlist
會話A: DROP TABLE tbl_old
會話A: UNLOCK TABLES
RENAME SUCCESS

session1	session2
begin	begin;
lock tables `t1` write, `_t1_del` write	rename table `t1` to `_t1_del`, `_t1_gho` to `t1` — 會被阻塞
drop table if exists `_t1_del`
unlock tables;
	session1釋放鎖後，session會 rename 成功

不同階段失敗後如何處理：

如果第一步失敗，退出程序
如果會話A建表成功，加鎖失敗，退出程序，未加鎖
rename請求來的時候，會話A死掉，lock會自動釋放，同時因為tbl_old的存在rename也會失敗，所有請求恢復正常
rename被blocked的時候，會話A死掉，lock會自動釋放，同樣因為tbl_old的存在，rename會失敗，所有請求恢復正常
rename死掉，gh-ost會捕獲不到rename，會話A繼續運行，釋放lock，所有請求恢復正常

gh-ost要的權限比較大，需要嚴格控制訪問的IP

grant ALL on *.* to ptosc@'10.0.20.%' identified by 'ptosc'; # 如果gh-ost只在主庫通過SSH終端執行的話，可以給本機地址授權即可。

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -debug -table="coc" -alter "add column cl1 varchar(20) not null default "test col1" " -assume-rbr -ok-to-drop-table -serve-socket-file="/tmp/gh-ost.hellodb.coc.sock" -execute

gh-ost有三種工作模式：

a:連接到從庫，在主庫做遷移。

b:連接到主庫，遷移過程所有操作都在主上操作，包括讀取binlog等等。【推薦使用】

c:在從庫做遷移測試。

三種模式對比：

三種方法各有優缺點，但我只關心缺點，先說a的缺點，a會在從上面讀取binlog,但數據庫主從數據為什麼會造成不一致，一個很重要的原因是主庫的binlog沒有完全在從庫執行。所以個人感覺a方法有丟失數據的風險。

b方法任何操作都會再主庫操作，或多或少會對主庫負載造成影響，但是可以通過調整一些參數降低和時刻關注這些影響，所以個人推薦使用b方法。

至於c方法是偏向測試用的，這裡不做過多介紹，但是c方法里有一個細節，cut-over階段有會stop slave一個操作,其實這個操作風險特別高，有時stop slave 時間會很長，務必會對線上數據庫使用造成影響，所以如果使用c方法做測試也要在線下數據庫。

gh-ost 參數詳解：

-allow-master-master 允許在雙主環境下使用，貌似現在生產環境雙主用的比較少，一般都是MHA

explicitly allow running in a master-master setup

-allow-nullable-unique-key 允許gh-ost在數據遷移(migrate)依賴的唯一鍵可以為NULL，默認為不允許為NULL的唯一鍵。如果數據遷移(migrate)依賴的唯一鍵允許NULL值，則可能造成數據不正確，請謹慎使用。

allow gh-ost to migrate based on a unique key with nullable columns. As long as no NULL values exist, this should be OK. If NULL values exist in chosen key, data may be corrupted. Use at your own risk!

-allow-on-master 允許在主庫執行【gh-ost默認會去從庫執行命令，這個選項會強制在主庫執行，推薦這種方式使用gh-ost】

allow this migration to run directly on master. Preferably it would run on a replica

-alter string 這裡跟具體的alter語句【需要用雙引號或者單引號括起來】

alter statement (mandatory)

-approve-renamed-columns ALTER 給列改名的話，必須加這個選項

in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag approves that gh-ost's interpretation is correct

例如：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306 -allow-on-master -database="hellodb" -table="coc" -approve-renamed-columns -alter "change column c1 newc1 varchar(100) not null default "測試字段1" COMMENT "測試注釋1"" -assume-rbr -execute -ok-to-drop-table 操作完成後刪除舊錶

-assume-master-host string

(optional) explicitly tell gh-ost the identity of the master. Format:some.host.com[:port] This is useful in master-master setups where you wish to pick an explicit master, or in a tungsten-replicator where gh-ost is unabel to determine the master

-assume-rbr 確認gh-ost連接的數據庫實例的binlog_format=ROW的情況下，可以指定-assume-rbr，這樣可以禁止從庫上運行stop slave,start slave,執行gh-ost用戶也不需要SUPER權限。

set to 'true' when you know for certain your server uses 'ROW' binlog_format. gh-ost is unable to tell, event after reading binlog_format, whether the replication process does indeed use 'ROW', and restarts replication to be certain RBR setting is applied. Such operation requires SUPER privileges which you might not have. Setting this flag avoids restarting replication and you can proceed to use gh-ost without SUPER privileges

-check-flag 檢查某個參數是否支持？？

Check if another flag exists/supported. This allows for cross-version scripting. Exits with 0 when all additional provided flags exist, nonzero otherwise. You must provide (dummy) values for flags that require a value. Example: gh-ost –check-flag –cut-over-lock-timeout-seconds –nice-ratio 0

-chunk-size int 在每次迭代中處理的行數量(允許範圍：100-100000)，默認值為1000

amount of rows to handle in each iteration (allowed range: 100-100,000) (default 1000)

-concurrent-rowcount 該參數如果為True(默認值)，則進行row-copy之後，估算統計行數(使用explain select count(*)方式)，並調整ETA時間，否則，gh-ost首先預估統計行數，然後開始row-copy。

(with –exact-rowcount), when true (default): count rows after row-copy begins, concurrently, and adjust row estimate later on; when false: first count rows, then start row copy (default true)

-conf string 可以指定gh-ost默認去讀取配置的文件【一般直接在命令行指定DDL使用的選項即可】

-critical-load string 一系列逗號分隔的status-name=values組成，當MySQL中status超過對應的values，gh-ost將會退出。【用的較少】

Comma delimited status-name=threshold, same format as –max-load. When status exceeds threshold, app panics and quits

例如: -critical-load "Threads_running=1000,threads_connected=5000" 指的是當MySQL中的狀態值Threads_running>1000,threads_connected>5000的時候，gh-ost將會由於該數據庫嚴重負載而停止並退出。

-critical-load-hibernate-seconds int 【用的較少】

When nonzero, critical-load does not panic and bail out; instead, gh-ost goes into hibernate for the specified duration. It will not read/write anything to from/to any server

-critical-load-interval-millis int 【用的較少】

When 0, migration immediately bails out upon meeting critical-load. When non-zero, a second check is done after given interval, and migration only bails out if 2nd check still meets critical load 。當值為0時，當達到-critical-load，gh-ost立即退出。當值不為0時，當達到-critical-load，gh-ost會在-critical-load-interval-millis秒數後，再次進行檢查，再次檢查依舊達到-critical-load，gh-ost將會退出。

-cut-over string 選擇cut-over類型:atomic/two-step，atomic(默認)類型是github的算法，two-step採用的是facebook-OSC的算法。

choose cut-over type (default|atomic, two-step) (default "atomic")

-cut-over-lock-timeout-seconds int gh-ost在cut-over階段最大的鎖等待時間，當鎖超時時，gh-ost的cut-over將重試。(默認值：3)

Max number of seconds to hold locks on tables while attempting to cut-over (retry attempted when lock exceeds timeout) (default 3)

-database string 指定需要處理的庫名

-debug 設置日誌的級別

-default-retries int 各種操作在panick前重試次數。(默認為60次)

Default number of retries for various operations before panicking (default 60)

-discard-foreign-keys 【危險參數】對一個有外鍵的表，帶該參數執行gh-ost後，外鍵就會丟失。該參數很適合用於刪除外鍵，除此之外，請謹慎使用

DANGER! This flag will migrate a table that has foreign keys and will NOT create foreign keys on the ghost table, thus your altered table will have NO foreign keys. This is useful for intentional dropping of foreign keys

-dml-batch-size int 待確認具體的含義

batch size for DML events to apply in a single transaction (range 1-100) (default 10)

-exact-rowcount 使用count(*)計算錶行數，而不是從information_schema表裡估計行數（用於更精確的進度估算）

actually count table rows as opposed to estimate them (results in more accurate progress estimation)

-execute 真實執行ALTER操作，不加的話就是dry-run測試運行

-force-named-cut-over

When true, the 'unpostpone|cut-over' interactive command must name the migrated table

-force-table-names string

table name prefix to be used on the temporary tables

-heartbeat-interval-millis int gh-ost心跳頻率值，默認為100秒

how frequently would gh-ost inject a heartbeat value (default 100)

-hooks-hint string

arbitrary message to be injected to hooks via GH_OST_HOOKS_HINT, for your convenience

-hooks-path string

directory where hook files are found (default: empty, ie. hooks disabled). Hook files found on this path, and conforming to hook naming conventions will be executed

-host string 主機地址 MySQL hostname (preferably a replica, not the master) (default "127.0.0.1")

-initially-drop-ghost-table 執行gh-ost時候，先刪除同名gho表(如果存在的話）【危險，慎用】

Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists

-initially-drop-old-table gh-ost操作之前，檢查並刪除已經存在的舊錶。【危險，慎用】

Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists

-initially-drop-socket-file gh-ost強制刪除已經存在的socket文件。該參數不建議使用，可能會刪除一個正在運行的gh-ost程序，導致DDL失敗。【危險，慎用】

Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!

-master-password string

MySQL password on master, if different from that on replica. Requires –assume-master-host

-master-user string

MySQL user on master, if different from that on replica. Requires –assume-master-host

-user string MySQL用戶名

-max-lag-millis int 主從複製最大延遲時間，當主從複製延遲時間超過該值後，gh-ost將採取節流(throttle)措施，默認值：1500s

replication lag at which to throttle operation (default 1500)

-max-load string 一系列逗號分隔的status-name=values組成，當MySQL中status超過對應的values，gh-ost將採取節流(throttle)措施。

Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes

例如： -max-load "Threads_running=20,Connections=1500" 指的是當MySQL中的狀態值Threads_running>20,Connections>1500的時候，gh-ost將採取節流(throttle)措施。

-migrate-on-replica 在從庫上執行gh-ost命令

Have the migration run on a replica, not on the master. This will do the full migration on the replica including cut-over (as opposed to –test-on-replica)

-nice-ratio float 每次chunk時間段的休眠時間。【0：每個chunk時間段不休眠；1：每row-copy 1毫秒，則另外休眠1毫秒；0.7：每row-copy 10毫秒，則另外休眠7毫秒】

force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1: for every 1ms spent copying rows, sleep additional 1ms (effectively doubling runtime); 0.7: for every 10ms spend in a rowcopy chunk, spend 7ms sleeping immediately after

-ok-to-drop-table gh-ost操作結束後，刪除舊錶，默認狀態是不刪除舊錶，會存在_tablename_del表

Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?

-panic-flag-file string 當這個文件被創建，gh-ost將會立即退出，且不會清理產生的臨時文件及socket文件【慎用】

when this file is created, gh-ost will immediately terminate, without cleanup

-password string MySQL密碼

-port int MySQL端口

-postpone-cut-over-flag-file string 當這個文件存在的時候，gh-ost的cut-over階段將會被推遲，直到該文件被刪除。

while this file exists, migration will postpone the final stage of swapping tables, and will keep on syncing the ghost table. Cut-over/swapping would be ready to perform the moment the file is deleted.

-quiet 靜默模式

-replica-server-id uint

server id used by gh-ost process. Default: 99999 (default 99999)

-replication-lag-query string

Deprecated. gh-ost uses an internal, subsecond resolution query

-serve-socket-file string

Unix socket file to serve on. Default: auto-determined and advertised upon startup

-serve-tcp-port int

TCP port to serve on. Default: disabled

-skip-foreign-key-checks 跳過外鍵檢查，除非你確認待操作的表沒有外鍵【危險，慎用】

set to 'true' when you know for certain there are no foreign keys on your table, and wish to skip the time it takes for gh-ost to verify that

-skip-renamed-columns ALTER 該參數告訴gh-ost跳該列的數據遷移，讓gh-ost把重命名列作為無關緊要的列。該操作很危險，你會損失該列的所有值。【危險，慎用】

in case your ALTER statement renames columns, gh-ost will note that and offer its interpretation of the rename. By default gh-ost does not proceed to execute. This flag tells gh-ost to skip the renamed columns, i.e. to treat what gh-ost thinks are renamed columns as unrelated columns. NOTE: you may lose column data

例如：gh-ost -user="wing" -host="127.0.0.1" -port=3306 -database="wing" -table="t" -password="wing" -alter="change column c1 col1 int not null default 0" -assume-rbr -execute -skip-renamed-columns ALTER 這個ALTER操作會丟棄到c1列的原先全部數據

-stack 添加錯誤堆棧追蹤

-switch-to-rbr 讓gh-ost自動將從庫的binlog_format轉換為ROW格式【生產環境都是用row格式了，基本上不用加這個參數】

let this tool automatically switch binary log format to 'ROW' on the replica, if needed. The format will NOT be switched back. I'm too scared to do that, and wish to protect you if you happen to execute another migration while this one is running

-table string 待ALTER操作的表名

-test-on-replica 在從庫上測試gh-ost，包括在從庫上數據遷移(migration)，數據遷移完成後stop slave，原表和ghost表立刻交換而後立刻交換回來。繼續保持stop slave，使你可以對比兩張表。

Have the migration run on a replica, not on the master. At the end of migration replication is stopped, and tables are swapped and immediately swap-revert. Replication remains stopped and you can compare the two tables for building trust

-test-on-replica-skip-replica-stop 當-test-on-replica執行時，該參數表示該過程中不用stop slave。

When –test-on-replica is enabled, do not issue commands stop replication (requires –test-on-replica)

-throttle-additional-flag-file string

當該文件被創建後，gh-ost操作立即停止。該參數可以用在多個gh-ost同時操作的時候，創建一個文件，讓所有的gh-ost操作停止，或者刪除這個文件，讓所有的gh-ost操作恢復。

-throttle-control-replicas string

列出所有需要被檢查slave lag的從庫，如：-throttle-control-replica=192.16.12.22:3306,192.16.12.23:3307,192.16.13.12:3308

-throttle-flag-file string

當該文件被創建後，gh-ost操作立即停止。該參數適合控制單個gh-ost操作。

-throttle-http string

when given, gh-ost checks given URL via HEAD request; any response code other than 200 (OK) causes throttling; make sure it has low latency response

-throttle-query string 節流查詢。每秒鐘執行一次。當返回值=0時不需要節流，當返回值>0時，需要執行節流操作。該查詢會在數據遷移(migrated)服務器上操作，所以請確保該查詢是輕量級的。

when given, issued (every second) to check if operation should throttle. Expecting to return zero for no-throttle, >0 for throttle. Query is issued on the migrated server. Make sure this query is lightweight

-timestamp-old-table

Use a timestamp in old table name. This makes old table names unique and non conflicting cross migrations

-tungsten 告訴gh-ost你正在運行的是一個tungsten-replication拓撲結構。

explicitly let gh-ost know that you are running on a tungsten-replication based topology (you are likely to also provide –assume-master-host)

-verbose gh-ost執行時候輸出詳細信息

示例：

主庫：10.0.20.25

從庫：10.0.20.26

在主庫添加用於alter操作的賬號：grant ALL on *.* to ptosc@'10.0.20.25' identified by 'ptosc' ;

### 注意：下面的命令都是在主庫執行的。參數裏面 – 和– 的一樣用，沒任何區別。 另外，操作的表上必須要有主鍵。

./gh-ost -version 使用的是1.0.42版本。

添加字段：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -max-load=Threads_running=25   -critical-load=Threads_running=1000   -chunk-size=1000   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter " add column c1 varchar(100) not null default "測試字段1" COMMENT "測試注釋1""   -assume-rbr -ok-to-drop-table -execute

修改某個字段的字符集：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter "change c1 c1  varchar(128) CHARACTER SET utf8mb4"   -assume-rbr -ok-to-drop-table -execute

刪除字段：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter "drop column c1"   -assume-rbr -ok-to-drop-table -execute

字段改名：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -approve-renamed-columns   -alter "change column c1 newc1 varchar(100) not null default "測試字段1" COMMENT "測試注釋1""   -assume-rbr -ok-to-drop-table -execute

添加索引：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose  -database="hellodb" -table="coc"   -alter "add index idx_newc1_classid(newc1,classid)"   -assume-rbr -ok-to-drop-table -execute

刪除索引：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter "drop index idx_newc1_classid"   -assume-rbr -ok-to-drop-table -execute

修改表引擎：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter "engine=InnoDB"   -assume-rbr -ok-to-drop-table -execute

刪除字段的同時不要刪除老表：

./gh-ost -user="ptosc" -password="ptosc" -host="10.0.20.25" -port=3306   -allow-on-master --verbose -database="hellodb" -table="coc"   -alter "drop column newc1"   -assume-rbr -execute