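MySQL Startup Process Explained, Part 3: Starting the InnoDB Storage Engine
InnoDB starts up as follows:
1. Initialize innobase_hton, a pointer of type handlerton, so that the server layer can call the storage engine's interfaces.
2. Check and initialize the InnoDB parameters, including the system tablespace, temporary tablespace, undo tablespaces, redo log files, doublewrite files, and so on.
3. Call innobase_start_or_create_for_mysql() to create or start InnoDB.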
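Step 1 is essentially a table of function pointers: the engine fills the table in once, and from then on the server calls through it without knowing which engine sits behind it. The following is a minimal sketch of that pattern only. MiniHandlerton, demo_commit, demo_rollback, demo_engine_init and server_commit are hypothetical names invented for illustration; the real handlerton has many more members, as the innobase_init() listing further down shows.

```cpp
// Hypothetical, heavily simplified stand-in for the server's handlerton.
struct MiniHandlerton {
    int (*commit)(int trx_id);    // called by the server to commit
    int (*rollback)(int trx_id);  // called by the server to roll back
};

// Engine-side callbacks (illustrative only): 0 means success.
static int demo_commit(int trx_id) { return trx_id >= 0 ? 0 : 1; }
static int demo_rollback(int trx_id) { return trx_id >= 0 ? 0 : 1; }

// Engine-side init: receives the table as void* and fills it in,
// which is the role innobase_init() plays for InnoDB.
int demo_engine_init(void* p) {
    MiniHandlerton* hton = static_cast<MiniHandlerton*>(p);
    hton->commit = demo_commit;
    hton->rollback = demo_rollback;
    return 0;
}

// Server-side: dispatch through the table, engine-agnostic.
int server_commit(const MiniHandlerton& hton, int trx_id) {
    return hton.commit(trx_id);
}
```

The server only ever sees the table, so swapping the engine means swapping the init function that fills it.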
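innobase_start_or_create_for_mysql() proceeds as follows:
1. Reset the start state.
2. Process innodb_flush_method. Production systems typically use either O_DIRECT or O_DIRECT_NO_FSYNC.
3. Set the maximum number of InnoDB threads.
4. Adjust innodb_buffer_pool_instances and innodb_buffer_pool_size.
5. Adjust the number of innodb_page_cleaners according to srv_buf_pool_instances.
6. Boot the InnoDB server and initialize the related parameters and components.
7. Initialize the asynchronous I/O subsystem.
8. Create the InnoDB buffer pool. An error is reported if there is not enough memory.
9. Call fsp_init and log_init to initialize the fsp system and the redo log system.
10. Call recv_sys_create and recv_sys_init to create and initialize the recovery system.
11. Call lock_sys_create to create the lock system.
12. Call os_thread_create to create the I/O threads.
13. Call buf_flush_page_cleaner_init to initialize the page_cleaner system, then create the buf_flush_page_cleaner_coordinator and buf_flush_page_cleaner_worker threads.
14. Wait for page_cleaner to become active.
15. Call check_file_spec to check whether the data files (ibdata1, ibdata2, and so on) exist, and decide whether a new database has to be created.
16. If a new database has to be created, check whether redo log files or undo tablespaces already exist.
17. Call srv_sys_space.open_or_create() to open or create the data files [ibdata..]. If no new database is being created, read flushed_lsn from the ibdata1 file.
18. If create_new_db is set:
18.1 Synchronously flush the dirty pages from the tail of the flush list of every buffer pool instance.
18.2 Get the current LSN.
18.3 Create the redo log files.
19. If create_new_db is not set, open the redo log files.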
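20. Call fil_space_create to create the in-memory space object for the redo log.
21. Add the redo log files to the redo log space.
22. Initialize the redo log group.
23. Call fil_open_log_and_system_tablespace_files to open all log files and system tablespace data files.
24. Call srv_undo_tablespaces_init to open the undo tablespaces. Once all undo files have been found and opened, they are added to the file management system.
25. Call trx_sys_file_format_init to initialize the variable file_format_max.
26. Create the trx_sys instance and initialize purge_queue and the mutex.
27. If create_new_db is set:
27.1 Call fsp_header_init to allocate space at the start of the ibdata file, so that system modules such as the transaction system can be stored and managed there.
27.2 Call trx_sys_create_sys_pages to create the file page of the transaction system, which is the sixth page in ibdata.
27.3 Call trx_sys_init_at_db_start to create and initialize the in-memory structures of the transaction system.
27.4 Call trx_purge_sys_create to create and initialize the trx purge system.
27.5 Call dict_create to create a new data dictionary and initialize the change buffer.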
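28. Invalidate the whole buffer pool to make sure that pages read earlier are read again during recovery. This is a very lightweight operation: at this point there is only one page in the LRU list and none in the flush list.
29. Call recv_recovery_from_checkpoint_start() to begin recovery:
29.1 Initialize the flush red-black tree so that pages can be inserted into the flush list quickly during recovery.
29.2 Find the latest checkpoint in the log groups.
29.3 Read the redo log page that holds the latest checkpoint into log_sys->checkpoint_buf.
29.4 Get checkpoint_lsn and checkpoint_no.
29.5 Starting from checkpoint_lsn, read the redo log into the hash table.
29.6 Check the tablespaces needed for crash recovery, then process and delete the pages in the doublewrite buffer. This step verifies the integrity of the real data page behind each page in the doublewrite buffer; if a data page is damaged, it is restored from its copy in the doublewrite buffer. A background thread, recv_writer_thread, is also spawned to flush dirty pages from the buffer pool.
29.7 Copy the log segment from the newest log group to the other groups. At present there is only one log group.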
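30. Clear the pages in the doublewrite buffer.
31. Call dict_boot to initialize the data dictionary system and the change buffer.
32. Call trx_sys_init_at_db_start to create and initialize the transaction system.
33. Call recv_apply_hashed_log_recs to apply the redo log.
34. Call trx_purge_sys_create to create the trx purge system.
35. Call recv_recovery_from_checkpoint_finish to complete recovery from a checkpoint:
35.1 Make sure the recv_writer thread has finished its work.
35.2 Wait for the flush to complete, that is, until all dirty pages have been flushed.
35.3 Wait for the recv_writer thread to terminate.
35.4 Free the flush red-black tree.
35.5 Roll back any transactions on the data dictionary tables so that those tables are not left locked. The data dictionary latch should guarantee that only one data dictionary transaction is active at a time.
36. Call recv_recovery_rollback_active to roll back incomplete transactions that were never committed in InnoDB. This runs in a background thread.
37. Call srv_open_tmp_tablespace to open the temporary tablespace.
38. Call trx_sys_create_rsegs to create the rollback segments.
39. Create the lock wait timeout thread. Its thread function is lock_wait_timeout_thread.
40. Create the semaphore timeout monitor thread, which prints a warning when a semaphore wait lasts too long. Its thread function is srv_error_monitor_thread.
41. Create the master thread. Its thread function is srv_master_thread.
42. Create the purge system threads: srv_purge_coordinator_thread and srv_worker_thread.
43. Call srv_start_wait_for_purge_to_start to wait for the purge system to start.
44. Create the buffer pool dump/load thread. Its thread function is buf_dump_thread.
45. Create the statistics collection thread. Its thread function is dict_stats_thread.
46. Call fts_optimize_init to create the FTS optimize thread. Its thread function is fts_optimize_thread.
47. Create the thread that resizes the buffer pool dynamically. Its thread function is buf_resize_thread.
The InnoDB startup code lives in the innobase_init() function in ha_innodb.cc. Its source is as follows: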
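Steps 4 and 5 of the list above (adjusting the number of buffer pool instances and capping the page cleaners) can be sketched in isolation. This is a simplified illustration rather than the server code: BufPoolConfig and adjust_buf_pool_config are hypothetical names, the 1 GiB threshold and the default marker 0 are taken from the comments in the source listing further down, and the 32-bit Windows branch and the chunk-size alignment are left out.

```cpp
#include <cstdint>

// Hypothetical container for the three settings involved.
struct BufPoolConfig {
    std::uint64_t size;          // innodb_buffer_pool_size in bytes
    unsigned long instances;     // innodb_buffer_pool_instances
    unsigned long page_cleaners; // innodb_page_cleaners
};

// Assumed values: threshold of 1 GiB, and 0 meaning "not set by the user".
constexpr std::uint64_t kSizeThreshold = 1024ULL * 1024 * 1024;
constexpr unsigned long kInstancesDefault = 0;

BufPoolConfig adjust_buf_pool_config(BufPoolConfig c) {
    if (c.size >= kSizeThreshold) {
        // Large pool: use 8 instances unless the user chose a value.
        if (c.instances == kInstancesDefault) {
            c.instances = 8;
        }
    } else {
        // Small pool: always a single instance.
        c.instances = 1;
    }
    // A page cleaner cannot be parallelized beyond the instance count.
    if (c.page_cleaners > c.instances) {
        c.page_cleaners = c.instances;
    }
    return c;
}
```

For example, a 512 MiB pool configured with 4 instances and 4 page cleaners ends up with 1 instance and 1 page cleaner, while a 2 GiB pool with the instance count left unset gets 8 instances and keeps its 4 page cleaners.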
/*********************************************************************//**
Initializes the InnoDB plugin.
Opens an InnoDB database.
@return 0 on success, 1 on failure */
static
int
innobase_init(
/*==========*/
    void    *p)    /*!< in: InnoDB handlerton */
{
    static char current_dir[3];    /*!< Set if using current lib */
    int         err;
    char        *default_path;
    uint        format_id;
    ulong       num_pll_degree;

    // Initialize innobase_hton so that the server layer can call InnoDB's interfaces.
    DBUG_ENTER("innobase_init");
    handlerton* innobase_hton= (handlerton*) p;
    innodb_hton_ptr = innobase_hton;

    innobase_hton->state = SHOW_OPTION_YES;
    innobase_hton->db_type = DB_TYPE_INNODB;
    innobase_hton->savepoint_offset = sizeof(trx_named_savept_t);
    innobase_hton->close_connection = innobase_close_connection;
    innobase_hton->kill_connection = innobase_kill_connection;
    innobase_hton->savepoint_set = innobase_savepoint;
    innobase_hton->savepoint_rollback = innobase_rollback_to_savepoint;
    innobase_hton->savepoint_rollback_can_release_mdl =
        innobase_rollback_to_savepoint_can_release_mdl;
    innobase_hton->savepoint_release = innobase_release_savepoint;
    innobase_hton->commit = innobase_commit;
    innobase_hton->rollback = innobase_rollback;
    innobase_hton->prepare = innobase_xa_prepare;
    innobase_hton->recover = innobase_xa_recover;
    innobase_hton->commit_by_xid = innobase_commit_by_xid;
    innobase_hton->rollback_by_xid = innobase_rollback_by_xid;
    innobase_hton->create = innobase_create_handler;
    innobase_hton->alter_tablespace = innobase_alter_tablespace;
    innobase_hton->drop_database = innobase_drop_database;
    innobase_hton->panic = innobase_end;
    innobase_hton->partition_flags= innobase_partition_flags;
    innobase_hton->start_consistent_snapshot =
        innobase_start_trx_and_assign_read_view;
    innobase_hton->flush_logs = innobase_flush_logs;
    innobase_hton->show_status = innobase_show_status;
    innobase_hton->fill_is_table = innobase_fill_i_s_table;
    innobase_hton->flags =
        HTON_SUPPORTS_EXTENDED_KEYS
        | HTON_SUPPORTS_FOREIGN_KEYS
        | HTON_SUPPORTS_TABLE_ENCRYPTION;
    innobase_hton->release_temporary_latches =
        innobase_release_temporary_latches;
    innobase_hton->replace_native_transaction_in_thd =
        innodb_replace_trx_in_thd;
    innobase_hton->data = &innodb_api_cb;
    innobase_hton->is_reserved_db_name= innobase_check_reserved_file_name;
    innobase_hton->is_supported_system_table=
        innobase_is_supported_system_table;
    innobase_hton->rotate_encryption_master_key =
        innobase_encryption_key_rotation;

    ut_a(DATA_MYSQL_TRUE_VARCHAR == (ulint)MYSQL_TYPE_VARCHAR);

#ifndef NDEBUG
    static const char test_filename[] = "-@";
    char test_tablename[sizeof test_filename
        + sizeof(srv_mysql50_table_name_prefix) - 1];
    if ((sizeof(test_tablename)) - 1
        != filename_to_tablename(test_filename, test_tablename,
                                 sizeof(test_tablename), true)
        || strncmp(test_tablename,
                   srv_mysql50_table_name_prefix,
                   sizeof(srv_mysql50_table_name_prefix) - 1)
        || strcmp(test_tablename
                  + sizeof(srv_mysql50_table_name_prefix) - 1,
                  test_filename)) {
        sql_print_error("tablename encoding has been changed");
        DBUG_RETURN(innobase_init_abort());
    }
#endif /* NDEBUG */

    /* Check that values don't overflow on 32-bit systems. */
    if (sizeof(ulint) == 4) {
        if (innobase_buffer_pool_size > UINT_MAX32) {
            sql_print_error(
                "innodb_buffer_pool_size can't be over 4GB"
                " on 32-bit systems");
            DBUG_RETURN(innobase_init_abort());
        }
    }

    os_file_set_umask(my_umask);

    /* Setup the memory alloc/free tracing mechanisms before calling
    any functions that could possibly allocate memory. */
    ut_new_boot();

    /* First calculate the default path for innodb_data_home_dir etc.,
    in case the user has not given any value.
    Note that when using the embedded server, the datadirectory is not
    necessarily the current directory of this program.
    */
    if (mysqld_embedded) {
        default_path = mysql_real_data_home;
    } else {
        /* It's better to use current lib, to keep paths short */
        current_dir[0] = FN_CURLIB;
        current_dir[1] = FN_LIBCHAR;
        current_dir[2] = 0;
        default_path = current_dir;
    }

    ut_a(default_path);

    fil_path_to_mysql_datadir = default_path;
    folder_mysql_datadir = fil_path_to_mysql_datadir;

    /* Set InnoDB initialization parameters according to the values
    read from MySQL .cnf file */

    /* The default dir for data files is the datadir of MySQL */
    srv_data_home = innobase_data_home_dir
        ? innobase_data_home_dir : default_path;

    /*--------------- Shared tablespaces -------------------------
    Shared tablespaces are the system tablespace and the shared
    temporary tablespace. */

    /* Check that the value of system variable innodb_page_size was
    set correctly. Its value was put into srv_page_size. If valid,
    return the associated srv_page_size_shift. */
    srv_page_size_shift = innodb_page_size_validate(srv_page_size);
    if (!srv_page_size_shift) {
        sql_print_error("InnoDB: Invalid page size=%lu.\n",
                        srv_page_size);
        DBUG_RETURN(innobase_init_abort());
    }

    /* Set the default InnoDB system data file size to 12 MB and let it
    be auto-extending. */
    if (!innobase_data_file_path) {
        innobase_data_file_path = (char*) "ibdata1:12M:autoextend";
    }

    /* This is the first time univ_page_size is used.
    It was initialized to 16k pages before srv_page_size was set */
    univ_page_size.copy_from(
        page_size_t(srv_page_size, srv_page_size, false));

    // Set the space_id of the system tablespace
    srv_sys_space.set_space_id(TRX_SYS_SPACE);

    /* Create the filespace flags.
    Set the flags, name and path of the system tablespace. */
    ulint fsp_flags = fsp_flags_init(
        univ_page_size, false, false, false, false);
    srv_sys_space.set_flags(fsp_flags);
    srv_sys_space.set_name(reserved_system_space_name);
    srv_sys_space.set_path(srv_data_home);

    /* Supports raw devices */
    if (!srv_sys_space.parse_params(innobase_data_file_path, true)) {
        ib::error() << "Unable to parse innodb_data_file_path="
            << innobase_data_file_path;
        DBUG_RETURN(innobase_init_abort());
    }

    /* Set default InnoDB temp data file size to 12 MB and let it be
    auto-extending. */
    if (!innobase_temp_data_file_path) {
        innobase_temp_data_file_path = (char*) "ibtmp1:12M:autoextend";
    }

    /* We set the temporary tablspace id later, after recovery.
    The temp tablespace doesn't support raw devices.
    Set the name and path. */
    srv_tmp_space.set_name(reserved_temporary_space_name);
    srv_tmp_space.set_path(srv_data_home);

    /* Create the filespace flags with the temp flag set. */
    fsp_flags = fsp_flags_init(
        univ_page_size, false, false, false, true);
    srv_tmp_space.set_flags(fsp_flags);

    if (!srv_tmp_space.parse_params(innobase_temp_data_file_path, false)) {
        ib::error() << "Unable to parse innodb_temp_data_file_path="
            << innobase_temp_data_file_path;
        DBUG_RETURN(innobase_init_abort());
    }

    /* Perform all sanity check before we take action of deleting files*/
    // Check whether the system and temporary tablespaces share a data file.
    if (srv_sys_space.intersection(&srv_tmp_space)) {
        sql_print_error("%s and %s file names seem to be the same.",
                        srv_tmp_space.name(), srv_sys_space.name());
        DBUG_RETURN(innobase_init_abort());
    }

    /* ------------ UNDO tablespaces files ---------------------*/
    // Directory of the undo tablespaces
    if (!srv_undo_dir) {
        srv_undo_dir = default_path;
    }

    // Normalize the undo tablespace directory
    os_normalize_path(srv_undo_dir);

    if (strchr(srv_undo_dir, ';')) {
        sql_print_error("syntax error in innodb_undo_directory");
        DBUG_RETURN(innobase_init_abort());
    }

    /* -------------- All log files ---------------------------*/

    /* The default dir for log files is the datadir of MySQL */
    if (!srv_log_group_home_dir) {
        srv_log_group_home_dir = default_path;
    }

    // Normalize the directory
    os_normalize_path(srv_log_group_home_dir);

    if (strchr(srv_log_group_home_dir, ';')) {
        sql_print_error("syntax error in innodb_log_group_home_dir");
        DBUG_RETURN(innobase_init_abort());
    }

    if (!innobase_large_prefix) {
        ib::warn() << deprecated_large_prefix;
    }

    if (!THDVAR(NULL, support_xa)) {
        ib::warn() << deprecated_innodb_support_xa_off;
        THDVAR(NULL, support_xa) = TRUE;
    }

    if (innobase_file_format_name != innodb_file_format_default) {
        ib::warn() << deprecated_file_format;
    }

    /* Validate the file format by animal name
    (this is the InnoDB file format check). */
    if (innobase_file_format_name != NULL) {
        format_id = innobase_file_format_name_lookup(
            innobase_file_format_name);

        if (format_id > UNIV_FORMAT_MAX) {
            sql_print_error("InnoDB: wrong innodb_file_format.");
            DBUG_RETURN(innobase_init_abort());
        }
    } else {
        /* Set it to the default file format id. Though this
        should never happen. */
        format_id = 0;
    }

    srv_file_format = format_id;

    /* Given the type of innobase_file_format_name we have little
    choice but to cast away the constness from the returned name.
    innobase_file_format_name is used in the MySQL set variable
    interface and so can't be const.
    */
    innobase_file_format_name =
        (char*) trx_sys_file_format_id_to_name(format_id);

    /* Check innobase_file_format_check variable */
    if (!innobase_file_format_check) {
        ib::warn() << deprecated_file_format_check;

        /* Set the value to disable checking. */
        srv_max_file_format_at_startup = UNIV_FORMAT_MAX + 1;
    } else {
        /* Set the value to the lowest supported format. */
        srv_max_file_format_at_startup = UNIV_FORMAT_MIN;
    }

    if (innobase_file_format_max != innodb_file_format_max_default) {
        ib::warn() << deprecated_file_format_max;
    }

    /* Did the user specify a format name that we support?
    As a side effect it will update the variable
    srv_max_file_format_at_startup */
    if (innobase_file_format_validate_and_set(
            innobase_file_format_max) < 0) {
        sql_print_error("InnoDB: invalid"
                        " innodb_file_format_max value:"
                        " should be any value up to %s or its"
                        " equivalent numeric id",
                        trx_sys_file_format_id_to_name(
                            UNIV_FORMAT_MAX));
        DBUG_RETURN(innobase_init_abort());
    }

    /** InnoDB change buffer */
    if (innobase_change_buffering) {
        ulint use;

        for (use = 0;
             use < UT_ARR_SIZE(innobase_change_buffering_values);
             use++) {
            if (!innobase_strcasecmp(
                    innobase_change_buffering,
                    innobase_change_buffering_values[use])) {
                ibuf_use = (ibuf_use_t) use;
                goto innobase_change_buffering_inited_ok;
            }
        }

        sql_print_error("InnoDB: invalid value"
                        " innodb_change_buffering=%s",
                        innobase_change_buffering);
        DBUG_RETURN(innobase_init_abort());
    }

innobase_change_buffering_inited_ok:
    // innodb_change_buffering = ALL
    ut_a((ulint) ibuf_use < UT_ARR_SIZE(innobase_change_buffering_values));
    innobase_change_buffering = (char*)
        innobase_change_buffering_values[ibuf_use];

    /* Check that interdependent parameters have sane values.
    The pairs checked here are srv_max_buf_pool_modified_pct vs.
    srv_max_dirty_pages_pct_lwm, and srv_max_io_capacity vs.
    srv_io_capacity (compared against SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT). */
    if (srv_max_buf_pool_modified_pct < srv_max_dirty_pages_pct_lwm) {
        sql_print_warning("InnoDB: innodb_max_dirty_pages_pct_lwm"
                          " cannot be set higher than"
                          " innodb_max_dirty_pages_pct.\n"
                          "InnoDB: Setting"
                          " innodb_max_dirty_pages_pct_lwm to %lf\n",
                          srv_max_buf_pool_modified_pct);

        srv_max_dirty_pages_pct_lwm = srv_max_buf_pool_modified_pct;
    }

    if (srv_max_io_capacity == SRV_MAX_IO_CAPACITY_DUMMY_DEFAULT) {
        if (srv_io_capacity >= SRV_MAX_IO_CAPACITY_LIMIT / 2) {
            /* Avoid overflow. */
            srv_max_io_capacity = SRV_MAX_IO_CAPACITY_LIMIT;
        } else {
            /* The user has not set the value. We should
            set it based on innodb_io_capacity. */
            srv_max_io_capacity =
                ut_max(2 * srv_io_capacity, 2000UL);
        }
    } else if (srv_max_io_capacity < srv_io_capacity) {
        sql_print_warning("InnoDB: innodb_io_capacity"
                          " cannot be set higher than"
                          " innodb_io_capacity_max.\n"
                          "InnoDB: Setting"
                          " innodb_io_capacity to %lu\n",
                          srv_max_io_capacity);

        srv_io_capacity = srv_max_io_capacity;
    }

    // Check the innodb_buffer_pool_filename setting
    if (!is_filename_allowed(srv_buf_dump_filename,
                             strlen(srv_buf_dump_filename), FALSE)) {
        sql_print_error("InnoDB: innodb_buffer_pool_filename"
                        " cannot have colon (:) in the file name.");
        DBUG_RETURN(innobase_init_abort());
    }

    /* --------------------------------------------------
    The settings handled below:
    innodb_file_flush_method, innobase_log_file_size,
    innodb_log_write_ahead_size, innodb_log_buffer_size,
    innodb_buffer_pool_size, innodb_read_io_threads,
    innodb_write_io_threads, innodb_doublewrite, innodb_log_checksums,
    innodb_rollback_on_timeout, innobase_locks_unsafe_for_binlog,
    innodb_open_files, the innodb_monitor settings,
    innodb_old_blocks_pct, innodb_undo_logs */

    srv_file_flush_method_str = innobase_file_flush_method;

    srv_log_file_size = (ib_uint64_t) innobase_log_file_size;

    if (UNIV_PAGE_SIZE_DEF != srv_page_size) {
        ib::warn() << "innodb-page-size has been changed from the"
            " default value " << UNIV_PAGE_SIZE_DEF << " to "
            << srv_page_size << ".";
    }

    if (srv_log_write_ahead_size > srv_page_size) {
        srv_log_write_ahead_size = srv_page_size;
    } else {
        ulong srv_log_write_ahead_size_tmp = OS_FILE_LOG_BLOCK_SIZE;

        while (srv_log_write_ahead_size_tmp
               < srv_log_write_ahead_size) {
            srv_log_write_ahead_size_tmp =
                srv_log_write_ahead_size_tmp * 2;
        }
        if (srv_log_write_ahead_size_tmp
            != srv_log_write_ahead_size) {
            srv_log_write_ahead_size =
                srv_log_write_ahead_size_tmp / 2;
        }
    }

    srv_log_buffer_size = (ulint) innobase_log_buffer_size;

    srv_buf_pool_size = (ulint) innobase_buffer_pool_size;

    srv_n_read_io_threads = (ulint) innobase_read_io_threads;
    srv_n_write_io_threads = (ulint) innobase_write_io_threads;

    srv_use_doublewrite_buf = (ibool) innobase_use_doublewrite;

    if (!innobase_use_checksums) {
        ib::warn() << "Setting innodb_checksums to OFF is DEPRECATED."
            " This option may be removed in future releases. You"
            " should set innodb_checksum_algorithm=NONE instead.";
        srv_checksum_algorithm = SRV_CHECKSUM_ALGORITHM_NONE;
    }

    innodb_log_checksums_func_update(innodb_log_checksums);

#ifdef HAVE_LINUX_LARGE_PAGES
    if ((os_use_large_pages = my_use_large_pages)) {
        os_large_page_size = opt_large_page_size;
    }
#endif

    row_rollback_on_timeout = (ibool) innobase_rollback_on_timeout;

    srv_locks_unsafe_for_binlog = (ibool) innobase_locks_unsafe_for_binlog;
    if (innobase_locks_unsafe_for_binlog) {
        ib::warn() << "Using innodb_locks_unsafe_for_binlog is"
            " DEPRECATED. This option may be removed in future"
            " releases. Please use READ COMMITTED transaction"
            " isolation level instead; " << SET_TRANSACTION_MSG;
    }

    if (innobase_open_files < 10) {
        innobase_open_files = 300;
        if (srv_file_per_table && table_cache_size > 300) {
            innobase_open_files = table_cache_size;
        }
    }

    if (innobase_open_files > (long) open_files_limit) {
        ib::warn() << "innodb_open_files should not be greater"
            " than the open_files_limit.\n";
        if (innobase_open_files > (long) table_cache_size) {
            innobase_open_files = table_cache_size;
        }
    }

    srv_max_n_open_files = (ulint) innobase_open_files;
    srv_innodb_status = (ibool) innobase_create_status_file;

    srv_print_verbose_log = mysqld_embedded ? 0 : 1;

    /* Round up fts_sort_pll_degree to nearest power of 2 number */
    for (num_pll_degree = 1;
         num_pll_degree < fts_sort_pll_degree;
         num_pll_degree <<= 1) {
        /* No op */
    }

    fts_sort_pll_degree = num_pll_degree;

    /* Store the default charset-collation number of this MySQL
    installation */
    data_mysql_default_charset_coll = (ulint) default_charset_info->number;

    // Initialize the default of innodb_commit_concurrency (limits concurrent commits)
    innobase_commit_concurrency_init_default();

    // Initialize the os_event objects
    os_event_global_init();

    /* Set buffer pool size to default for fast startup when mysqld is
    run with --help --verbose options. */
    ulint srv_buf_pool_size_org = 0;
    if (opt_help && opt_verbose
        && srv_buf_pool_size > srv_buf_pool_def_size) {
        ib::warn() << "Setting innodb_buf_pool_size to "
            << srv_buf_pool_def_size << " for fast startup, "
            << "when running with --help --verbose options.";
        srv_buf_pool_size_org = srv_buf_pool_size;
        srv_buf_pool_size = srv_buf_pool_def_size;
    }

    /* Since we in this module access directly the fields of a trx
    struct, and due to different headers and flags it might happen that
    ib_mutex_t has a different size in this module and in InnoDB
    modules, we check at run time that the size is the same in
    these compilation modules.
    */

    // Start InnoDB, or create it if it does not exist yet
    err = innobase_start_or_create_for_mysql();

    // Restore innobase_buffer_pool_size
    if (srv_buf_pool_size_org != 0) {
        /* Set the original value back to show in help. */
        srv_buf_pool_size_org =
            buf_pool_size_align(srv_buf_pool_size_org);
        innobase_buffer_pool_size =
            static_cast<long long>(srv_buf_pool_size_org);
    } else {
        innobase_buffer_pool_size =
            static_cast<long long>(srv_buf_pool_size);
    }

    if (err != DB_SUCCESS) {
        DBUG_RETURN(innobase_init_abort());
    }

    /* Create mutex to protect encryption master_key_id. */
    mutex_create(LATCH_ID_MASTER_KEY_ID_MUTEX, &master_key_id_mutex);

    /* Adjust the innodb_undo_logs config object */
    innobase_undo_logs_init_default_max();

    innobase_old_blocks_pct = static_cast<uint>(
        buf_LRU_old_ratio_update(innobase_old_blocks_pct, TRUE));

    ibuf_max_size_update(srv_change_buffer_max_size);

    innobase_open_tables = hash_create(200);
    mysql_mutex_init(innobase_share_mutex_key.m_value,
                     &innobase_share_mutex,
                     MY_MUTEX_INIT_FAST);
    mysql_mutex_init(commit_cond_mutex_key.m_value,
                     &commit_cond_m, MY_MUTEX_INIT_FAST);
    mysql_cond_init(commit_cond_key.m_value, &commit_cond);
    innodb_inited= 1;
#ifdef MYSQL_DYNAMIC_PLUGIN
    if (innobase_hton != p) {
        innobase_hton = reinterpret_cast<handlerton*>(p);
        *innobase_hton = *innodb_hton_ptr;
    }
#endif /* MYSQL_DYNAMIC_PLUGIN */

    /* Get the current high water mark format. */
    innobase_file_format_max = (char*) trx_sys_file_format_max_get();

    /* Currently, monitor counter information are not persistent.
    (InnoDB monitor) */
    memset(monitor_set_tbl, 0, sizeof monitor_set_tbl);

    memset(innodb_counter_value, 0, sizeof innodb_counter_value);

    /* Do this as late as possible so server is fully starts up,
    since we might get some initial stats if user choose to turn
    on some counters from start up */
    if (innobase_enable_monitor_counter) {
        innodb_enable_monitor_at_startup(
            innobase_enable_monitor_counter);
    }

    /* Turn on monitor counters that are default on */
    srv_mon_default_on();

    /* Unit Tests */
#ifdef UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR
    unit_test_os_file_get_parent_dir();
#endif /* UNIV_ENABLE_UNIT_TEST_GET_PARENT_DIR */

#ifdef UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH
    test_make_filepath();
#endif /*UNIV_ENABLE_UNIT_TEST_MAKE_FILEPATH */

#ifdef UNIV_ENABLE_DICT_STATS_TEST
    test_dict_stats_all();
#endif /*UNIV_ENABLE_DICT_STATS_TEST */

#ifdef UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT
# ifdef HAVE_UT_CHRONO_T
    test_row_raw_format_int();
# endif /* HAVE_UT_CHRONO_T */
#endif /* UNIV_ENABLE_UNIT_TEST_ROW_RAW_FORMAT_INT */

#ifndef UNIV_HOTBACKUP
#ifdef _WIN32
    if (ut_win_init_time()) {
        DBUG_RETURN(innobase_init_abort());
    }
#endif /* _WIN32 */
#endif /* !UNIV_HOTBACKUP */

    DBUG_RETURN(0);
}
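One detail in the listing above that is easy to miss is how innodb_log_write_ahead_size is normalized: a value larger than the page size is clamped to the page size, and any other value that is not a power-of-two multiple of the log block size is rounded down to one. The sketch below isolates that logic as a standalone function. normalize_write_ahead is a hypothetical name, and kLogBlockSize = 512 is an assumption standing in for OS_FILE_LOG_BLOCK_SIZE.

```cpp
// Assumed stand-in for OS_FILE_LOG_BLOCK_SIZE.
constexpr unsigned long kLogBlockSize = 512;

// Hypothetical helper mirroring the srv_log_write_ahead_size logic
// shown in the innobase_init() listing.
unsigned long normalize_write_ahead(unsigned long requested,
                                    unsigned long page_size) {
    if (requested > page_size) {
        // Never write ahead by more than one page.
        return page_size;
    }
    // Find the smallest block_size * 2^k that is >= requested.
    unsigned long candidate = kLogBlockSize;
    while (candidate < requested) {
        candidate *= 2;
    }
    // If requested was not exactly such a value, round down one step.
    if (candidate != requested) {
        return candidate / 2;
    }
    return requested;
}
```

With a 16 KiB page, a request of 3000 becomes 2048, a request of 8192 is kept unchanged, and a request of 32768 is clamped to 16384.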
The innobase_start_or_create_for_mysql() function is annotated below:
dberr_t
innobase_start_or_create_for_mysql(void)
/*====================================*/
{
    bool        create_new_db = false;
    lsn_t       flushed_lsn;
    ulint       sum_of_data_file_sizes;
    ulint       tablespace_size_in_header;
    dberr_t     err;
    ulint       srv_n_log_files_found = srv_n_log_files;
    mtr_t       mtr;
    purge_pq_t* purge_queue;
    char        logfilename[10000];
    char*       logfile0 = NULL;
    size_t      dirnamelen;
    unsigned    i = 0;

    /* Reset the start state. */
    srv_start_state = SRV_START_STATE_NONE;

    // SRV_FORCE_NO_LOG_REDO: skip the redo log roll-forward
    if (srv_force_recovery == SRV_FORCE_NO_LOG_REDO) {
        srv_read_only_mode = true;
    }

    // high_level_read_only
    high_level_read_only = srv_read_only_mode
        || srv_force_recovery > SRV_FORCE_NO_TRX_UNDO;

    // In read-only mode nothing is written except to intrinsic tables,
    // so the doublewrite mechanism is switched off.
    if (srv_read_only_mode) {
        ib::info() << "Started in read only mode";

        /* There is no write except to intrinsic table and so turn-off
        doublewrite mechanism completely. */
        srv_use_doublewrite_buf = FALSE;
    }

#ifdef _WIN32
    srv_use_native_aio = TRUE;
#elif defined(LINUX_NATIVE_AIO)
    if (srv_use_native_aio) {
        ib::info() << "Using Linux native AIO";
    }
#else
    /* Currently native AIO is supported only on windows and linux
    and that also when the support is compiled in. In all other
    cases, we ignore the setting of innodb_use_native_aio. */
    srv_use_native_aio = FALSE;
#endif /* _WIN32 */

    /* Register performance schema stages before any real work has been
    started which may need to be instrumented.
    */
    mysql_stage_register("innodb", srv_stages, UT_ARR_SIZE(srv_stages));

    /** Process the innodb_flush_method parameter.
    In production it is typically set to either O_DIRECT or
    O_DIRECT_NO_FSYNC. */
    if (srv_file_flush_method_str == NULL) {
        /* These are the default options */
#ifndef _WIN32
        srv_unix_file_flush_method = SRV_UNIX_FSYNC;
    } else if (0 == ut_strcmp(srv_file_flush_method_str, "fsync")) {
        srv_unix_file_flush_method = SRV_UNIX_FSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DSYNC")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "O_DIRECT")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DIRECT;

    } else if (0 == ut_strcmp(srv_file_flush_method_str,
                              "O_DIRECT_NO_FSYNC")) {
        srv_unix_file_flush_method = SRV_UNIX_O_DIRECT_NO_FSYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "littlesync")) {
        srv_unix_file_flush_method = SRV_UNIX_LITTLESYNC;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "nosync")) {
        srv_unix_file_flush_method = SRV_UNIX_NOSYNC;
#else
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "normal")) {
        srv_win_file_flush_method = SRV_WIN_IO_NORMAL;
        srv_use_native_aio = FALSE;

    } else if (0 == ut_strcmp(srv_file_flush_method_str, "unbuffered")) {
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
        srv_use_native_aio = FALSE;

    } else if (0 == ut_strcmp(srv_file_flush_method_str,
                              "async_unbuffered")) {
        srv_win_file_flush_method = SRV_WIN_IO_UNBUFFERED;
#endif /* _WIN32 */
    } else {
        ib::error() << "Unrecognized value "
            << srv_file_flush_method_str
            << " for innodb_flush_method";
        return(srv_init_abort(DB_ERROR));
    }

    /* Note that the call srv_boot() also changes the values of
    some variables to the units used by InnoDB internally */

    /* Set the maximum number of threads which can wait for a semaphore
    inside InnoDB: this is the 'sync wait array' size, as well as the
    maximum number of threads that can wait in the 'srv_conc array' for
    their time to enter InnoDB.
    */
    srv_max_n_threads = 1    /* io_ibuf_thread */
        + 1    /* io_log_thread */
        + 1    /* lock_wait_timeout_thread */
        + 1    /* srv_error_monitor_thread */
        + 1    /* srv_monitor_thread */
        + 1    /* srv_master_thread */
        + 1    /* srv_purge_coordinator_thread */
        + 1    /* buf_dump_thread */
        + 1    /* dict_stats_thread */
        + 1    /* fts_optimize_thread */
        + 1    /* recv_writer_thread */
        + 1    /* trx_rollback_or_clean_all_recovered */
        + 128  /* added as margin, for use of
               InnoDB Memcached etc. */
        + max_connections
        + srv_n_read_io_threads
        + srv_n_write_io_threads
        + srv_n_purge_threads
        + srv_n_page_cleaners
        /* FTS Parallel Sort */
        + fts_sort_pll_degree * FTS_NUM_AUX_INDEX
          * max_connections;

    /** Adjust innodb_buffer_pool_instances */
    if (srv_buf_pool_size >= BUF_POOL_SIZE_THRESHOLD) {

        if (srv_buf_pool_instances == srv_buf_pool_instances_default) {
#if defined(_WIN32) && !defined(_WIN64)
            /* Do not allocate too large of a buffer pool on
            Windows 32-bit systems, which can have trouble
            allocating larger single contiguous memory blocks. */
            srv_buf_pool_instances = ut_min(
                static_cast<ulong>(MAX_BUFFER_POOLS),
                static_cast<ulong>(srv_buf_pool_size
                                   / (128 * 1024 * 1024)));
#else /* defined(_WIN32) && !defined(_WIN64) */
            /* Default to 8 instances when size > 1GB. */
            srv_buf_pool_instances = 8;
#endif /* defined(_WIN32) && !defined(_WIN64) */
        }
    } else {
        /* If buffer pool is less than 1 GiB, assume fewer
        threads. Also use only one buffer pool instance. */
        if (srv_buf_pool_instances != srv_buf_pool_instances_default
            && srv_buf_pool_instances != 1) {
            /* We can't distinguish whether the user has explicitly
            started mysqld with --innodb-buffer-pool-instances=0,
            (srv_buf_pool_instances_default is 0) or has not
            specified that option at all. Thus we have the
            limitation that if the user started with =0, we
            will not emit a warning here, but we should actually
            do so.
            */
            ib::info()
                << "Adjusting innodb_buffer_pool_instances"
                " from " << srv_buf_pool_instances << " to 1"
                " since innodb_buffer_pool_size is less than "
                << BUF_POOL_SIZE_THRESHOLD / (1024 * 1024)
                << " MiB";
        }

        srv_buf_pool_instances = 1;
    }

    // Adjust the size of srv_buf_pool_chunk_unit
    if (srv_buf_pool_chunk_unit * srv_buf_pool_instances
        > srv_buf_pool_size) {
        /* Size unit of buffer pool is larger than srv_buf_pool_size.
        adjust srv_buf_pool_chunk_unit for srv_buf_pool_size. */
        srv_buf_pool_chunk_unit
            = static_cast<ulong>(srv_buf_pool_size)
              / srv_buf_pool_instances;
        if (srv_buf_pool_size % srv_buf_pool_instances != 0) {
            ++srv_buf_pool_chunk_unit;
        }
    }

    // Align srv_buf_pool_size based on srv_buf_pool_chunk_unit
    srv_buf_pool_size = buf_pool_size_align(srv_buf_pool_size);

    // Cap innodb_page_cleaners according to srv_buf_pool_instances
    if (srv_n_page_cleaners > srv_buf_pool_instances) {
        /* limit of page_cleaner parallelizability
        is number of buffer pool instances. */
        srv_n_page_cleaners = srv_buf_pool_instances;
    }

    /** Boot the InnoDB server and initialize the related
    parameters and components. */
    srv_boot();

    ib::info() << (ut_crc32_sse2_enabled ? "Using" : "Not using")
        << " CPU crc32 instructions";

    // InnoDB monitor setup
    if (!srv_read_only_mode) {

        mutex_create(LATCH_ID_SRV_MONITOR_FILE,
                     &srv_monitor_file_mutex);

        if (srv_innodb_status) {

            srv_monitor_file_name = static_cast<char*>(
                ut_malloc_nokey(
                    strlen(fil_path_to_mysql_datadir)
                    + 20 + sizeof "/innodb_status."));

            sprintf(srv_monitor_file_name,
                    "%s/innodb_status."
                    ULINTPF,
                    fil_path_to_mysql_datadir,
                    os_proc_get_number());

            srv_monitor_file = fopen(srv_monitor_file_name, "w+");

            if (!srv_monitor_file) {
                ib::error() << "Unable to create "
                    << srv_monitor_file_name << ": "
                    << strerror(errno);
                return(srv_init_abort(DB_ERROR));
            }
        } else {

            srv_monitor_file_name = NULL;
            srv_monitor_file = os_file_create_tmpfile(NULL);

            if (!srv_monitor_file) {
                return(srv_init_abort(DB_ERROR));
            }
        }

        mutex_create(LATCH_ID_SRV_DICT_TMPFILE,
                     &srv_dict_tmpfile_mutex);

        srv_dict_tmpfile = os_file_create_tmpfile(NULL);

        if (!srv_dict_tmpfile) {
            return(srv_init_abort(DB_ERROR));
        }

        mutex_create(LATCH_ID_SRV_MISC_TMPFILE,
                     &srv_misc_tmpfile_mutex);

        srv_misc_tmpfile = os_file_create_tmpfile(NULL);

        if (!srv_misc_tmpfile) {
            return(srv_init_abort(DB_ERROR));
        }
    }

    /** File I/O threads:
    innodb_read_io_threads plus innodb_write_io_threads */
    srv_n_file_io_threads = srv_n_read_io_threads;

    srv_n_file_io_threads += srv_n_write_io_threads;

    // Unless read-only, add the log and ibuf I/O threads
    if (!srv_read_only_mode) {
        /* Add the log and ibuf IO threads. */
        srv_n_file_io_threads += 2;
    } else {
        ib::info() << "Disabling background log and ibuf IO write"
            << " threads.";
    }

    ut_a(srv_n_file_io_threads <= SRV_MAX_N_IO_THREADS);

    // Initialize the asynchronous I/O subsystem
    if (!os_aio_init(srv_n_read_io_threads,
                     srv_n_write_io_threads,
                     SRV_MAX_N_PENDING_SYNC_IOS)) {

        ib::error() << "Cannot initialize AIO sub-system";

        return(srv_init_abort(DB_ERROR));
    }

    // Initialize the in-memory cache of the tablespaces
    fil_init(srv_file_per_table ?
             50000 : 5000, srv_max_n_open_files);

    double  size;
    char    unit;

    // innodb_buffer_pool_size and the chunk size, for the log message
    if (srv_buf_pool_size >= 1024 * 1024 * 1024) {
        size = ((double) srv_buf_pool_size) / (1024 * 1024 * 1024);
        unit = 'G';
    } else {
        size = ((double) srv_buf_pool_size) / (1024 * 1024);
        unit = 'M';
    }

    double  chunk_size;
    char    chunk_unit;

    if (srv_buf_pool_chunk_unit >= 1024 * 1024 * 1024) {
        chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024 / 1024;
        chunk_unit = 'G';
    } else {
        chunk_size = srv_buf_pool_chunk_unit / 1024.0 / 1024;
        chunk_unit = 'M';
    }

    ib::info() << "Initializing buffer pool, total size = "
        << size << unit << ", instances = " << srv_buf_pool_instances
        << ", chunk size = " << chunk_size << chunk_unit;

    // Create the buffer pool; fails when there is not enough memory
    err = buf_pool_init(srv_buf_pool_size, srv_buf_pool_instances);

    if (err != DB_SUCCESS) {
        ib::error() << "Cannot allocate memory for the buffer pool";

        return(srv_init_abort(DB_ERROR));
    }

    ib::info() << "Completed initialization of buffer pool";

    // Initialize the fsp system and the redo log
    fsp_init();
    log_init();

    // Create the recovery system and initialize it for one recovery operation
    recv_sys_create();
    recv_sys_init(buf_pool_get_curr_size());

    // Create the lock system at database startup
    lock_sys_create(srv_lock_table_size);

    // Record that the lock system has been started
    srv_start_state_set(SRV_START_STATE_LOCK_SYS);

    /* Create i/o-handler threads: */
    for (ulint t = 0; t < srv_n_file_io_threads; ++t) {

        n[t] = t;

        os_thread_create(io_handler_thread, n + t, thread_ids + t);
    }

    /* Even in read-only mode there could be flush job generated by
    intrinsic table operations. Initialize page_cleaner. */
    buf_flush_page_cleaner_init();

    // Create the buf_flush_page_cleaner_coordinator thread
    os_thread_create(buf_flush_page_cleaner_coordinator,
                     NULL, NULL);

    // Create the buf_flush_page_cleaner_worker threads
    for (i = 1; i < srv_n_page_cleaners; ++i) {
        os_thread_create(buf_flush_page_cleaner_worker,
                         NULL, NULL);
    }

    /* Make sure page cleaner is active.
    The loop below waits until the page cleaner reports that it is active. */
    while (!buf_page_cleaner_is_active) {
        os_thread_sleep(10000);
    }

    // start state: io-thread
    srv_start_state_set(SRV_START_STATE_IO);

    // Normalize the data home path
    os_normalize_path(srv_data_home);

    /* Check if the data files exist or not (ibdata1, ibdata2, and so on)
    to decide whether a new database has to be created. */
    err = srv_sys_space.check_file_spec(
        &create_new_db, MIN_EXPECTED_TABLESPACE_SIZE);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(DB_ERROR));
    }

    // When no new database is created, unfinished transactions must be rolled back
    srv_startup_is_before_trx_rollback_phase = !create_new_db;

    /* Check if undo tablespaces and redo log files exist before creating
    a new system tablespace */
    if (create_new_db) {
        err = srv_check_undo_redo_logs_exists();
        if (err != DB_SUCCESS) {
            return(srv_init_abort(DB_ERROR));
        }
        recv_sys_debug_free();
    }

    /* Open or create the data files. */
    ulint sum_of_new_sizes;

    // Open or create the data files (the ibdata files) and read flushed_lsn from ibdata1
    err = srv_sys_space.open_or_create(
        false, create_new_db, &sum_of_new_sizes, &flushed_lsn);

    switch (err) {
    case DB_SUCCESS:
        break;
    case DB_CANNOT_OPEN_FILE:
        ib::error()
            << "Could not open or create the system tablespace. If"
            " you tried to add new data files to the system"
            " tablespace, and it failed here, you should now"
            " edit innodb_data_file_path in my.cnf back to what"
            " it was, and remove the new ibdata files InnoDB"
            " created in this failed attempt. InnoDB only wrote"
            " those files full of zeros, but did not yet use"
            " them in any way. But be careful: do not remove"
            " old data files which contain your precious data!";
        /* fall through */
    default:
        /* Other errors might come from
        Datafile::validate_first_page() */
        return(srv_init_abort(err));
    }

    dirnamelen = strlen(srv_log_group_home_dir);
    ut_a(dirnamelen < (sizeof logfilename) - 10 - sizeof "ib_logfile");
    memcpy(logfilename, srv_log_group_home_dir, dirnamelen);

    /* Add a path separator if needed.
    */
    if (dirnamelen && logfilename[dirnamelen - 1] != OS_PATH_SEPARATOR) {
        logfilename[dirnamelen++] = OS_PATH_SEPARATOR;
    }

    srv_log_file_size_requested = srv_log_file_size;

    if (create_new_db) {
        /** Creating a new database */

        // Synchronously flush the dirty blocks from the end of the
        // flush list of every buffer pool instance.
        buf_flush_sync_all_buf_pools();

        // Get the current LSN
        flushed_lsn = log_get_lsn();

        // Create the redo log files
        err = create_log_files(
            logfilename, dirnamelen, flushed_lsn, logfile0);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }
    } else {
        // Not creating a new database
        for (i = 0; i < SRV_N_LOG_FILES_MAX; i++) {
            os_offset_t size;
            os_file_stat_t stat_info;

            sprintf(logfilename + dirnamelen,
                    "ib_logfile%u", i);

            // Get the status of the log file
            err = os_file_get_status(
                logfilename, &stat_info, false,
                srv_read_only_mode);

            if (err == DB_NOT_FOUND) {
                if (i == 0) {
                    if (flushed_lsn
                        < static_cast<lsn_t>(1000)) {
                        ib::error()
                            << "Cannot create"
                            " log files because"
                            " data files are"
                            " corrupt or the"
                            " database was not"
                            " shut down cleanly"
                            " after creating"
                            " the data files.";
                        return(srv_init_abort(
                            DB_ERROR));
                    }

                    err = create_log_files(
                        logfilename, dirnamelen,
                        flushed_lsn, logfile0);

                    if (err != DB_SUCCESS) {
                        return(srv_init_abort(err));
                    }

                    create_log_files_rename(
                        logfilename, dirnamelen,
                        flushed_lsn, logfile0);

                    /* Suppress the message about crash recovery.
                    */
                    flushed_lsn = log_get_lsn();
                    goto files_checked;
                } else if (i < 2) {
                    /* must have at least 2 log files */
                    ib::error() << "Only one log file"
                        " found.";
                    return(srv_init_abort(err));
                }

                /* opened all files */
                break;
            }

            // Check the log file mode
            if (!srv_file_check_mode(logfilename)) {
                return(srv_init_abort(DB_ERROR));
            }

            // Open the redo log file
            err = open_log_file(&files[i], logfilename, &size);

            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            ut_a(size != (os_offset_t) -1);

            if (size & ((1 << UNIV_PAGE_SIZE_SHIFT) - 1)) {
                ib::error() << "Log file " << logfilename
                    << " size " << size << " is not a"
                    " multiple of innodb_page_size";
                return(srv_init_abort(DB_ERROR));
            }

            size >>= UNIV_PAGE_SIZE_SHIFT;

            if (i == 0) {
                srv_log_file_size = size;
            } else if (size != srv_log_file_size) {
                ib::error() << "Log file " << logfilename
                    << " is of different size "
                    << (size << UNIV_PAGE_SIZE_SHIFT)
                    << " bytes than other log files "
                    << (srv_log_file_size
                        << UNIV_PAGE_SIZE_SHIFT)
                    << " bytes!";
                return(srv_init_abort(DB_ERROR));
            }
        }

        // Number of log files found
        srv_n_log_files_found = i;

        /* Create the in-memory file space objects for the log files. */
        sprintf(logfilename + dirnamelen, "ib_logfile%u", 0);

        /* Disable the doublewrite buffer for log files.
        */
        fil_space_t* log_space = fil_space_create(
            "innodb_redo_log",
            SRV_LOG_SPACE_FIRST_ID,
            fsp_flags_set_page_size(0, univ_page_size),
            FIL_TYPE_LOG);

        ut_a(fil_validate());
        ut_a(log_space);

        /* srv_log_file_size is measured in pages; if page size is 16KB,
        then we have a limit of 64TB on 32 bit systems */
        ut_a(srv_log_file_size <= ULINT_MAX);

        // Add the log files to the log file space
        for (unsigned j = 0; j < i; j++) {
            sprintf(logfilename + dirnamelen, "ib_logfile%u", j);

            if (!fil_node_create(logfilename,
                                 (ulint) srv_log_file_size,
                                 log_space, false, false)) {
                return(srv_init_abort(DB_ERROR));
            }
        }

        // Initialize the redo log group
        if (!log_group_init(0, i, srv_log_file_size * UNIV_PAGE_SIZE,
                            SRV_LOG_SPACE_FIRST_ID)) {
            return(srv_init_abort(DB_ERROR));
        }
    }

files_checked:
    /* Open all log files and data files in the system
    tablespace: we keep them open until database
    shutdown */
    fil_open_log_and_system_tablespace_files();

    // Open the undo tablespaces; once all undo files have been found
    // and opened, add them to the file management system
    err = srv_undo_tablespaces_init(
        create_new_db,
        srv_undo_tablespaces,
        &srv_undo_tablespaces_open);

    /* If the force recovery is set very high then we carry on regardless
    of all errors. Basically this is fingers crossed mode.
    Data recovery starts from here. */
    if (err != DB_SUCCESS
        && srv_force_recovery < SRV_FORCE_NO_UNDO_LOG_SCAN) {
        return(srv_init_abort(err));
    }

    /* Initialize objects used by dict stats gathering thread, which
    can also be used by recovery if it tries to drop some table */
    if (!srv_read_only_mode) {
        dict_stats_thread_init();
    }

    // Initialize the file_format_max variable.
    trx_sys_file_format_init();

    // Create the trx_sys instance and initialize purge_queue and the mutex
    trx_sys_create();

    if (create_new_db) {
        ut_a(!srv_read_only_mode);

        mtr_start(&mtr);
        bool ret = fsp_header_init(0, sum_of_new_sizes, &mtr);
        mtr_commit(&mtr);

        if (!ret) {
            return(srv_init_abort(DB_ERROR));
        }

        /* To maintain backward compatibility we create only
        the first rollback segment before the double write buffer.
        All the remaining rollback segments will be created later,
        after the double write buffer has been created. */
        trx_sys_create_sys_pages();

        purge_queue = trx_sys_init_at_db_start();

        DBUG_EXECUTE_IF("check_no_undo",
                        ut_ad(purge_queue->empty());
        );

        /* The purge system needs to create the purge view and
        therefore requires that the trx_sys is inited. */
        trx_purge_sys_create(srv_n_purge_threads, purge_queue);

        err = dict_create();

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        buf_flush_sync_all_buf_pools();

        flushed_lsn = log_get_lsn();

        fil_write_flushed_lsn(flushed_lsn);

        create_log_files_rename(
            logfilename, dirnamelen, flushed_lsn, logfile0);
    } else {
        /* Check if we support the max format that is stamped
        on the system tablespace.
        Note: We are NOT allowed to make any modifications to
        the TRX_SYS_PAGE_NO page before recovery because this
        page also contains the max_trx_id etc. important system
        variables that are required for recovery. We need to
        ensure that we return the system to a state where normal
        recovery is guaranteed to work. We do this by
        invalidating the buffer cache, this will force the
        reread of the page and restoration to its last known
        consistent state, this is REQUIRED for the recovery
        process to work. */
        err = trx_sys_file_format_max_check(
            srv_max_file_format_at_startup);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        /* Invalidate the buffer pool to ensure that we reread
        the page that we read above, during recovery.
        Note that this is not as heavy weight as it seems. At
        this point there will be only ONE page in the buf_LRU
        and there must be no page in the buf_flush list. */
        buf_pool_invalidate();

        /* Scan and locate truncate log files. Parsed located files
        and add table to truncate information to central vector for
        truncate fix-up action post recovery.
        */
        err = TruncateLogParser::scan_and_parse(srv_log_group_home_dir);

        if (err != DB_SUCCESS) {
            return(srv_init_abort(DB_ERROR));
        }

        /* We always try to do a recovery, even if the database had
        been shut down normally: this is the normal startup path */

        /** Start recovery from the checkpoint at flushed_lsn.
        1. Initialize the red-black tree so that pages can be inserted
           into the flush list quickly during recovery.
        2. Find the latest checkpoint in the log groups.
        3. Read the redo log page that holds the latest checkpoint
           into log_sys->checkpoint_buf.
        4. Get checkpoint_lsn and checkpoint_no.
        5. Read the redo log from checkpoint_lsn into the hash table.
        6. Check the tablespaces needed for crash recovery, then process
           and remove the pages in the doublewrite buffer. This verifies
           the integrity of the real data page behind each doublewrite
           page; a damaged page is restored from its doublewrite copy.
           The background thread recv_writer_thread is also spawned to
           clean dirty pages out of the buffer pool.
        7. Copy the log segment from the latest log group to the other
           groups; there is currently only one log group. */
        err = recv_recovery_from_checkpoint_start(flushed_lsn);

        // Clear the pages held in the doublewrite buffer
        recv_sys->dblwr.pages.clear();

        // Initialize the data dictionary system and the change buffer
        if (err == DB_SUCCESS) {
            /* Initialize the change buffer. */
            err = dict_boot();
        }

        if (err != DB_SUCCESS) {
            /* A tablespace was not found during recovery. The
            user must force recovery. */
            if (err == DB_TABLESPACE_NOT_FOUND) {
                srv_fatal_error();
                ut_error;
            }

            return(srv_init_abort(DB_ERROR));
        }

        // Create and initialize the transaction system.
        purge_queue = trx_sys_init_at_db_start();

        if (srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
            /* Apply the hashed log records to the
            respective file pages, for the last batch of
            recv_group_scan_log_recs(). */

            // Apply the redo log to complete crash recovery.
            recv_apply_hashed_log_recs(TRUE);
            DBUG_PRINT("ib_log", ("apply completed"));

            if (recv_needed_recovery) {
                // Prints a line such as:
                // Last MySQL binlog file position 0 894036112, file name mysql-bin.002128
                trx_sys_print_mysql_binlog_offset();
            }
        }

        if (recv_sys->found_corrupt_log) {
            ib::warn()
                << "The log file may have been corrupt and it"
                " is possible that the log scan or parsing"
                " did not proceed far enough in recovery."
                " Please run CHECK TABLE on your InnoDB tables"
                " to check that they are ok!"
                " It may be safest to recover your"
                " InnoDB database from a backup!";
        }

        /* The purge system needs to create the purge view and
        therefore requires that the trx_sys is inited. */
        // Create trx_purge_sys
        trx_purge_sys_create(srv_n_purge_threads, purge_queue);

        /* recv_recovery_from_checkpoint_finish needs trx lists which
        are initialized in trx_sys_init_at_db_start(). */

        /* Finish the recovery operation.
        1. Make sure the recv_writer thread has finished its work.
        2. Wait until the flush of dirty pages has completed.
        3. Wait for the recv_writer thread to terminate.
        4. Free the flush red-black tree.
        5. Roll back all data dictionary table transactions so that
           the data dictionary tables are not locked. The data
           dictionary latch should guarantee that only one data
           dictionary transaction is active at a time. */
        recv_recovery_from_checkpoint_finish();

        /* Fix-up truncate of tables in the system tablespace
        if server crashed while truncate was active. The non-system
        tables are done after tablespace discovery. Do
        this now because this procedure assumes that no pages
        have changed since redo recovery. Tablespace discovery
        can do updates to pages in the system tablespace. */
        err = truncate_t::fixup_tables_in_system_tablespace();

        if (srv_force_recovery < SRV_FORCE_NO_IBUF_MERGE) {
            /* Open or Create SYS_TABLESPACES and SYS_DATAFILES
            so that tablespace names and other metadata can be
            found. */
            srv_sys_tablespaces_open = true;

            // Create or check the SYS_TABLESPACES and SYS_DATAFILES system tables
            err = dict_create_or_check_sys_tablespace();
            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            /* The following call is necessary for the insert
            buffer to work with multiple tablespaces. We must
            know the mapping between space id's and .ibd file
            names.

            In a crash recovery, we check that the info in data
            dictionary is consistent with what we already know
            about space id's from the calls to fil_ibd_load().

            In a normal startup, we create the space objects for
            every table in the InnoDB data dictionary that has
            an .ibd file.

            We also determine the maximum tablespace id used.

            The 'validate' flag indicates that when a tablespace
            is opened, we also read the header page and validate
            the contents to the data dictionary. This is time
            consuming, especially for databases with lots of ibd
            files.
            So only do it after a crash and not forcing
            recovery. Open rw transactions at this point is not a
            good reason to validate. */
            bool validate = recv_needed_recovery
                && srv_force_recovery == 0;

            dict_check_tablespaces_and_store_max_id(validate);
        }

        /* Rotate the encryption key for recovery. It's because
        server could crash in middle of key rotation. Some tablespace
        didn't complete key rotation. Here, we will resume the
        rotation. */
        if (!srv_read_only_mode
            && srv_force_recovery < SRV_FORCE_NO_LOG_REDO) {
            fil_encryption_rotate();
        }

        /* Fix-up truncate of table if server crashed while truncate
        was active. */
        err = truncate_t::fixup_tables_in_non_system_tablespace();

        if (err != DB_SUCCESS) {
            return(srv_init_abort(err));
        }

        if (!srv_force_recovery
            && !recv_sys->found_corrupt_log
            && (srv_log_file_size_requested != srv_log_file_size
                || srv_n_log_files_found != srv_n_log_files)) {

            /* Prepare to replace the redo log files. */

            if (srv_read_only_mode) {
                ib::error() << "Cannot resize log files"
                    " in read-only mode.";
                return(srv_init_abort(DB_READ_ONLY));
            }

            /* Prepare to delete the old redo log files */
            flushed_lsn = srv_prepare_to_delete_redo_log_files(i);

            /* Prohibit redo log writes from any other
            threads until creating a log checkpoint at the
            end of create_log_files(). */
            ut_d(recv_no_log_write = true);
            ut_ad(!buf_pool_check_no_pending_io());

            RECOVERY_CRASH(3);

            /* Stamp the LSN to the data files. */
            fil_write_flushed_lsn(flushed_lsn);

            RECOVERY_CRASH(4);

            /* Close and free the redo log files, so that
            we can replace them. */
            fil_close_log_files(true);

            RECOVERY_CRASH(5);

            /* Free the old log file space.
            */
            log_group_close_all();

            ib::warn() << "Starting to delete and rewrite log"
                " files.";

            srv_log_file_size = srv_log_file_size_requested;

            err = create_log_files(
                logfilename, dirnamelen, flushed_lsn,
                logfile0);

            if (err != DB_SUCCESS) {
                return(srv_init_abort(err));
            }

            create_log_files_rename(
                logfilename, dirnamelen, flushed_lsn,
                logfile0);
        }

        // Roll back uncommitted, incomplete transactions; this is
        // done in a background thread.
        recv_recovery_rollback_active();

        /* It is possible that file_format tag has never
        been set. In this case we initialize it to minimum
        value. Important to note that we can do it ONLY after
        we have finished the recovery process so that the
        image of TRX_SYS_PAGE_NO is not stale. */
        trx_sys_file_format_tag_init();
    }

    if (!create_new_db) {
        /* Check and reset any no-redo rseg slot on disk used by
        pre-5.7.2 redo rseg with no data to purge. */
        trx_rseg_reset_pending();
    }

    if (!create_new_db && sum_of_new_sizes > 0) {
        /* New data file(s) were added */
        mtr_start(&mtr);

        fsp_header_inc_size(0, sum_of_new_sizes, &mtr);

        mtr_commit(&mtr);

        /* Immediately write the log record about increased tablespace
        size to disk, so that it is durable even if mysqld would crash
        quickly */
        log_buffer_flush_to_disk();
    }

    /* Open temp-tablespace and keep it open until shutdown. */
    err = srv_open_tmp_tablespace(create_new_db, &srv_tmp_space);

    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    /* Create the doublewrite buffer to a new tablespace */
    if (buf_dblwr == NULL && !buf_dblwr_create()) {
        return(srv_init_abort(DB_ERROR));
    }

    /* Here the double write buffer has already been created and so
    any new rollback segments will be allocated after the double
    write buffer. The default segment should already exist.
    We create the new segments only if it's a new database or
    the database was shutdown cleanly. */

    /* Note: When creating the extra rollback segments during an upgrade
    we violate the latching order, even if the change buffer is empty.
    We make an exception in sync0sync.cc and check srv_is_being_started
    for that violation. It cannot create a deadlock because we are still
    running in single threaded mode essentially. Only the IO threads
    should be running at this stage. */

    /* Deprecate innodb_undo_logs. But still use it if it is set to
    non-default and innodb_rollback_segments is default. */
    ut_a(srv_rollback_segments > 0);
    ut_a(srv_rollback_segments <= TRX_SYS_N_RSEGS);
    ut_a(srv_undo_logs > 0);
    ut_a(srv_undo_logs <= TRX_SYS_N_RSEGS);
    if (srv_undo_logs < TRX_SYS_N_RSEGS) {
        ib::warn() << deprecated_undo_logs;
        if (srv_rollback_segments == TRX_SYS_N_RSEGS) {
            srv_rollback_segments = srv_undo_logs;
        }
    }

    /* The number of rsegs that exist in InnoDB is given by status
    variable srv_available_undo_logs. The number of rsegs to use can
    be set using the dynamic global variable srv_rollback_segments. */

    // Create the rollback segments
    srv_available_undo_logs = trx_sys_create_rsegs(
        srv_undo_tablespaces, srv_rollback_segments, srv_tmp_undo_logs);

    if (srv_available_undo_logs == ULINT_UNDEFINED) {
        /* Can only happen if server is read only. */
        ut_a(srv_read_only_mode);
        srv_rollback_segments = ULONG_UNDEFINED;
    } else if (srv_available_undo_logs < srv_rollback_segments
               && !srv_force_recovery && !recv_needed_recovery) {
        ib::error() << "System or UNDO tablespace is running of out"
            << " of space";
        /* Should due to out of file space.
        */
        return(srv_init_abort(DB_ERROR));
    }

    srv_startup_is_before_trx_rollback_phase = false;

    if (!srv_read_only_mode) {
        /* Create the thread which watches the timeouts
        for lock waits (lock_wait_timeout_thread) */
        os_thread_create(
            lock_wait_timeout_thread,
            NULL, thread_ids + 2 + SRV_MAX_N_IO_THREADS);

        /* Create the thread which warns of long semaphore waits
        (srv_error_monitor_thread) */
        os_thread_create(
            srv_error_monitor_thread,
            NULL, thread_ids + 3 + SRV_MAX_N_IO_THREADS);

        /* Create the thread which prints InnoDB monitor info
        (srv_monitor_thread) */
        os_thread_create(
            srv_monitor_thread,
            NULL, thread_ids + 4 + SRV_MAX_N_IO_THREADS);

        srv_start_state_set(SRV_START_STATE_MONITOR);
    }

    /* Create the SYS_FOREIGN and SYS_FOREIGN_COLS system tables */
    err = dict_create_or_check_foreign_constraint_tables();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    /* Create the SYS_TABLESPACES system table */
    err = dict_create_or_check_sys_tablespace();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }
    srv_sys_tablespaces_open = true;

    /* Create the SYS_VIRTUAL system table */
    err = dict_create_or_check_sys_virtual();
    if (err != DB_SUCCESS) {
        return(srv_init_abort(err));
    }

    srv_is_being_started = false;

    ut_a(trx_purge_state() == PURGE_STATE_INIT);

    /* Create the master thread which does purge and other utility
    operations */
    if (!srv_read_only_mode) {
        os_thread_create(
            srv_master_thread,
            NULL, thread_ids + (1 + SRV_MAX_N_IO_THREADS));

        srv_start_state_set(SRV_START_STATE_MASTER);
    }

    // Create the purge coordinator thread and the purge worker threads
    if (!srv_read_only_mode
        && srv_force_recovery < SRV_FORCE_NO_BACKGROUND) {
        os_thread_create(
            srv_purge_coordinator_thread,
            NULL, thread_ids + 5 + SRV_MAX_N_IO_THREADS);

        ut_a(UT_ARR_SIZE(thread_ids)
             > 5 + srv_n_purge_threads + SRV_MAX_N_IO_THREADS);

        /* We've already created the purge coordinator thread above.
        */
        for (i = 1; i < srv_n_purge_threads; ++i) {
            os_thread_create(
                srv_worker_thread, NULL,
                thread_ids + 5 + i + SRV_MAX_N_IO_THREADS);
        }

        // Wait for the purge threads to start
        srv_start_wait_for_purge_to_start();

        srv_start_state_set(SRV_START_STATE_PURGE);
    } else {
        purge_sys->state = PURGE_STATE_DISABLED;
    }

    /* wake main loop of page cleaner up */
    os_event_set(buf_flush_event);

    sum_of_data_file_sizes = srv_sys_space.get_sum_of_sizes();
    ut_a(sum_of_new_sizes != ULINT_UNDEFINED);

    tablespace_size_in_header = fsp_header_get_tablespace_size();

    if (!srv_read_only_mode
        && !srv_sys_space.can_auto_extend_last_file()
        && sum_of_data_file_sizes != tablespace_size_in_header) {

        ib::error() << "Tablespace size stored in header is "
            << tablespace_size_in_header << " pages, but the sum"
            " of data file sizes is " << sum_of_data_file_sizes
            << " pages";

        if (srv_force_recovery == 0
            && sum_of_data_file_sizes < tablespace_size_in_header) {
            /* This is a fatal error, the tail of a tablespace is
            missing */

            ib::error()
                << "Cannot start InnoDB."
                " The tail of the system tablespace is"
                " missing. Have you edited"
                " innodb_data_file_path in my.cnf in an"
                " inappropriate way, removing"
                " ibdata files from there?"
                " You can set innodb_force_recovery=1"
                " in my.cnf to force"
                " a startup if you are trying"
                " to recover a badly corrupt database.";

            return(srv_init_abort(DB_ERROR));
        }
    }

    if (!srv_read_only_mode
        && srv_sys_space.can_auto_extend_last_file()
        && sum_of_data_file_sizes < tablespace_size_in_header) {

        ib::error() << "Tablespace size stored in header is "
            << tablespace_size_in_header << " pages, but the sum"
            " of data file sizes is only "
            << sum_of_data_file_sizes << " pages";

        if (srv_force_recovery == 0) {

            ib::error()
                << "Cannot start InnoDB. The tail of"
                " the system tablespace is"
                " missing. Have you edited"
                " innodb_data_file_path in my.cnf in an"
                " InnoDB: inappropriate way, removing"
                " ibdata files from there?"
                " You can set innodb_force_recovery=1"
                " in my.cnf to force"
                " InnoDB: a startup if you are trying to"
                " recover a badly corrupt database.";

            return(srv_init_abort(DB_ERROR));
        }
    }

    if (srv_print_verbose_log) {
        ib::info() << INNODB_VERSION_STR
            << " started; log sequence number "
            << srv_start_lsn;
    }

    if (srv_force_recovery > 0) {
        ib::info() << "!!! innodb_force_recovery is set to "
            << srv_force_recovery << " !!!";
    }

    if (srv_force_recovery == 0) {
        /* In the insert buffer we may have even bigger tablespace
        id's, because we may have dropped those tablespaces, but
        insert buffer merge has not had time to clean the records from
        the ibuf tree. */
        ibuf_update_max_tablespace_id();
    }

    if (!srv_read_only_mode) {
        if (create_new_db) {
            srv_buffer_pool_load_at_startup = FALSE;
        }

        /* Create the buffer pool dump/load thread */
        os_thread_create(buf_dump_thread, NULL, NULL);

        /* Create the dict stats gathering thread */
        os_thread_create(dict_stats_thread, NULL, NULL);

        /* Create the thread that will optimize the FTS sub-system. */
        fts_optimize_init();

        srv_start_state_set(SRV_START_STATE_STAT);
    }

    /* Create the buffer pool resize thread */
    os_thread_create(buf_resize_thread, NULL, NULL);

    srv_was_started = TRUE;
    return(DB_SUCCESS);
}