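An Analysis of Transaction Isolation in RocksDB [Original]
Transaction isolation in RocksDB refers to the isolation between transactions when multiple threads run transactions concurrently. It is implemented through locking. This article focuses on RocksDB's locking mechanism under the Read Committed isolation level.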
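1. RocksDB transaction class families
The class diagram for RocksDB transactions is shown in the figure below. There are two main class families, Transaction and DB. PessimisticTransaction is used by default, and its locking is implemented by TransactionLockMgr.
TransactionLockMgr maintains a LockMap internally. For each record, TransactionLockMgr computes a hash of the key, takes it modulo num_stripes, and uses the result to locate a LockMapStripe in the LockMap's std::vector<LockMapStripe>. Spreading keys across stripes reduces contention on any single physical lock, which amounts to lock striping.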
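To make the stripe lookup concrete, here is a minimal, self-contained sketch of it. It is not RocksDB's code: `Stripe`, `StripedMap`, and `StripeIndex` are hypothetical names, and `std::hash` is only a stand-in, since the text above says a hash of the key is used but does not say which one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for LockMapStripe. Here it only carries its own
// index; the real struct holds a mutex, a condition variable, and the
// per-key lock table.
struct Stripe {
  std::size_t index;
};

// Hypothetical stand-in for LockMap.
struct StripedMap {
  explicit StripedMap(std::size_t num_stripes) {
    assert(num_stripes > 0);
    for (std::size_t i = 0; i < num_stripes; ++i) {
      stripes.push_back({i});
    }
  }

  // Hash the key, then take the remainder modulo the stripe count.
  // std::hash is an assumption made for this sketch.
  std::size_t StripeIndex(const std::string& key) const {
    return std::hash<std::string>{}(key) % stripes.size();
  }

  std::vector<Stripe> stripes;
};
```

The same key always lands in the same stripe, so every transaction that touches that key synchronizes on the same mutex, while keys in different stripes do not contend with each other.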
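The data members of LockMap are as follows:
size_t num_stripes : number of LockMapStripe entries, 16 by default
std::vector<LockMapStripe> : the array of LockMapStripe entries
The data members of LockMapStripe are as follows:
std::shared_ptr<TransactionDBMutex> stripe_mutex : the physical lock
std::shared_ptr<TransactionDBCondVar> stripe_cv : the physical condition variable
std::unordered_map<std::string, LockInfo> keys : lock information for every record whose key hashes to this stripe; the std::string is the record's key.
The data members of LockInfo are as follows:
bool exclusive : whether the lock is exclusive or shared
uint64_t expiration_time : the time at which the lock expires
autovector<TransactionID> txn_ids : IDs of the transactions that currently hold this lock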
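2. RocksDB transaction flow analysis
The flow above is the collaboration diagram for an application that creates a TransactionDB, Puts a record, and then Commits. The Put phase calls TransactionLockMgr's TryLock method, and the Commit phase calls its UnLock method.
The main logic of TransactionLockMgr::TryLock lives in the AcquireLocked function, and the main logic of TransactionLockMgr::UnLock lives in the UnLockKey function. Both are analyzed below. The comments in the AcquireLocked listing are my own annotations.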
AcquireLocked
Status TransactionLockMgr::AcquireLocked(LockMap* lock_map,
                                         LockMapStripe* stripe,
                                         const std::string& key,  // key of the record
                                         Env* env,
                                         LockInfo&& txn_lock_info,  // lock info of the current transaction
                                         uint64_t* expire_time,  // lock expiration time
                                         autovector<TransactionID>* txn_ids) {
  Status result;
  // Check whether this record's key is already locked.
  auto stripe_iter = stripe->keys.find(key);
  if (stripe_iter != stripe->keys.end()) {
    // The key has already been locked by an earlier transaction.
    LockInfo& lock_info = stripe_iter->second;
    if (lock_info.exclusive || txn_lock_info.exclusive) {
      // Either the existing lock or the requested lock is exclusive.
      if (lock_info.txn_ids.size() == 1 &&
          lock_info.txn_ids[0] == txn_lock_info.txn_ids[0]) {
        // The only holder is the current transaction: update the lock in place.
        lock_info.exclusive = txn_lock_info.exclusive;
        lock_info.expiration_time = txn_lock_info.expiration_time;
      } else {
        // The lock is held by a different transaction.
        if (IsLockExpired(txn_lock_info.txn_ids[0], lock_info, env,
                          expire_time)) {
          // The earlier lock has expired, so the current transaction takes it over.
          lock_info.txn_ids = txn_lock_info.txn_ids;
          lock_info.exclusive = txn_lock_info.exclusive;
          lock_info.expiration_time = txn_lock_info.expiration_time;
        } else {
          result = Status::TimedOut(Status::SubCode::kLockTimeout);
          *txn_ids = lock_info.txn_ids;  // return the list of current holders
        }
      }
    } else {
      // Both the existing lock and the requested lock are shared:
      // add the current transaction as another holder.
      lock_info.txn_ids.push_back(txn_lock_info.txn_ids[0]);
      lock_info.expiration_time =
          std::max(lock_info.expiration_time, txn_lock_info.expiration_time);
    }
  } else {
    // The key has not been locked by any earlier transaction.
    if (max_num_locks_ > 0 &&
        lock_map->lock_cnt.load(std::memory_order_acquire) >= max_num_locks_) {
      result = Status::Busy(Status::SubCode::kLockLimit);
    } else {
      // The current transaction acquires the lock.
      stripe->keys.emplace(key, std::move(txn_lock_info));
      if (max_num_locks_) {
        lock_map->lock_cnt++;
      }
    }
  }
  return result;
}
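The decision logic above can be condensed into a small stand-alone model. This is a sketch under simplifying assumptions, not the RocksDB implementation: lock expiration and the max_num_locks_ limit are left out, and `SimpleLockInfo`, `SimpleLockTable`, `Outcome`, and `Acquire` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using TxnId = std::uint64_t;

// Simplified counterpart of LockInfo (no expiration time).
struct SimpleLockInfo {
  bool exclusive;
  std::vector<TxnId> txn_ids;  // transactions currently holding the lock
};

enum class Outcome { kGranted, kConflict };

// Hypothetical single-stripe lock table; the caller is assumed to hold
// the stripe mutex already, as in AcquireLocked.
class SimpleLockTable {
 public:
  Outcome Acquire(const std::string& key, TxnId txn, bool exclusive) {
    auto it = keys_.find(key);
    if (it == keys_.end()) {
      // Nobody holds the key: record the new lock.
      keys_.emplace(key, SimpleLockInfo{exclusive, {txn}});
      return Outcome::kGranted;
    }
    SimpleLockInfo& held = it->second;
    if (held.exclusive || exclusive) {
      // At least one side is exclusive: only the sole holder may proceed,
      // changing the lock mode in place.
      if (held.txn_ids.size() == 1 && held.txn_ids[0] == txn) {
        held.exclusive = exclusive;
        return Outcome::kGranted;
      }
      return Outcome::kConflict;
    }
    // Shared on both sides: add another holder.
    held.txn_ids.push_back(txn);
    return Outcome::kGranted;
  }

 private:
  std::unordered_map<std::string, SimpleLockInfo> keys_;
};
```

In this model, two transactions can both take a shared lock on the same key, while a third transaction asking for an exclusive lock on it gets `Outcome::kConflict`, which corresponds to the TimedOut status in the listing above.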
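UnLockKey is simpler: it removes the transaction's lock record for the key so that blocked transactions can then be woken up. The wake-up itself does not appear in the listing below.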
void TransactionLockMgr::UnLockKey(const PessimisticTransaction* txn,
                                   const std::string& key,
                                   LockMapStripe* stripe, LockMap* lock_map,
                                   Env* env) {
  TransactionID txn_id = txn->GetID();
  auto stripe_iter = stripe->keys.find(key);
  if (stripe_iter != stripe->keys.end()) {
    auto& txns = stripe_iter->second.txn_ids;
    auto txn_it = std::find(txns.begin(), txns.end(), txn_id);
    // Found the key we locked. unlock it.
    if (txn_it != txns.end()) {
      if (txns.size() == 1) {
        stripe->keys.erase(stripe_iter);
      } else {
        auto last_it = txns.end() - 1;
        if (txn_it != last_it) {
          *txn_it = *last_it;
        }
        txns.pop_back();
      }
      if (max_num_locks_ > 0) {
        // Maintain lock count if there is a limit on the number of locks.
        assert(lock_map->lock_cnt.load(std::memory_order_relaxed) > 0);
        lock_map->lock_cnt--;
      }
    }
  } else {
    // This key is either not locked or locked by someone else. This should
    // only happen if the unlocking transaction has expired.
    assert(txn->GetExpirationTime() > 0 &&
           txn->GetExpirationTime() < env->NowMicros());
  }
}
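The multi-holder branch of UnLockKey removes one transaction ID by overwriting it with the last element and then popping the tail, which avoids shifting the remaining elements at the cost of not preserving their order. A minimal sketch of that idiom follows, using std::vector in place of autovector; `SwapAndPop` is a hypothetical helper name, not a RocksDB function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Removes `id` from `ids` by overwriting it with the last element and
// shrinking the vector by one. Element order is not preserved.
// Returns false if `id` is not present.
inline bool SwapAndPop(std::vector<std::uint64_t>& ids, std::uint64_t id) {
  auto it = std::find(ids.begin(), ids.end(), id);
  if (it == ids.end()) {
    return false;
  }
  auto last = ids.end() - 1;
  if (it != last) {
    *it = *last;  // move the tail element into the freed slot
  }
  ids.pop_back();
  return true;
}
```

Because the holder list has no meaningful order, losing the original ordering is harmless here; only membership matters.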