An Analysis of RocksDB Transaction Isolation [Original]

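RocksDB transaction isolation refers to how transactions are isolated from one another when multiple threads run transactions concurrently. It is implemented with a locking mechanism. This article focuses on RocksDB's locking mechanism under the Read Committed isolation level.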

  1. RocksDB transaction class families

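The class diagram for RocksDB's transaction-related classes is shown below. There are two main class families, Transaction and DB. PessimisticTransaction is used by default, and its internal locking is implemented by TransactionLockMgr.

[Figure: class diagram of the RocksDB transaction-related classes]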

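TransactionLockMgr maintains a LockMap internally. For each record, TransactionLockMgr computes a hash of the record's key, takes it modulo num_stripes, and uses the result to locate a LockMapStripe in the LockMap's std::vector<LockMapStripe>. Because each stripe has its own physical lock, contention on any single lock is reduced. This is lock striping.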
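The striping idea can be sketched with the standard library alone. This is a minimal illustration, not RocksDB's code: the names StripedTable, Stripe, TryLockKey and UnlockKey are hypothetical, std::hash stands in for whatever hash function RocksDB uses internally, and the sketch only records whether a key is locked, with no shared mode, expiration, or waiting.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stripe: one mutex guarding the keys that hash to it.
struct Stripe {
  std::mutex mu;
  std::unordered_set<std::string> locked_keys;
};

// Hypothetical striped lock table. Assumes num_stripes > 0.
class StripedTable {
 public:
  explicit StripedTable(std::size_t num_stripes) : stripes_(num_stripes) {}

  // hash(key) % num_stripes picks the stripe. std::hash is an assumption;
  // it is not the hash function RocksDB uses.
  std::size_t StripeIndexFor(const std::string& key) const {
    return std::hash<std::string>{}(key) % stripes_.size();
  }

  // Returns true if the key was not locked before and is locked now.
  bool TryLockKey(const std::string& key) {
    Stripe& stripe = stripes_[StripeIndexFor(key)];
    std::lock_guard<std::mutex> guard(stripe.mu);  // only this stripe is held
    return stripe.locked_keys.insert(key).second;
  }

  // Removes the key from its stripe if present.
  void UnlockKey(const std::string& key) {
    Stripe& stripe = stripes_[StripeIndexFor(key)];
    std::lock_guard<std::mutex> guard(stripe.mu);
    stripe.locked_keys.erase(key);
  }

 private:
  std::vector<Stripe> stripes_;
};
```

Two threads working on keys that land in different stripes never touch the same mutex, which is the contention reduction the paragraph above describes.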

 

LockMap has the following data members:

size_t num_stripes :          number of LockMapStripe instances, 16 by default

std::vector<LockMapStripe> :  the array of LockMapStripe instances

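LockMapStripe has the following data members:

std::shared_ptr<TransactionDBMutex>  stripe_mutex :   the physical lock

std::shared_ptr<TransactionDBCondVar>  stripe_cv :     the physical condition variable

std::unordered_map<std::string, LockInfo>  keys :       lock information for every record whose key hashes to this stripe; the std::string is the record's key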

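LockInfo has the following data members:

bool exclusive :                     whether the lock is exclusive or shared

uint64_t expiration_time :            expiration time of the lock

autovector<TransactionID>  txn_ids :   IDs of the transactions that currently hold this lock (more than one only for a shared lock)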
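To see how the three structures nest, here is a simplified sketch under stated assumptions: std::mutex and std::condition_variable replace TransactionDBMutex and TransactionDBCondVar, std::vector replaces autovector, and every type name prefixed with Sketch is hypothetical. The member called stripes is also a made-up name, since the list above does not give one.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

using TxnId = std::uint64_t;  // stand-in for TransactionID

// Per-key lock record.
struct SketchLockInfo {
  bool exclusive = false;             // exclusive or shared
  std::uint64_t expiration_time = 0;  // sketch convention: 0 = no expiration
  std::vector<TxnId> txn_ids;         // transactions holding the lock
};

// One stripe: a mutex, a condition variable, and the keys that hash here.
struct SketchLockMapStripe {
  std::shared_ptr<std::mutex> stripe_mutex = std::make_shared<std::mutex>();
  std::shared_ptr<std::condition_variable> stripe_cv =
      std::make_shared<std::condition_variable>();
  std::unordered_map<std::string, SketchLockInfo> keys;
};

// The whole map: a fixed number of stripes chosen at construction.
struct SketchLockMap {
  explicit SketchLockMap(std::size_t n) : num_stripes(n), stripes(n) {}
  std::size_t num_stripes;
  std::vector<SketchLockMapStripe> stripes;  // hypothetical member name
};
```

Each stripe owns its own mutex and condition variable, so a transaction waiting for a key in one stripe is only woken by activity in that stripe.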

 

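2. RocksDB transaction flow analysis

[Figure: collaboration diagram of an application creating a TransactionDB, calling Put, and calling Commit]

The flow above is the collaboration diagram for an application that creates a TransactionDB, Puts one record, and then Commits. The Put phase calls TransactionLockMgr's TryLock method, and the Commit phase calls TransactionLockMgr's UnLock method.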

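        The main logic of TransactionLockMgr::TryLock lives in the AcquireLocked function, and the main logic of TransactionLockMgr::UnLock lives in the UnLockKey function. Both functions are analyzed below. The comments added to AcquireLocked (shown in green in the original post) are my own annotations.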

AcquireLocked

 

Status TransactionLockMgr::AcquireLocked(LockMap* lock_map,
                                         LockMapStripe* stripe,
                                         const std::string& key,    // key of the record
                                         Env* env,
                                         LockInfo&& txn_lock_info,  // lock info of the current transaction
                                         uint64_t* expire_time,     // expiration time of the lock
                                         autovector<TransactionID>* txn_ids) {
  Status result;
  // Check whether this key is already locked.
  auto stripe_iter = stripe->keys.find(key);
  if (stripe_iter != stripe->keys.end()) {
    // The key has already been locked by an earlier transaction.
    LockInfo& lock_info = stripe_iter->second;
    if (lock_info.exclusive || txn_lock_info.exclusive) {
      // Either the existing lock or the requested lock is exclusive.
      if (lock_info.txn_ids.size() == 1 &&
          lock_info.txn_ids[0] == txn_lock_info.txn_ids[0]) {
        // The only holder is the current transaction: update its lock in place.
        lock_info.exclusive = txn_lock_info.exclusive;
        lock_info.expiration_time = txn_lock_info.expiration_time;
      } else {
        // The lock is held by some other transaction.
        if (IsLockExpired(txn_lock_info.txn_ids[0], lock_info, env,
                          expire_time)) {
          // The existing lock has expired, so the current transaction takes it over.
          lock_info.txn_ids = txn_lock_info.txn_ids;
          lock_info.exclusive = txn_lock_info.exclusive;
          lock_info.expiration_time = txn_lock_info.expiration_time;
        } else {
          result = Status::TimedOut(Status::SubCode::kLockTimeout);
          *txn_ids = lock_info.txn_ids;  // return the transactions that hold the lock
        }
      }
    } else {
      // Both the existing lock and the requested lock are shared: add this holder.
      lock_info.txn_ids.push_back(txn_lock_info.txn_ids[0]);
      lock_info.expiration_time =
          std::max(lock_info.expiration_time, txn_lock_info.expiration_time);
    }
  } else {
    // The key has not been locked by any earlier transaction.
    if (max_num_locks_ > 0 &&
        lock_map->lock_cnt.load(std::memory_order_acquire) >= max_num_locks_) {
      result = Status::Busy(Status::SubCode::kLockLimit);
    } else {
      // The current transaction acquires the lock.
      stripe->keys.emplace(key, std::move(txn_lock_info));
      if (max_num_locks_) {
        lock_map->lock_cnt++;
      }
    }
  }
  return result;
}
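The grant-or-conflict decision in AcquireLocked can be isolated in a small standalone model. This is a sketch under simplifying assumptions: lock expiration and the max_num_locks_ limit are left out, std::vector replaces autovector, and the names ModelLock, ModelAcquire and AcquireResult are hypothetical. Like the listing above, it does not check whether a transaction already holds a shared lock before adding it as a holder.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using TxnId = std::uint64_t;  // stand-in for TransactionID

enum class AcquireResult { kGranted, kConflict };

// Simplified per-key lock record (no expiration time in this model).
struct ModelLock {
  bool exclusive;
  std::vector<TxnId> holders;
};

// Mirrors the branch structure of AcquireLocked for a single stripe's key map.
AcquireResult ModelAcquire(std::unordered_map<std::string, ModelLock>& keys,
                           const std::string& key, TxnId txn, bool exclusive) {
  auto it = keys.find(key);
  if (it == keys.end()) {
    // Nobody holds the key: grant and record the holder.
    keys.emplace(key, ModelLock{exclusive, {txn}});
    return AcquireResult::kGranted;
  }
  ModelLock& held = it->second;
  if (held.exclusive || exclusive) {
    // At least one side is exclusive: only the sole holder may re-lock.
    if (held.holders.size() == 1 && held.holders[0] == txn) {
      held.exclusive = exclusive;  // upgrade or downgrade in place
      return AcquireResult::kGranted;
    }
    return AcquireResult::kConflict;
  }
  // Shared request on a shared lock: add another holder.
  held.holders.push_back(txn);
  return AcquireResult::kGranted;
}
```

For example, two transactions can both take a shared lock on the same key, a third transaction asking for an exclusive lock on it gets a conflict, and a transaction that is the only holder of a key can switch its own lock between exclusive and shared.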

 

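The logic of UnLockKey is simpler: it removes the transaction from the key's lock record and erases the record once no holder is left. UnLockKey does not signal anything itself. In the RocksDB source, the caller TransactionLockMgr::UnLock notifies the stripe's condition variable after UnLockKey returns, which wakes the blocked transactions.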

void TransactionLockMgr::UnLockKey(const PessimisticTransaction* txn,
                                   const std::string& key,
                                   LockMapStripe* stripe, LockMap* lock_map,
                                   Env* env) {
  TransactionID txn_id = txn->GetID();
  auto stripe_iter = stripe->keys.find(key);
  if (stripe_iter != stripe->keys.end()) {
    auto& txns = stripe_iter->second.txn_ids;
    auto txn_it = std::find(txns.begin(), txns.end(), txn_id);
    // Found the key we locked.  unlock it.
    if (txn_it != txns.end()) {
      if (txns.size() == 1) {
        stripe->keys.erase(stripe_iter);
      } else {
        auto last_it = txns.end() - 1;
        if (txn_it != last_it) {
          *txn_it = *last_it;
        }
        txns.pop_back();
      }
      if (max_num_locks_ > 0) {
        // Maintain lock count if there is a limit on the number of locks.
        assert(lock_map->lock_cnt.load(std::memory_order_relaxed) > 0);
        lock_map->lock_cnt--;
      }
    }
  } else {
    // This key is either not locked or locked by someone else.  This should
    // only happen if the unlocking transaction has expired.
    assert(txn->GetExpirationTime() > 0 &&
           txn->GetExpirationTime() < env->NowMicros());
  }
}
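When several transactions share a lock, UnLockKey removes one holder by overwriting it with the last element and popping the back, which avoids shifting the rest of the list. The idiom can be sketched on its own. RemoveHolder is a hypothetical helper name and std::vector replaces autovector. Unlike UnLockKey, this sketch does not erase the key's record when the last holder leaves; it just leaves an empty list.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using TxnId = std::uint64_t;  // stand-in for TransactionID

// Removes txn from holders in O(1) after the search, without preserving order.
// Returns true if txn was found and removed.
bool RemoveHolder(std::vector<TxnId>& holders, TxnId txn) {
  auto it = std::find(holders.begin(), holders.end(), txn);
  if (it == holders.end()) {
    return false;  // txn does not hold this lock
  }
  // Overwrite the removed slot with the last holder, then drop the last slot.
  // If txn is already last, this is a harmless self-assignment.
  *it = holders.back();
  holders.pop_back();
  return true;
}
```

The order of holders changes after a removal, which is acceptable here because the list is only used for membership checks and conflict reporting.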