An interesting discussion in the Seata source code

  • January 13, 2020
  • Notes

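I came across a fairly interesting discussion and want to share it. It started with an issue: during load testing, someone ran into a TransactionException LockKeyConflict, i.e. a global lock conflict.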

A contributor then proposed a fix:

Move lock retry loop into ConnectionProxy.commit, that the number of LockKeyConflict can be significantly reduced whether auto-commit is true or false


A committer replied:

I am very interested because we always have different ideas. When a LockConflictException occurs, it indicates that there are other distributed transactions that are executing that hold the same data primary key. We define the current distributed transaction as A and another distributed transaction as B. A holds the database lock and wants to get the global lock, and B holds the global lock. If at this point B wants to rollback this data in the second phase of the distributed transaction, it will try to acquire the database lock. According to your code, A will hold the database lock for a longer time. At this time, B may trigger the lock wait timeout exception and perform a rollback retry. We need to evaluate this.

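The committer's concern can be reproduced in miniature. The sketch below is only a model, not Seata code: a `ReentrantLock` stands in for the database row lock, B's lock wait timeout is an assumed 150 ms, and A retries an assumed 5 times at 100 ms intervals while B keeps the global lock the whole time. The class and method names (`RowLockHoldSketch`, `bRollbackGetsRowLock`) are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of transactions A and B; none of these names exist in Seata.
class RowLockHoldSketch {
    static final long RETRY_INTERVAL_MS = 100;    // assumed pause between A's global-lock retries
    static final int RETRY_TIMES = 5;             // assumed number of retries by A
    static final long LOCK_WAIT_TIMEOUT_MS = 150; // assumed database lock wait timeout for B

    /** Returns true if B's rollback obtains the row lock before its lock wait timeout. */
    static boolean bRollbackGetsRowLock(boolean aKeepsRowLockWhileRetrying) {
        ReentrantLock rowLock = new ReentrantLock(); // stands in for the database row lock
        rowLock.lock(); // A has run its local SQL and now holds the row lock

        // B holds the global lock throughout and starts its second-phase rollback,
        // which needs the same row lock.
        CompletableFuture<Boolean> b = CompletableFuture.supplyAsync(() -> {
            try {
                boolean acquired = rowLock.tryLock(LOCK_WAIT_TIMEOUT_MS, TimeUnit.MILLISECONDS);
                if (acquired) {
                    rowLock.unlock(); // rollback applied, row lock released again
                }
                return acquired;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        });

        // Every attempt by A to get the global lock conflicts, because B still holds it.
        for (int i = 0; i < RETRY_TIMES; i++) {
            if (aKeepsRowLockWhileRetrying) {
                pause(RETRY_INTERVAL_MS); // retry inside commit: the row lock stays held
            } else {
                rowLock.unlock();         // local rollback releases the row lock
                pause(RETRY_INTERVAL_MS);
                rowLock.lock();           // re-running the SQL takes the row lock again
            }
        }
        rowLock.unlock(); // A finally gives up or finishes and releases the row lock
        return b.join();
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("A keeps row lock:    B rollback ok = " + bRollbackGetsRowLock(true));
        System.out.println("A releases row lock: B rollback ok = " + bRollbackGetsRowLock(false));
    }
}
```

With these assumed timings, B's rollback should time out when A keeps the row lock across retries and should get through when A releases it between attempts. The outcome depends on thread scheduling, so treat it as an illustration rather than a benchmark.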

The contributor replied:

Before I make this PR, I referred to seata document in wiki, e.g. 


I think tx2 is A which you just mentioned and tx1 is B. I understand your worries. I also agree that we should make quick fail to avoid waiting too much. But rollback is not what we preferred, we prefer commit, according to issue 1438, we got exception immediately, may be just a moment we can get a success transaction. I think this PR is more in line with the figure in wiki. Is that a formal design?    I agree that we should evaluate this. What can we do with this PR?


Before the change:

    protected T executeAutoCommitTrue(Object[] args) throws Throwable {
        T result = null;
        AbstractConnectionProxy connectionProxy = statementProxy.getConnectionProxy();
        LockRetryController lockRetryController = new LockRetryController();
        try {
            connectionProxy.setAutoCommit(false);
            while (true) {
                try {
                    result = executeAutoCommitFalse(args);
                    connectionProxy.commit();
                    break;
                } catch (LockConflictException lockConflict) {
                    connectionProxy.getTargetConnection().rollback();
                    lockRetryController.sleep(lockConflict);
                }
            }

        } catch (Exception e) {
            // when exception occur in finally,this exception will lost, so just print it here
            LOGGER.error("exception occur", e);
            throw e;
        } finally {
            connectionProxy.setAutoCommit(true);
        }
        return result;
    }

    @Override
    public void commit() throws SQLException {
        if (context.inGlobalTransaction()) {
            processGlobalTransactionCommit();
        } else if (context.isGlobalLockRequire()) {
            processLocalCommitWithGlobalLocks();
        } else {
            targetConnection.commit();
        }
    }

In the code before the change, commit is called inside a while(true) loop. After the change, that while(true) loop moves into the commit path (doCommit):

    protected T executeAutoCommitTrue(Object[] args) throws Throwable {
        T result = null;
        AbstractConnectionProxy connectionProxy = statementProxy.getConnectionProxy();
        try {
            connectionProxy.setAutoCommit(false);
            result = executeAutoCommitFalse(args);
            connectionProxy.commit();
        } catch (Exception e) {
            // when exception occur in finally,this exception will lost, so just print it here
            LOGGER.error("exception occur", e);
            connectionProxy.rollback();
            throw e;
        }
        // ... (remainder of the method omitted)
    }

    private void doCommit(boolean shouldRetry) throws SQLException {
        LockRetryController lockRetryController = null;
        while (true) {
            try {
                if (context.inGlobalTransaction()) {
                    processGlobalTransactionCommit();
                } else if (context.isGlobalLockRequire()) {
                    processLocalCommitWithGlobalLocks();
                } else {
                    targetConnection.commit();
                }
                break;
            } catch (LockConflictException lockConflict) {
                if (!shouldRetry) {
                    throw lockConflict;
                }
                if (lockRetryController == null) {
                    lockRetryController = new LockRetryController();
                }
                lockRetryController.sleep(lockConflict);
            }
        }
    }
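To make the difference between the two versions concrete, here is a minimal sketch that counts what each retry placement does when the global lock conflicts a few times before succeeding. `FakeConnection`, `retryAroundExecute`, and `retryInsideCommit` are hypothetical names, not Seata classes, and the sleep between retries is left out.

```java
// Hypothetical stand-ins for illustration only; this is not Seata's API.
class RetryPlacementSketch {

    /** Signals that the global lock is held by another transaction. */
    static class LockConflict extends RuntimeException {
    }

    /** Fake connection that counts operations and conflicts a fixed number of times. */
    static class FakeConnection {
        final int conflictsBeforeSuccess;
        int sqlExecutions;
        int localRollbacks;
        int commitAttempts;

        FakeConnection(int conflictsBeforeSuccess) {
            this.conflictsBeforeSuccess = conflictsBeforeSuccess;
        }

        void executeSql() {
            sqlExecutions++; // running the SQL is what takes the database row lock
        }

        void rollbackLocal() {
            localRollbacks++; // a local rollback releases the database row lock
        }

        void tryCommit() {
            commitAttempts++;
            if (commitAttempts <= conflictsBeforeSuccess) {
                throw new LockConflict(); // global lock still held elsewhere
            }
        }
    }

    /** Shape of the code before the change: the loop wraps execute + commit. */
    static void retryAroundExecute(FakeConnection conn) {
        while (true) {
            try {
                conn.executeSql();
                conn.tryCommit();
                return;
            } catch (LockConflict conflict) {
                conn.rollbackLocal(); // give up the row lock, then start over
            }
        }
    }

    /** Shape of the code after the change: only the commit is retried. */
    static void retryInsideCommit(FakeConnection conn) {
        conn.executeSql();
        while (true) {
            try {
                conn.tryCommit();
                return;
            } catch (LockConflict conflict) {
                // local transaction stays open, so the row lock is still held
            }
        }
    }

    public static void main(String[] args) {
        FakeConnection before = new FakeConnection(3);
        retryAroundExecute(before);
        FakeConnection after = new FakeConnection(3);
        retryInsideCommit(after);
        System.out.println("before: sql=" + before.sqlExecutions + " rollbacks=" + before.localRollbacks);
        System.out.println("after:  sql=" + after.sqlExecutions + " rollbacks=" + after.localRollbacks);
    }
}
```

With three conflicts, the loop around execute runs the SQL four times and rolls back locally three times, while the loop inside commit runs the SQL once and never rolls back locally. That saves repeated work, but it is also why the row lock is held for the whole retry period.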

Both designs have their merits, and I don't know whether this one will ultimately be adopted.