LESS is More: Rethinking Probabilistic Models of Human Behavior (Human-Computer Interaction)

  • January 14, 2020
  • Notes

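Robots need models of human behavior, both to infer human goals and preferences and to predict what people will do. A common choice is the Boltzmann noisily-rational decision model, which assumes that people approximately optimize a reward function and pick trajectories with probability proportional to their exponentiated reward. The model has worked well in a range of robotics domains, but it originates in econometrics, where it was used to model choices among discrete options, each with its own utility or reward. Human trajectories, by contrast, live in a continuous space, and the features that shape the reward function are continuous-valued. The authors propose rethinking the Boltzmann model and designing it from the ground up for such trajectory spaces. They introduce a model that explicitly accounts for the distances between trajectories rather than only their rewards, so that similar trajectories influence the decision jointly instead of independently. They first show in a user study that the model explains human behavior better. They then analyze what this means for robot inference: first in toy environments where ground truth is available, where inference becomes more accurate, and finally for a 7-DOF robot arm learning from user demonstrations.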
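To make the contrast concrete, here is a minimal sketch over a finite set of candidate options. `boltzmann_probs` follows the description above (probability proportional to exponentiated reward). `similarity_adjusted_probs` is only an illustration of the idea that similar options should share influence; it is not the authors' formulation. The function names, the rationality coefficient `beta`, the Gaussian kernel, and `bandwidth` are all assumptions made for this example.

```python
import numpy as np


def boltzmann_probs(rewards, beta=1.0):
    """Boltzmann choice probabilities: P(i) proportional to exp(beta * R_i)."""
    logits = beta * np.asarray(rewards, dtype=float)
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def similarity_adjusted_probs(rewards, features, beta=1.0, bandwidth=1.0):
    """Illustrative variant (assumption, not the paper's model):
    each option's exponentiated reward is divided by how crowded its
    neighborhood in feature space is, so near-duplicates split one share."""
    feats = np.asarray(features, dtype=float)
    # Pairwise squared Euclidean distances between feature vectors.
    diffs = feats[:, None, :] - feats[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    # Gaussian similarity; each option is fully similar to itself.
    similarity = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    crowding = similarity.sum(axis=1)
    logits = beta * np.asarray(rewards, dtype=float)
    logits = logits - logits.max()
    weights = np.exp(logits) / crowding
    return weights / weights.sum()


# Three options with equal reward; the first two are identical in feature space.
rewards = [1.0, 1.0, 1.0]
features = [[0.0], [0.0], [5.0]]

plain = boltzmann_probs(rewards)
adjusted = similarity_adjusted_probs(rewards, features)
print("Boltzmann:          ", np.round(plain, 3))
print("Similarity-adjusted:", np.round(adjusted, 3))
```

Under the plain Boltzmann model each option gets one third of the probability, so the duplicated option effectively gets two thirds just because it was listed twice. In the adjusted sketch the two identical options split roughly half of the probability between them and the distinct option keeps roughly half, which is the kind of behavior the summary above describes for similar trajectories.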

Original title: LESS is More: Rethinking Probabilistic Models of Human Behavior

Original abstract: Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations.

Authors: Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan

Original link: https://arxiv.org/abs/2001.04465