LESS is More: Rethinking Probabilistic Models of Human Behavior (Human-Computer Interaction)
- January 14, 2020
- Notes
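Robots need models of human behavior both to infer human goals and preferences and to predict what people will do. A common choice is the Boltzmann noisily-rational decision model, which assumes that people approximately optimize a reward function and pick trajectories with probability proportional to their exponentiated reward. Although this model has worked well across many robotics domains, it originates in econometrics, where it was built to model decisions among discrete options, each with its own utility or reward. Human trajectories, by contrast, lie in a continuous space, and the reward function depends on continuous-valued features. The authors propose rethinking the Boltzmann model and designing it from the ground up to operate over such trajectory spaces. They introduce a model that explicitly accounts for the distances between trajectories, not just their rewards, so that similar trajectories influence the decision jointly rather than independently. They first show in a user study that the model explains human behavior better. They then analyze what this means for robot inference: first in toy environments with known ground truth, where inference becomes more accurate, and finally for a 7-DOF robot arm learning from user demonstrations.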
Original title: LESS is More: Rethinking Probabilistic Models of Human Behavior
Original abstract: Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations.
Authors: Andreea Bobu, Dexter R.R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan
Original link: https://arxiv.org/abs/2001.04465
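To make the contrast in the abstract concrete, here is a minimal sketch of the two ideas on a toy set of four candidate trajectories. The first function is the standard Boltzmann rule, with probability proportional to exponentiated reward. The second is only one possible reading of "accounting for distances between trajectories": each trajectory's weight is divided by how much similar mass surrounds it. The Gaussian similarity kernel, the scalar `features`, and the `beta` and `bandwidth` parameters are all assumptions made for illustration, not details taken from the abstract or the authors' implementation.

```python
import math


def boltzmann_probs(rewards, beta=1.0):
    """Standard Boltzmann rule: P(i) is proportional to exp(beta * reward_i)."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(beta * (r - m)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def similarity_weighted_probs(rewards, features, beta=1.0, bandwidth=1.0):
    """Boltzmann weights divided by a per-trajectory similarity mass.

    Assumed form for illustration: similarity is a Gaussian kernel on a
    scalar feature, and each trajectory counts itself in its own mass.
    """
    m = max(rewards)
    weights = []
    for r_i, f_i in zip(rewards, features):
        mass = sum(
            math.exp(-((f_i - f_j) ** 2) / (2 * bandwidth ** 2))
            for f_j in features
        )
        weights.append(math.exp(beta * (r_i - m)) / mass)
    total = sum(weights)
    return [w / total for w in weights]


# Toy set: three near-duplicate trajectories (feature 0.0) and one distinct
# trajectory (feature 5.0), all with the same reward.
rewards = [1.0, 1.0, 1.0, 1.0]
features = [0.0, 0.0, 0.0, 5.0]

p_boltz = boltzmann_probs(rewards)
p_sim = similarity_weighted_probs(rewards, features, bandwidth=0.5)

print("Boltzmann, duplicate cluster total:", sum(p_boltz[:3]))
print("Similarity-weighted, duplicate cluster total:", sum(p_sim[:3]))
```

Under the plain Boltzmann rule the three duplicates together absorb three quarters of the probability simply because there are three of them. With the similarity weighting, the cluster and the distinct trajectory end up with roughly equal probability, which is the sense in which similar trajectories affect the decision together.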