ICML 2019: A Roundup of Deep Reinforcement Learning Papers

November 21, 2019
Notes

Deep Reinforcement Learning Report

Source: ICML 2019 conference

Editor: DeepRL

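Reinforcement learning (RL) is a general paradigm for learning, prediction, and decision making. RL provides solution methods for sequential decision-making problems, as well as for problems that can be transformed into sequential ones. RL has deep connections with optimization, statistics, game theory, causal inference, and sequential experimentation, overlaps substantially with approximate dynamic programming and optimal control, and has broad applications in science, engineering, and the arts.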

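RL has recently made steady progress in academia, for example in Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real-world scenarios such as recommender systems and neural architecture search; see recent collections of RL applications. The hope is that RL systems can work in the real world and deliver practical benefits. However, RL still faces many issues, such as generalization, sample efficiency, and the exploration vs. exploitation dilemma. As a result, RL is far from being widely deployed. For the RL community, the common, critical, and pressing questions are: Will RL be widely deployed? What are the obstacles? How can they be solved?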

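The International Conference on Machine Learning (ICML) is an international academic conference on machine learning. It is one of the two main high-impact conferences in machine learning and artificial intelligence research. Every year ICML includes a large number of reinforcement learning papers; in 2019 it accepted 46 reinforcement learning papers in total (already a high share, approaching 10%). Below is a summary of the papers from this conference. A download link for the collected PDF versions is at the end of the article.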

Methods

Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Quantifying Generalization in Reinforcement Learning
Policy Certificates: Towards Accountable Reinforcement Learning
Neural Logic Reinforcement Learning
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning
Calibrated Model-Based Deep Reinforcement Learning
Information-Theoretic Considerations in Batch Reinforcement Learning
Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation
Option Discovery for Solving Sparse Reward Reinforcement Learning Problems

Optimization

Fingerprint Policy Optimisation for Robust Reinforcement Learning
Collaborative Evolutionary Reinforcement Learning
Composing Value Functions in Reinforcement Learning
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
Policy Consolidation for Continual Reinforcement Learning

Exploration-Exploitation and Model Parameters

Exploration Conscious Reinforcement Learning Revisited
Dynamic Weights in Multi-Objective Deep Reinforcement Learning
Control Regularization for Reduced Variance Reinforcement Learning
Dead-ends and Secure Exploration in Reinforcement Learning
Off-Policy Deep Reinforcement Learning without Exploration
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
On the Generalization Gap in Reparameterizable Reinforcement Learning

Multi-Agent

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
Multi-Agent Adversarial Inverse Reinforcement Learning
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Graphical-Model Reinforcement Learning

TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Distributional Reinforcement Learning

Statistics and Samples in Distributional Reinforcement Learning
Distributional Reinforcement Learning for Efficient Exploration

Applications

Action Robust Reinforcement Learning and Applications in Continuous Control
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
Learning Action Representations for Reinforcement Learning
The Value Function Polytope in Reinforcement Learning
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

Other

Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
A Deep Reinforcement Learning Perspective on Internet Congestion Control
Reinforcement Learning in Configurable Continuous Environments
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Note: some of the papers are not yet on arXiv; for any that are missing, please search Google yourself.