ICML 2019: A Roundup of Deep Reinforcement Learning Papers

  • November 21, 2019
  • Notes

Deep Reinforcement Learning Report

Source: ICML 2019 conference

Editor: DeepRL

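Reinforcement learning (RL) is a general paradigm for learning, prediction, and decision making. RL provides solution methods for sequential decision-making problems, as well as for problems that can be recast as sequential ones. RL has deep connections with optimization, statistics, game theory, causal inference, and sequential experimentation, overlaps heavily with approximate dynamic programming and optimal control, and has a wide range of applications in science, engineering, and the arts.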

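RL has recently made steady progress in academia, with examples such as Atari games, AlphaGo, and visuomotor policies for robots. RL has also been applied to real-world scenarios such as recommender systems and neural architecture search (see recent collections of RL applications). The hope is that RL systems can work in the real world and deliver practical benefits. However, RL still faces many problems, including generalization, sample efficiency, and the exploration-exploitation dilemma. As a result, RL is far from being widely deployed. For the RL community, the common, critical, and pressing questions are: Will RL be widely deployed? What are the obstacles? How can they be overcome?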

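The International Conference on Machine Learning (ICML) is an international academic conference on machine learning. It is one of the two main high-impact conferences in machine learning and artificial intelligence research. Every year ICML features a large number of reinforcement learning papers; in 2019 it accepted 46 in total (already a high proportion, approaching 10%). Below is a summary of the papers from this year's conference. A download link for the collected PDFs is given at the end of the article.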

Methods

  • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
  • Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
  • Quantifying Generalization in Reinforcement Learning
  • Policy Certificates: Towards Accountable Reinforcement Learning
  • Neural Logic Reinforcement Learning
  • Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning
  • Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning
  • Calibrated Model-Based Deep Reinforcement Learning
  • Information-Theoretic Considerations in Batch Reinforcement Learning
  • Taming MAML: Control variates for unbiased meta-reinforcement learning gradient estimation
  • Option Discovery for Solving Sparse Reward Reinforcement Learning Problems

Optimization

  • Fingerprint Policy Optimisation for Robust Reinforcement Learning
  • Collaborative Evolutionary Reinforcement Learning
  • Composing Value Functions in Reinforcement Learning
  • Task-Agnostic Dynamics Priors for Deep Reinforcement Learning
  • Policy Consolidation for Continual Reinforcement Learning

Exploration-exploitation and model parameters

  • Exploration Conscious Reinforcement Learning Revisited
  • Dynamic Weights in Multi-Objective Deep Reinforcement Learning
  • Control Regularization for Reduced Variance Reinforcement Learning
  • Dead-ends and Secure Exploration in Reinforcement Learning
  • Off-Policy Deep Reinforcement Learning without Exploration
  • Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning
  • Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
  • On the Generalization Gap in Reparameterizable Reinforcement Learning

Multi-agent

  • Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
  • CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning
  • Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning
  • Maximum Entropy-Regularized Multi-Goal Reinforcement Learning
  • Multi-Agent Adversarial Inverse Reinforcement Learning
  • Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI
  • QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning
  • Actor-Attention-Critic for Multi-Agent Reinforcement Learning

Graphical-model reinforcement learning

  • TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning
  • SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

Distributional reinforcement learning

  • Statistics and Samples in Distributional Reinforcement Learning
  • Distributional Reinforcement Learning for Efficient Exploration

Applications

  • Action Robust Reinforcement Learning and Applications in Continuous Control
  • Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation
  • Learning Action Representations for Reinforcement Learning
  • The Value Function Polytope in Reinforcement Learning
  • Generative Adversarial User Model for Reinforcement Learning Based Recommendation System

Other

  • Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
  • A Deep Reinforcement Learning Perspective on Internet Congestion Control
  • Reinforcement Learning in Configurable Continuous Environments
  • Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Note: some of these papers are not yet on arXiv; for any you cannot find there, please search for them on Google.