SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition (cs.CV)
- January 10, 2020
- Notes
The ability to decompose complex multi-object scenes into meaningful abstractions such as objects is fundamental to achieving higher-level cognition. Previous approaches to unsupervised object-oriented scene representation learning are based on either spatial attention or scene mixture, and are limited in scalability, which is a main obstacle to modeling real-world scenes. In this paper, we propose a generative latent variable model, called SPACE, that provides a unified probabilistic modeling framework combining the best of the spatial-attention and scene-mixture approaches. SPACE can explicitly provide factorized object representations for foreground objects while also decomposing background segments of complex morphology. Previous models are good at one of these, but not both. SPACE also resolves the scalability problems of earlier methods by incorporating parallel spatial attention, and is therefore applicable to scenes with a large number of objects without performance degradation. Experiments on Atari and 3D-Rooms show that SPACE achieves the above properties consistently in comparison with SPAIR, IODINE, and GENESIS. Results of our experiments can be found on our project website: this https URL
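To make the composition described above concrete, here is a minimal sketch of a pixel-wise foreground/background mixture of the kind the abstract describes: a foreground image rendered from object latents (the spatial-attention side) is alpha-composited with a background built as a K-component scene mixture (the scene-mixture side). The function name `compose_scene`, all tensor shapes, and the use of PyTorch are illustrative assumptions, not the authors' implementation.

```python
import torch

def compose_scene(fg_rgb, fg_alpha, bg_rgbs, bg_masks):
    """Pixel-wise mixture of a foreground model and a scene-mixture background.

    fg_rgb:   (B, 3, H, W)    foreground appearance rendered from object latents
    fg_alpha: (B, 1, H, W)    foreground mixing weight in [0, 1]
    bg_rgbs:  (B, K, 3, H, W) per-component background appearances
    bg_masks: (B, K, 1, H, W) background mixture weights, summing to 1 over K
    """
    # Scene-mixture background: weighted sum of K segment components.
    bg_rgb = (bg_masks * bg_rgbs).sum(dim=1)
    # Pixel-wise mixture: alpha-composite foreground over the background.
    return fg_alpha * fg_rgb + (1.0 - fg_alpha) * bg_rgb

# Toy usage with random tensors standing in for decoder outputs.
B, K, H, W = 2, 4, 64, 64
fg_rgb = torch.rand(B, 3, H, W)
fg_alpha = torch.rand(B, 1, H, W)
bg_rgbs = torch.rand(B, K, 3, H, W)
bg_masks = torch.softmax(torch.rand(B, K, 1, H, W), dim=1)

image = compose_scene(fg_rgb, fg_alpha, bg_rgbs, bg_masks)
assert image.shape == (B, 3, H, W)
```

In the paper's framing, the foreground tensors would come from per-grid-cell latents inferred in parallel (which is what makes the method scale to many objects), while the K background components capture segments of complex morphology.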
Original title: SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
Authors: Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn
Original link: https://arxiv.org/abs/2001.02407