Weakly Supervised Visual Semantic Parsing (cs.CV)
- January 10, 2020
- Notes
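Scene Graph Generation (SGG) aims to extract entities, predicates, and their intrinsic structure from images. This gives a deeper understanding of visual content and has many potential applications, such as visual reasoning and image retrieval. However, computer vision is still far from a practical solution to this task. Existing SGG methods require millions of manually annotated bounding boxes for scene graph entities across a large set of images. They are also computationally inefficient, because they exhaustively process all pairs of object proposals to predict the relationships between them.
In this paper, we address these two limitations. We first propose a generalized formulation of SGG, called Visual Semantic Parsing, which disentangles entity prediction from predicate prediction and enables sub-quadratic performance. Building on this, we propose the Visual Semantic Parsing Network (VSPNet), based on a novel three-stage message propagation network together with a role-driven attention mechanism that routes messages efficiently without incurring a quadratic cost. Finally, we propose the first graph-based weakly supervised learning framework, built on a novel graph alignment algorithm, which makes training possible without bounding box annotations. Through extensive experiments on the Visual Genome dataset, we show that VSPNet significantly outperforms weakly supervised baselines and approaches fully supervised performance, while being five times faster.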
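To make the quadratic versus sub-quadratic contrast concrete, here is a minimal counting sketch. It is not the paper's implementation. It assumes, purely for illustration, that an exhaustive method scores every ordered pair of proposals, while a role-driven scheme lets a fixed number of predicate slots attend to each proposal once per role. The function names, the slot count of 64, and the two roles (subject and object) are all hypothetical.

```python
def exhaustive_pair_count(num_proposals: int) -> int:
    """Ordered (subject, object) pairs scored by an exhaustive pairwise method."""
    return num_proposals * (num_proposals - 1)


def role_link_count(num_proposals: int, num_predicate_slots: int,
                    num_roles: int = 2) -> int:
    """Entity-to-predicate links when every predicate slot attends to every
    proposal once per role (assumed setup, not taken from the paper)."""
    return num_proposals * num_predicate_slots * num_roles


if __name__ == "__main__":
    slots = 64  # assumed fixed number of predicate slots
    for n in (50, 100, 200, 400):
        # Pair count grows roughly 4x when n doubles; link count grows 2x.
        print(n, exhaustive_pair_count(n), role_link_count(n, slots))
```

Under these assumptions the link count grows linearly in the number of proposals, while the pair count grows quadratically, so the gap only appears once the number of proposals is large relative to the number of slots.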
Original title: Weakly Supervised Visual Semantic Parsing
Original abstract: Scene Graph Generation (SGG) aims to extract entities, predicates and their intrinsic structure from images, leading to a deep understanding of visual content, with many potential applications such as visual reasoning and image retrieval. Nevertheless, computer vision is still far from a practical solution for this task. Existing SGG methods require millions of manually annotated bounding boxes for scene graph entities in a large set of images. Moreover, they are computationally inefficient, as they exhaustively process all pairs of object proposals to predict their relationships. In this paper, we address those two limitations by first proposing a generalized formulation of SGG, namely Visual Semantic Parsing, which disentangles entity and predicate prediction, and enables sub-quadratic performance. Then we propose the Visual Semantic Parsing Network, VSPNet, based on a novel three-stage message propagation network, as well as a role-driven attention mechanism to route messages efficiently without a quadratic cost. Finally, we propose the first graph-based weakly supervised learning framework based on a novel graph alignment algorithm, which enables training without bounding box annotations. Through extensive experiments on the Visual Genome dataset, we show VSPNet outperforms weakly supervised baselines significantly and approaches fully supervised performance, while being five times faster.
Authors: Alireza Zareian, Svebor Karaman, Shih-Fu Chang
Original URL: https://arxiv.org/abs/2001.02359