Learning 3D Human Shape and Pose from Dense Body Parts (cs.CV)

Reconstructing 3D human shape and pose from a monocular image remains challenging despite the promising results achieved by recent learning-based methods. The commonly observed misalignment stems from two facts: the mapping from images to the model space is highly non-linear, and the rotation-based pose representation of the body model is prone to causing drift in joint positions. In this work, we investigate learning 3D human shape and pose from dense correspondences of body parts and propose a Decompose-and-aggregate Network (DaNet) to address these issues. DaNet adopts dense correspondence maps, which densely build a bridge between 2D pixels and 3D vertices, as intermediate representations to facilitate the learning of the 2D-to-3D mapping. The prediction modules of DaNet are decomposed into one global stream and multiple local streams to enable global and fine-grained perception for the shape and pose predictions, respectively. Messages from the local streams are further aggregated to enhance the robust prediction of rotation-based poses, where a position-aided rotation feature refinement strategy is proposed to exploit the spatial relationships between body joints. Moreover, a Part-based Dropout (PartDrop) strategy is introduced to drop out dense information from the intermediate representations during training, encouraging the network to focus on complementary body parts as well as adjacent position features. The effectiveness of our method is validated on both indoor and real-world datasets, including the Human3.6M, UP3D, and DensePose-COCO datasets. Experimental results show that the proposed method significantly improves reconstruction performance in comparison with previous state-of-the-art methods. Our code will be made publicly available at this https URL.
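
As a rough illustration of the decompose-and-aggregate idea, the toy PyTorch module below routes one whole-image stream to the shape parameters and one small stream per body part to rotation features, which are aggregated before pose regression. Everything concrete here is our own assumption rather than the paper's actual design: the 24-part layout, the feature sizes, the 6D rotation output, and the names `DaNetSketch`, `global_iuv`, and `part_iuvs`.

```python
import torch
import torch.nn as nn

NUM_PARTS = 24  # SMPL-style body-part count (our assumption)
FEAT_DIM = 64


class DaNetSketch(nn.Module):
    """Minimal sketch of a decompose-and-aggregate forward pass.

    One global stream perceives the whole dense correspondence (IUV)
    map for shape; one local stream per body part yields a pose
    feature, and the part features are aggregated before regressing
    the rotation-based pose. Not the real DaNet architecture.
    """

    def __init__(self):
        super().__init__()
        # Global stream: whole-image perception -> SMPL shape coefficients.
        self.global_stream = nn.Sequential(
            nn.Conv2d(3, FEAT_DIM, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(FEAT_DIM, 10),  # 10 shape parameters (beta)
        )
        # Local streams: one small CNN per part for fine-grained pose cues.
        self.local_streams = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, FEAT_DIM, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            for _ in range(NUM_PARTS)
        )
        # Aggregation: part features exchange messages before regression.
        self.aggregate = nn.Linear(NUM_PARTS * FEAT_DIM, NUM_PARTS * FEAT_DIM)
        self.pose_head = nn.Linear(NUM_PARTS * FEAT_DIM, NUM_PARTS * 6)  # 6D rotations (assumed)

    def forward(self, global_iuv, part_iuvs):
        # global_iuv: (B, 3, H, W); part_iuvs: (B, NUM_PARTS, 3, h, w)
        shape = self.global_stream(global_iuv)
        part_feats = [s(part_iuvs[:, i]) for i, s in enumerate(self.local_streams)]
        pose_feat = self.aggregate(torch.cat(part_feats, dim=1))
        pose = self.pose_head(pose_feat)
        return shape, pose
```

The split mirrors the abstract's reasoning: shape is a whole-body property and benefits from global perception, while each joint rotation depends on local part evidence, so the streams are kept separate until aggregation.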

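The PartDrop strategy is concrete enough to sketch directly: during training, pick a random subset of body parts and zero out their entire regions in the dense intermediate representation. Below is a minimal sketch, assuming an IUV-style feature tensor plus an integer part-index map; the `part_drop` signature, drop rate, and 24-part count are our assumptions, not the paper's settings.

```python
import torch


def part_drop(iuv, part_index, drop_rate=0.3, num_parts=24):
    """Zero out every pixel belonging to a random subset of body parts.

    iuv:        (B, C, H, W) dense correspondence features
    part_index: (B, H, W) integer part labels, with 0 = background
    """
    n_drop = max(1, int(round(drop_rate * num_parts)))
    # Pick the part ids to drop this iteration (ids run 1..num_parts).
    dropped = torch.randperm(num_parts)[:n_drop] + 1
    keep = torch.ones_like(part_index, dtype=iuv.dtype)
    for pid in dropped.tolist():
        keep[part_index == pid] = 0.0
    # Broadcast the (B, H, W) keep-mask over the channel dimension.
    return iuv * keep.unsqueeze(1)
```

Unlike conventional dropout, the removed units are spatially structured: whole part regions disappear at once, which is what pushes the network toward the complementary parts and adjacent position features the abstract mentions.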

Authors: Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, Zhenan Sun

Paper: https://arxiv.org/abs/1912.13344