High-Performance Depthwise and Pointwise Convolutions on Mobile Devices (CS PF)
- January 14, 2020
- Notes
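Lightweight convolutional neural networks such as MobileNets are designed specifically to run inference directly on mobile devices. Across these lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are the key operations. The authors observe that existing implementations of DWConv and PWConv do not make good use of the ARM processors in mobile devices: they incur a large number of cache misses when running on multiple cores, and they reuse data poorly at the register level. The paper proposes techniques for re-optimizing the DWConv and PWConv implementations for the ARM architecture. Experimental results show speedups over TVM (Chen et al. 2018) of up to 5.5x on DWConv and up to 2.1x on PWConv.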
Original title: High Performance Depthwise and Pointwise Convolutions on Mobile Devices
Original abstract: Lightweight convolutional neural networks (e.g., MobileNets) are specifically designed to carry out inference directly on mobile devices. Among the various lightweight models, depthwise convolution (DWConv) and pointwise convolution (PWConv) are their key operations. In this paper, we observe that the existing implementations of DWConv and PWConv are not well utilizing the ARM processors in the mobile devices, and exhibit lots of cache misses under multi-core and poor data reuse at register level. We propose techniques to re-optimize the implementations of DWConv and PWConv based on ARM architecture. Experimental results show that our implementation can respectively achieve a speedup of up to 5.5x and 2.1x against TVM (Chen et al. 2018) on DWConv and PWConv.
Original authors: Pengfei Zhang, Eric Lo, Baotong Lu
Original URL: https://arxiv.org/abs/2001.02504
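
For readers unfamiliar with the two operations named in the abstract, the sketch below shows what DWConv and PWConv compute, written as a naive pure-Python reference. It is not the paper's optimized ARM implementation and says nothing about its cache or register-level techniques. The function names `depthwise_conv` and `pointwise_conv`, the channel-first nested-list layout, and the choice of stride 1 with no padding are all assumptions made for illustration.

```python
def depthwise_conv(x, w):
    """Naive depthwise convolution (illustrative only).

    x: input as nested lists, shape [C][H][W]
    w: one K x K filter per channel, shape [C][K][K]
    Assumes stride 1 and no padding; returns shape [C][H-K+1][W-K+1].
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    out_h, out_w = height - k + 1, width - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):  # each channel is filtered independently
        for i in range(out_h):
            for j in range(out_w):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += x[c][i + di][j + dj] * w[c][di][dj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, w):
    """Naive pointwise (1x1) convolution (illustrative only).

    x: input as nested lists, shape [C_in][H][W]
    w: channel-mixing weights, shape [C_out][C_in]
    Returns shape [C_out][H][W].
    """
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    out = []
    for w_row in w:  # one output channel per weight row
        plane = [
            [sum(w_row[c] * x[c][i][j] for c in range(c_in)) for j in range(width)]
            for i in range(height)
        ]
        out.append(plane)
    return out


# A depthwise step followed by a pointwise step, the pairing used in MobileNets.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
w_dw = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
w_pw = [[1, 1], [2, -1]]

dw_out = depthwise_conv(x, w_dw)
pw_out = pointwise_conv(dw_out, w_pw)
```

The depthwise step applies one small spatial filter per channel and never mixes channels, while the pointwise step mixes channels at each pixel and never looks at neighbors. Loop order and data layout are deliberately the simplest possible here; optimized implementations such as those discussed in the paper differ precisely in how these loops are tiled and scheduled.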