Hands-On Project: Source-Code Walkthrough of Google's Three Semantic Segmentation Algorithms, DeepLab V1/V2/V3

  • December 9, 2019
  • Notes

Preface

Both algorithmic and engineering skills are indispensable for an algorithm engineer. I previously introduced DeepLab V1, V2, and V3, but something always felt missing: with only the papers and no source code, it amounts to armchair theorizing. So today I will try to analyze these three algorithms carefully alongside an implementation; once we have worked through them, I will take the chance to cover DeepLab V3+ as well. Since I have recently been reading the PyTorch edition of "Dive into Deep Learning", I will use PyTorch source code for the analysis. All the code analyzed here comes from this PyTorch project: https://github.com/kazuto1011/deeplab-pytorch/tree/master/libs/models

DeepLab V1 Source Code Analysis

For the principles behind DeepLab V1, see my earlier post: https://mp.weixin.qq.com/s/rvP8-Y-CRuq4HFzR0qJWcg . The DeepLab models we dissect today are built from ResNet residual blocks combined with atrous (dilated) convolution. In DeepLab V1, the first layer is a plain convolution with stride = 2, immediately followed by a stride = 2 max-pooling; then comes an ordinary bottleneck stage, a stride = 2 bottleneck stage, and finally bottleneck stages with dilation = 2 and dilation = 4.
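Before reading the full model, here is a minimal standalone sketch (my own illustration, not code from the repo) of why atrous convolution lets the later stages keep stride 1: as long as the padding equals the dilation rate, a 3x3 atrous convolution enlarges the receptive field without shrinking the feature map.

```python
import torch
import torch.nn as nn

# With padding == dilation, a 3x3 atrous convolution preserves the spatial size,
# which is why layer4/layer5 below can trade stride for dilation rates 2 and 4.
x = torch.randn(1, 64, 65, 65)
for d in (1, 2, 4):
    conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=d, dilation=d)
    print(d, conv(x).shape)  # the spatial size stays 65x65 for every rate
```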

```python
from __future__ import absolute_import, print_function

import torch
import torch.nn as nn
import torch.nn.functional as F


# The DeepLabV1 network structure
class DeepLabV1(nn.Sequential):
    """
    DeepLab v1: Dilated ResNet + 1x1 Conv
    Note that this is just a container for loading the pretrained COCO model and not mentioned as "v1" in papers.
    """

    def __init__(self, n_classes, n_blocks):
        super(DeepLabV1, self).__init__()
        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], 1, 1))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], 2, 1))
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], 1, 2))
        self.add_module("layer5", _ResLayer(n_blocks[3], ch[4], ch[5], 1, 4))
        self.add_module("fc", nn.Conv2d(2048, n_classes, 1))


# Decide whether to use the SyncBatchNorm defined in the encoding package
# or the BatchNorm from torch's nn module
try:
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


# Conv + BN + ReLU building block
class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


# Bottleneck: a 1x1 conv reduces the channels, a 3x3 conv follows, and a final 1x1 conv
# restores the channels before the shortcut connection. How far the channels are reduced
# is controlled by _BOTTLENECK_EXPANSION; this is the ResNet Bottleneck.
class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else lambda x: x  # identity
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


# The whole DeepLabV1 is stacked from _ResLayers; downsampling happens in the first
# Bottleneck of each _ResLayer.
class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


# Before the ResLayers, a 7x7 convolution with stride 2 slides over the input image,
# enlarging the receptive field and halving the resolution. The pooling layer
# (kernel 3, stride 2) halves the feature map resolution again.
class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


# Equivalent to a reshape; not actually used by the network
class _Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)


# Entry point: print the structure of the constructed DeepLab V1 model along with
# the input image resolution and the output resolution
if __name__ == "__main__":
    model = DeepLabV1(n_classes=21, n_blocks=[3, 4, 23, 3])
    # model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)
```

Let's check the input and output feature map sizes of the network:

```
input: torch.Size([1, 3, 513, 513])
output: torch.Size([1, 21, 65, 65])
```

The network structure should now be quite clear; you can run the Python code to print it out, or follow the comments in the source above. Note that during training, the ground truth must be resized to the same size as the model's output feature map.
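As a hedged sketch of that label preparation (my own code, not taken from the repo): resize the ground-truth mask with nearest-neighbor interpolation, which keeps class indices intact instead of blending them, and compute the loss against the 65x65 logits.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 21, 65, 65)           # stands in for the model output above
target = torch.randint(0, 21, (1, 513, 513))  # ground-truth class indices

# Nearest-neighbor keeps labels as valid integers; bilinear would blend class ids
small = F.interpolate(target[None].float(), size=(65, 65), mode="nearest")
small = small[0].long()                       # back to an integer map of shape (1, 65, 65)

loss = F.cross_entropy(logits, small, ignore_index=255)  # 255 marks "ignore" in VOC masks
print(small.shape, loss.item())
```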

DeepLab V2 Source Code Analysis

For a walkthrough of the DeepLab V2 paper, see my earlier article: https://mp.weixin.qq.com/s/ylv3QfOe_BOuVuxQTd_m_g . Put simply, DeepLab V2 is DeepLab V1 with an added ASPP module, a structure similar to an Inception block that contains atrous convolutions with different dilation rates and strengthens the model's ability to recognize the same object across scales. Here again we focus only on the source code; for easier reading, the ASPP schematic from that earlier article is reproduced here:

[Figure: schematic of the ASPP module]

```python
from __future__ import absolute_import, print_function

import torch
import torch.nn as nn
import torch.nn.functional as F


# The ASPP module: the main difference between DeepLab V2 and V1.
# Every other part of this file is identical to the V1 code.
class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling (ASPP)
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        for i, rate in enumerate(rates):
            self.add_module(
                "c{}".format(i),
                nn.Conv2d(in_ch, out_ch, 3, 1, padding=rate, dilation=rate, bias=True),
            )

        for m in self.children():
            nn.init.normal_(m.weight, mean=0, std=0.01)
            nn.init.constant_(m.bias, 0)

    def forward(self, x):
        # The parallel branches are fused by element-wise summation
        return sum([stage(x) for stage in self.children()])


class DeepLabV2(nn.Sequential):
    """
    DeepLab v2: Dilated ResNet + ASPP
    Output stride is fixed at 8
    """

    def __init__(self, n_classes, n_blocks, atrous_rates):
        super(DeepLabV2, self).__init__()
        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], 1, 1))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], 2, 1))
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], 1, 2))
        self.add_module("layer5", _ResLayer(n_blocks[3], ch[4], ch[5], 1, 4))
        self.add_module("aspp", _ASPP(ch[5], n_classes, atrous_rates))

    def freeze_bn(self):
        for m in self.modules():
            if isinstance(m, _ConvBnReLU.BATCH_NORM):
                m.eval()


try:
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else lambda x: x  # identity
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


if __name__ == "__main__":
    model = DeepLabV2(
        n_classes=21, n_blocks=[3, 4, 23, 3], atrous_rates=[6, 12, 18, 24]
    )
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)
```

As you can see, apart from the ASPP module the DeepLab V2 code is identical to V1, so there is little extra to explain. One point worth noting, though: during training, DeepLab V2 adopts the "poly" learning rate policy, $lr_{iter} = lr_{base} \times (1 - \frac{iter}{iter_{max}})^{power}$. With $power = 0.9$, the model scores 1.17% higher than with the conventional step schedule. The author implemented this policy in his code as well, as shown below:

The implementation below is credited to Uno Whoiam (source: https://zhuanlan.zhihu.com/p/68531147 , Zhihu).

```python
from torch.optim.lr_scheduler import _LRScheduler


class PolynomialLR(_LRScheduler):
    def __init__(self, optimizer, step_size, iter_max, power, last_epoch=-1):
        self.step_size = step_size
        self.iter_max = iter_max
        self.power = power
        super(PolynomialLR, self).__init__(optimizer, last_epoch)

    def polynomial_decay(self, lr):
        return lr * (1 - float(self.last_epoch) / self.iter_max) ** self.power

    def get_lr(self):
        if (
            (self.last_epoch == 0)
            or (self.last_epoch % self.step_size != 0)
            or (self.last_epoch > self.iter_max)
        ):
            return [group["lr"] for group in self.optimizer.param_groups]
        return [self.polynomial_decay(lr) for lr in self.base_lrs]
```

As you can see, this class directly subclasses _LRScheduler, PyTorch's learning rate scheduler base class, which makes it easy to adjust the learning rate as training proceeds.
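A hedged usage sketch (the hyperparameter values are my own placeholders, not the author's training configuration): attach PolynomialLR to an SGD optimizer and step it once per training iteration.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 21, 1)  # a stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-4, momentum=0.9)
scheduler = PolynomialLR(optimizer, step_size=10, iter_max=20000, power=0.9)

for it in range(100):
    # ... forward pass, loss.backward(), and optimizer.step() would go here ...
    scheduler.step()  # every `step_size` calls, the lr moves down the poly curve

print(optimizer.param_groups[0]["lr"])
```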

The input and output resolutions of the network are the same as DeepLab V1's. For the training procedure and dataset preparation, see the author's GitHub project: https://github.com/kazuto1011/deeplab-pytorch/tree/master/libs/models .

DeepLab V3 Source Code Analysis

For the principles of the DeepLab V3 paper, see my earlier post: https://mp.weixin.qq.com/s/D9OX89mklaU4tv74OZMqNg . Let's briefly review the key tricks used by DeepLab V3:

  • Batch normalization layers are added inside the ASPP module.
  • The Multi-Grid strategy is used: atrous convolutions with different rates are applied within the last few blocks of the backbone (the multi_grids argument of layer5 in the code below).
  • ASPP with different atrous rates captures multi-scale information effectively. However, the paper observes that as the sampling rate grows, the number of effective filter weights (weights applied to the valid feature region rather than to zero padding) shrinks; in the extreme case where the atrous rate equals the feature map size, the 3×3 convolution degenerates into a 1×1 convolution (the sketch right after this list verifies this numerically). To address this and to incorporate global context into the model, image-level features are used: global average pooling is applied to the model's feature map, the resulting image-level feature is fed into a 1×1 convolution with 256 filters (plus batch normalization), and the feature is then bilinearly upsampled to the required spatial dimensions.
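A small check of that degeneration (my own sketch, not from the paper or the repo): on a 9x9 feature map, a 3x3 convolution with rate 9 only ever sees zero padding at its eight outer taps, so its output matches a 1x1 convolution built from its center weight.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 9, 9)

# A 3x3 atrous conv whose rate equals the feature map size (padding keeps the output at 9x9)
conv3x3 = nn.Conv2d(1, 1, 3, padding=9, dilation=9, bias=False)

# A 1x1 conv that reuses only the center weight of the 3x3 kernel
conv1x1 = nn.Conv2d(1, 1, 1, bias=False)
with torch.no_grad():
    conv1x1.weight.copy_(conv3x3.weight[:, :, 1:2, 1:2])

print(torch.allclose(conv3x3(x), conv1x1(x), atol=1e-6))  # True: only the center tap matters
```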

The DeepLab V3 source code is as follows:

```python
from __future__ import absolute_import, print_function

from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F


# Image pooling branch: global average pooling, followed by a 1x1 conv with 256 channels,
# and finally bilinear upsampling back to the spatial size the feature map had before
# entering this module
class _ImagePool(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1)

    def forward(self, x):
        _, _, H, W = x.shape
        h = self.pool(x)
        h = self.conv(h)
        h = F.interpolate(h, size=(H, W), mode="bilinear", align_corners=False)
        return h


# The ASPP module as improved in DeepLabV3: a 1x1 conv branch and global image pooling
# are added alongside the atrous branches
class _ASPP(nn.Module):
    """
    Atrous spatial pyramid pooling with image-level feature
    """

    def __init__(self, in_ch, out_ch, rates):
        super(_ASPP, self).__init__()
        self.stages = nn.Module()
        self.stages.add_module("c0", _ConvBnReLU(in_ch, out_ch, 1, 1, 0, 1))
        for i, rate in enumerate(rates):
            self.stages.add_module(
                "c{}".format(i + 1),
                _ConvBnReLU(in_ch, out_ch, 3, 1, padding=rate, dilation=rate),
            )
        self.stages.add_module("imagepool", _ImagePool(in_ch, out_ch))

    def forward(self, x):
        # Unlike V2, the branches are fused by channel-wise concatenation
        return torch.cat([stage(x) for stage in self.stages.children()], dim=1)


# The complete DeepLabV3: dilated ResNet + multi-grid strategy + improved ASPP
class DeepLabV3(nn.Sequential):
    """
    DeepLab v3: Dilated ResNet with multi-grid + improved ASPP
    """

    def __init__(self, n_classes, n_blocks, atrous_rates, multi_grids, output_stride):
        super(DeepLabV3, self).__init__()

        # Stride and dilation
        if output_stride == 8:
            s = [1, 2, 1, 1]
            d = [1, 1, 2, 4]
        elif output_stride == 16:
            s = [1, 2, 2, 1]
            d = [1, 1, 1, 2]

        ch = [64 * 2 ** p for p in range(6)]
        self.add_module("layer1", _Stem(ch[0]))
        self.add_module("layer2", _ResLayer(n_blocks[0], ch[0], ch[2], s[0], d[0]))
        self.add_module("layer3", _ResLayer(n_blocks[1], ch[2], ch[3], s[1], d[1]))
        self.add_module("layer4", _ResLayer(n_blocks[2], ch[3], ch[4], s[2], d[2]))
        self.add_module(
            "layer5", _ResLayer(n_blocks[3], ch[4], ch[5], s[3], d[3], multi_grids)
        )
        self.add_module("aspp", _ASPP(ch[5], 256, atrous_rates))
        # Concatenate the features from all branches, feed them into a 1x1 conv with
        # 256 channels plus BN, then into a final 1x1 conv to obtain the logits
        concat_ch = 256 * (len(atrous_rates) + 2)
        self.add_module("fc1", _ConvBnReLU(concat_ch, 256, 1, 1, 0, 1))
        self.add_module("fc2", nn.Conv2d(256, n_classes, kernel_size=1))


try:
    from encoding.nn import SyncBatchNorm

    _BATCH_NORM = SyncBatchNorm
except:
    _BATCH_NORM = nn.BatchNorm2d

_BOTTLENECK_EXPANSION = 4


# Same definition as in DeepLabV1
class _ConvBnReLU(nn.Sequential):
    """
    Cascade of 2D convolution, batch norm, and ReLU.
    """

    BATCH_NORM = _BATCH_NORM

    def __init__(
        self, in_ch, out_ch, kernel_size, stride, padding, dilation, relu=True
    ):
        super(_ConvBnReLU, self).__init__()
        self.add_module(
            "conv",
            nn.Conv2d(
                in_ch, out_ch, kernel_size, stride, padding, dilation, bias=False
            ),
        )
        self.add_module("bn", _BATCH_NORM(out_ch, eps=1e-5, momentum=0.999))

        if relu:
            self.add_module("relu", nn.ReLU())


class _Bottleneck(nn.Module):
    """
    Bottleneck block of MSRA ResNet.
    """

    def __init__(self, in_ch, out_ch, stride, dilation, downsample):
        super(_Bottleneck, self).__init__()
        mid_ch = out_ch // _BOTTLENECK_EXPANSION
        self.reduce = _ConvBnReLU(in_ch, mid_ch, 1, stride, 0, 1, True)
        self.conv3x3 = _ConvBnReLU(mid_ch, mid_ch, 3, 1, dilation, dilation, True)
        self.increase = _ConvBnReLU(mid_ch, out_ch, 1, 1, 0, 1, False)
        self.shortcut = (
            _ConvBnReLU(in_ch, out_ch, 1, stride, 0, 1, False)
            if downsample
            else lambda x: x  # identity
        )

    def forward(self, x):
        h = self.reduce(x)
        h = self.conv3x3(h)
        h = self.increase(h)
        h += self.shortcut(x)
        return F.relu(h)


class _ResLayer(nn.Sequential):
    """
    Residual layer with multi grids
    """

    def __init__(self, n_layers, in_ch, out_ch, stride, dilation, multi_grids=None):
        super(_ResLayer, self).__init__()

        if multi_grids is None:
            multi_grids = [1 for _ in range(n_layers)]
        else:
            assert n_layers == len(multi_grids)

        # Downsampling is only in the first block
        for i in range(n_layers):
            self.add_module(
                "block{}".format(i + 1),
                _Bottleneck(
                    in_ch=(in_ch if i == 0 else out_ch),
                    out_ch=out_ch,
                    stride=(stride if i == 0 else 1),
                    dilation=dilation * multi_grids[i],
                    downsample=(True if i == 0 else False),
                ),
            )


class _Stem(nn.Sequential):
    """
    The 1st conv layer.
    Note that the max pooling is different from both MSRA and FAIR ResNet.
    """

    def __init__(self, out_ch):
        super(_Stem, self).__init__()
        self.add_module("conv1", _ConvBnReLU(3, out_ch, 7, 2, 3, 1))
        self.add_module("pool", nn.MaxPool2d(3, 2, 1, ceil_mode=True))


if __name__ == "__main__":
    model = DeepLabV3(
        n_classes=21,
        n_blocks=[3, 4, 23, 3],
        atrous_rates=[6, 12, 18],
        multi_grids=[1, 2, 4],
        output_stride=8,
    )
    model.eval()
    image = torch.randn(1, 3, 513, 513)

    print(model)
    print("input:", image.shape)
    print("output:", model(image).shape)
```

The differences from V1 and V2 are annotated in detail in the source above. The final output of DeepLab V3 has the same form as that of V1/V2, and the training labels are prepared in the same way.
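For completeness, an inference-time sketch (my own, following the common DeepLab evaluation convention rather than code from this repo): bilinearly upsample the logits back to the input resolution and take the per-pixel argmax.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 21, 65, 65)  # stands in for the output of any of the three models
full = F.interpolate(logits, size=(513, 513), mode="bilinear", align_corners=False)
pred = full.argmax(dim=1)            # (1, 513, 513) predicted class map
print(pred.shape)
```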

Conclusion

Through this source-code walkthrough, you should now have a clear picture of the principles, feature map size changes, and training setup of DeepLab V1, V2, and V3, so I will stop here for now. When I find time I will follow up with a paper walkthrough and source-code analysis of DeepLab V3 Plus, which will wrap up the semantic segmentation series for the moment. After that I plan to cover object detection and classification networks, so stay tuned.

Code Link

https://github.com/kazuto1011/deeplab-pytorch/tree/master/libs/models