[Learning YOLOv3 from Scratch] 5. Building the Network Model

February 21, 2020
Notes

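Preface: The previous posts covered how to read the cfg file, how to build the dataset, the data loading mechanism, and the hyperparameter evolution mechanism. This post explains how YOLOv3 constructs its model from a cfg file. One especially useful part is the bias initialization, which can improve metrics such as mAP, F1, P, and R, and also makes training smoother.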

1. The cfg file

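Modifying the network structure in YOLOv3 is easy: you only need to edit the cfg file. Currently, the cfg file supports the following layer types: convolutional, maxpool, upsample, route, shortcut, and yolo.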

The author also provides several cfg files for building networks, such as yolov3.cfg, yolov3-tiny.cfg, yolov3-spp.cfg, and csresnext50-panet-spp.cfg. (The provided yolov3-spp-pan-scale.cfg is not yet supported at the code level.)

Adding custom modules is also convenient: an attention module or dilated convolution, for example, can be added or modified with little effort.

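To make it easier to understand how the network in a cfg file is built, I recommend Netron, a network structure visualization tool available on GitHub. The figure below shows the visualization of yolov3-tiny: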

2. Building the network model

Starting from train.py, the code that builds the network is:

# Initialize model
model = Darknet(cfg, arc=opt.arc).to(device)

Next, we walk through the Darknet implementation:

class Darknet(nn.Module):
    # YOLOv3 object detection model
    def __init__(self, cfg, img_size=(416, 416), arc='default'):
        super(Darknet, self).__init__()
        self.module_defs = parse_model_cfg(cfg)
        self.module_list, self.routs = create_modules(self.module_defs, img_size, arc)
        self.yolo_layers = get_yolo_layers(self)

        # Darknet Header
        self.version = np.array([0, 2, 5], dtype=np.int32)
        # (int32) version info: major, minor, revision
        self.seen = np.array([0], dtype=np.int64)
        # (int64) number of images seen during training

In the code above, the key pieces are the four member variables module_defs, module_list, routs, and yolo_layers. Let's first explain what each of them means:

2.1 module_defs

module_defs is produced by calling parse_model_cfg. This function parses the cfg file into a list of dictionaries, where each dictionary stores the contents of one module, for example:

[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

The function's code is as follows:

def parse_model_cfg(path):
    # path is, for example, cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path

    with open(path, 'r') as f:
        lines = f.read().split('\n')

    # Drop empty lines and comment lines that start with '#'
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a module
            '''
            For example:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0
                # pre-populate with zeros (may be overwritten later)
        else:
            # store the key and its value in the dictionary
            key, val = line.split("=")
            key = key.rstrip()

            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))  # np anchors
            else:
                mdefs[-1][key] = val.strip()

    # supported parameter names
    supported = ['type', 'batch_normalize', 'filters', 'size',
                 'stride', 'pad', 'activation', 'layers', 'groups',
                 'from', 'mask', 'anchors', 'classes', 'num', 'jitter',
                 'ignore_thresh', 'truth_thresh', 'random',
                 'stride_x', 'stride_y']

    # check whether any module uses a key that is not supported
    f = []
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs

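The returned value can be inspected in debug mode:

The part worth paying attention to is how the anchors are organized:

As you can see, the anchors are grouped into (width, height) pairs, which matches our understanding.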
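The parsing step and the anchor pairing can be reproduced without the repository. The snippet below is a minimal sketch, not the repository's function: parse_cfg_text and SAMPLE_CFG are hypothetical names, it uses plain tuples instead of a NumPy array, and the anchor and mask values are illustrative ones in the style of yolov3-tiny.cfg.

```python
def parse_cfg_text(text):
    """Parse Darknet-style cfg text into a list of dicts, one per section."""
    mdefs = []
    for raw in text.split('\n'):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # a new section such as [convolutional] starts a new dict
            mdefs.append({'type': line[1:-1].strip()})
            continue
        key, val = line.split('=', 1)
        key = key.strip()
        if key == 'anchors':
            # group the flat list of numbers into (width, height) pairs
            nums = [float(v) for v in val.split(',')]
            mdefs[-1][key] = [(nums[j], nums[j + 1]) for j in range(0, len(nums), 2)]
        else:
            mdefs[-1][key] = val.strip()
    return mdefs


# Illustrative cfg fragment (assumed values, not copied from a specific file)
SAMPLE_CFG = """
[net]
channels=3

# a downsampling convolution
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=leaky

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=80
"""

module_defs = parse_cfg_text(SAMPLE_CFG)
yolo = module_defs[-1]
mask = [int(m) for m in yolo['mask'].split(',')]
selected = [yolo['anchors'][m] for m in mask]  # anchors used by this YOLO layer

for mdef in module_defs:
    print(mdef)
print(selected)
```

Note that every value except the anchors stays a string, which is why the model-building code later wraps fields such as filters and stride in int().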

2.2 module_list&routs

This part is the core of this post and the key to understanding how the model is built.

In PyTorch, models are commonly built with either Sequential or ModuleList.

Building with Sequential

model = nn.Sequential()
model.add_module('conv', nn.Conv2d(3, 3, 3))
model.add_module('batchnorm', nn.BatchNorm2d(3))
model.add_module('activation_layer', nn.ReLU())

or

model = nn.Sequential(
    nn.Conv2d(3, 3, 3),
    nn.BatchNorm2d(3),
    nn.ReLU()
)

or

from collections import OrderedDict
model = nn.Sequential(OrderedDict([
    ('conv', nn.Conv2d(3, 3, 3)),
    ('batchnorm', nn.BatchNorm2d(3)),
    ('activation_layer', nn.ReLU())
]))

A module built with Sequential implements forward internally, so you can pass an input to it and call it directly.

Building with ModuleList

model = nn.ModuleList([nn.Linear(3, 4),
                       nn.ReLU(),
                       nn.Linear(4, 2)])

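ModuleList behaves like a Python list and does not implement forward, so you have to write the forward function yourself. When building your own model, a common pattern is to create the submodules with ModuleList and then write forward to implement the forward pass.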

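YOLOv3 flexibly combines the two approaches: it builds a ModuleList from the module_defs parsed above (most entries being an nn.Sequential), and then runs the forward pass through a hand-written forward function.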

The code is as follows:

def create_modules(module_defs, img_size, arc):
    # Build the model from module_defs
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams['channels'])]
    module_list = nn.ModuleList()
    routs = []  # indices of layers whose outputs are used later by route and shortcut layers
    yolo_index = -1

    for i, mdef in enumerate(module_defs):
        modules = nn.Sequential()
        '''
        Build each module according to its 'type' field
        '''
        if mdef['type'] == 'convolutional':
            bn = int(mdef['batch_normalize'])
            filters = int(mdef['filters'])
            size = int(mdef['size'])
            stride = int(mdef['stride']) if 'stride' in mdef else (int(
                mdef['stride_y']), int(mdef['stride_x']))
            pad = (size - 1) // 2 if int(mdef['pad']) else 0
            modules.add_module(
                'Conv2d',
                nn.Conv2d(
                    in_channels=output_filters[-1],
                    out_channels=filters,
                    kernel_size=size,
                    stride=stride,
                    padding=pad,
                    groups=int(mdef['groups']) if 'groups' in mdef else 1,
                    bias=not bn))
            if bn:
                modules.add_module('BatchNorm2d',
                                   nn.BatchNorm2d(filters, momentum=0.1))
            if mdef['activation'] == 'leaky':  # TODO: activation study https://github.com/ultralytics/yolov3/issues/441
                modules.add_module('activation', nn.LeakyReLU(0.1,
                                                              inplace=True))
            elif mdef['activation'] == 'swish':
                modules.add_module('activation', Swish())
            # new activation functions can be added here

        elif mdef['type'] == 'maxpool':
            # max pooling
            size = int(mdef['size'])
            stride = int(mdef['stride'])
            maxpool = nn.MaxPool2d(kernel_size=size,
                                   stride=stride,
                                   padding=int((size - 1) // 2))
            if size == 2 and stride == 1:  # yolov3-tiny
                modules.add_module('ZeroPad2d', nn.ZeroPad2d((0, 1, 0, 1)))
                modules.add_module('MaxPool2d', maxpool)
            else:
                modules = maxpool

        elif mdef['type'] == 'upsample':
            # upsampling via nearest-neighbor interpolation
            modules = nn.Upsample(scale_factor=int(mdef['stride']),
                                  mode='nearest')

        elif mdef['type'] == 'route':
            # nn.Sequential() placeholder for 'route' layer
            layers = [int(x) for x in mdef['layers'].split(',')]
            filters = sum(
                [output_filters[i + 1 if i > 0 else i] for i in layers])
            # extend appends a sequence of items to the list
            routs.extend([l if l > 0 else l + i for l in layers])

        elif mdef['type'] == 'shortcut':
            # nn.Sequential() placeholder for 'shortcut' layer
            filters = output_filters[int(mdef['from'])]
            layer = int(mdef['from'])
            routs.extend([i + layer if layer < 0 else layer])

        elif mdef['type'] == 'yolo':
            yolo_index += 1
            mask = [int(x) for x in mdef['mask'].split(',')]  # anchor mask
            modules = YOLOLayer(
                anchors=mdef['anchors'][mask],  # anchor list
                nc=int(mdef['classes']),  # number of classes
                img_size=img_size,  # (416, 416)
                yolo_index=yolo_index,  # 0, 1 or 2
                arc=arc)  # yolo architecture

            # This is the bias initialization for the convolution layer
            # described in the Focal Loss paper,
            # mainly used to address class imbalance
            # (paper: https://arxiv.org/pdf/1708.02002.pdf section 3.3)
            # See the explanation below
            try:
                if arc == 'defaultpw' or arc == 'Fdefaultpw':
                    # default with positive weights
                    b = [-5.0, -5.0]  # obj, cls
                elif arc == 'default':
                    # default no pw (40 cls, 80 obj)
                    b = [-5.0, -5.0]
                elif arc == 'uBCE':
                    # unified BCE (80 classes)
                    b = [0, -9.0]
                elif arc == 'uCE':
                    # unified CE (1 background + 80 classes)
                    b = [10, -0.1]
                elif arc == 'Fdefault':
                    # Focal default no pw (28 cls, 21 obj, no pw)
                    b = [-2.1, -1.8]
                elif arc == 'uFBCE' or arc == 'uFBCEpw':
                    # unified FocalBCE (5120 obj, 80 classes)
                    b = [0, -6.5]
                elif arc == 'uFCE':
                    # unified FocalCE (64 cls, 1 background + 80 classes)
                    b = [7.7, -1.1]

                bias = module_list[-1][0].bias.view(len(mask), -1)
                # 255 to 3x85
                bias[:, 4] += b[0] - bias[:, 4].mean()  # obj
                bias[:, 5:] += b[1] - bias[:, 5:].mean()  # cls

                # assign the shifted bias back to the model
                module_list[-1][0].bias = torch.nn.Parameter(bias.view(-1))

            except:
                print('WARNING: smart bias initialization failure.')

        else:
            print('Warning: Unrecognized Layer Type: ' + mdef['type'])

        # store the module in module_list
        module_list.append(modules)
        # record the number of output filters of every layer
        output_filters.append(filters)

    return module_list, routs
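The least obvious part of create_modules is the bookkeeping for route and shortcut layers: output_filters tracks the channel count after each layer (with the input channels at index 0), and routs records the absolute indices of layers whose outputs must be kept. The following is a simplified sketch of just that bookkeeping, without any PyTorch modules. The function name track_filters and the toy layer list are hypothetical.

```python
def track_filters(module_defs, in_channels=3):
    """Mimic the channel and route bookkeeping of create_modules."""
    output_filters = [in_channels]  # index 0 holds the input channel count
    routs = []
    for i, mdef in enumerate(module_defs):
        filters = output_filters[-1]  # by default the channel count is unchanged
        if mdef['type'] == 'convolutional':
            filters = int(mdef['filters'])
        elif mdef['type'] == 'route':
            layers = [int(x) for x in mdef['layers'].split(',')]
            # positive indices are shifted by one because of the input entry
            filters = sum(output_filters[l + 1 if l > 0 else l] for l in layers)
            # negative (relative) indices are converted to absolute ones
            routs.extend([l if l > 0 else l + i for l in layers])
        elif mdef['type'] == 'shortcut':
            layer = int(mdef['from'])
            filters = output_filters[layer]
            routs.extend([i + layer if layer < 0 else layer])
        output_filters.append(filters)
    return output_filters[1:], routs


# Hypothetical five-layer network
toy_defs = [
    {'type': 'convolutional', 'filters': '16'},  # layer 0
    {'type': 'convolutional', 'filters': '32'},  # layer 1
    {'type': 'convolutional', 'filters': '32'},  # layer 2
    {'type': 'shortcut', 'from': '-2'},          # layer 3: adds the output of layer 1
    {'type': 'route', 'layers': '-1,1'},         # layer 4: concatenates layers 3 and 1
]

filters_per_layer, routs = track_filters(toy_defs)
print(filters_per_layer)
print(routs)
```

The route layer ends up with 64 channels because concatenation adds the channel counts of its sources, while the shortcut layer keeps 32 because it adds feature maps element-wise.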

The bias initialization explained

The YOLO layer branch contains an initialization trick that comes from the discussion of model initialization in the Focal Loss paper. For details, see Section 3.3 of https://arxiv.org/pdf/1708.02002.pdf.

There is a very insightful point here. It took the author a long discussion with BBuf to understand why this is done.

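As introduced in the first post, the number of filters in the convolution right before a YOLO layer is computed as:

filters = (classes + 5) × 3

Here 5 stands for x, y, w, h, and score, where score indicates whether the grid cell contains an object, and 3 means that each grid cell is assigned 3 anchors for matching. The forward function of YOLOLayer contains the following code, which passes these outputs through the sigmoid activation: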
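As a quick check of the arithmetic, each of the 3 anchors predicts 5 box terms plus one score per class, so the filter count is 3 × (5 + classes). The helper below is a small illustrative sketch; its name yolo_head_filters is hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_scale=3):
    """Filters of the convolution before a YOLO layer: anchors * (x, y, w, h, score + classes)."""
    return anchors_per_scale * (5 + num_classes)


print(yolo_head_filters(80))  # 255, which is viewed as 3 x 85 in create_modules
print(yolo_head_filters(20))  # 75
```

This is why the filters value of the last convolution in the cfg file has to be changed whenever the number of classes changes.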

if 'default' in self.arc:  # separate obj and cls
    torch.sigmoid_(io[..., 4])
elif 'BCE' in self.arc:  # unified BCE (80 classes)
    torch.sigmoid_(io[..., 5:])
    io[..., 4] = 1
elif 'CE' in self.arc:  # unified CE (1 background + 80 classes)
    io[..., 4:] = F.softmax(io[..., 4:], dim=4)
    io[..., 4] = 1

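Note that the sigmoid has a meaningful gradient only over a limited input range, roughly [-5, 5]. Outside that range it saturates, with the output close to 0 or 1.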

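PyTorch's default initialization for convolution layers draws parameters from a distribution centered at 0. With such an initialization, roughly half of the grid cells start out activated, that is, with a sigmoid output above 0.5. The loss is then computed over the coordinate information of all those activated cells, so the initial loss becomes very large.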

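Based on this observation, the author adjusts the bias of the convolution layer right before each YOLOLayer to avoid this situation. In practice, the existing bias is shifted so that its mean becomes about -5. The convolution outputs then stay inactive after the sigmoid, which prevents the model from overfitting in the very first batch of the initial phase. With this change, all neurons output empty detections during the first few batches.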
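The effect of the shift can be checked with a few lines of plain Python. This is a minimal sketch under stated assumptions: the three starting bias values are made up, shift_mean is a hypothetical helper that mirrors the `bias += b - bias.mean()` pattern in create_modules, and prior_bias uses the formula b = -log((1 - π) / π) from the Focal Loss paper, where π is the desired initial foreground probability.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shift_mean(values, target):
    """Add one constant to all values so that their mean becomes target."""
    mean = sum(values) / len(values)
    return [v + (target - mean) for v in values]


def prior_bias(pi):
    """Bias whose sigmoid equals the prior probability pi."""
    return -math.log((1 - pi) / pi)


# Hypothetical near-zero objectness biases for the 3 anchors of one YOLO layer
obj_bias = [0.03, -0.02, 0.01]
shifted = shift_mean(obj_bias, -5.0)

before = [sigmoid(b) for b in obj_bias]  # each close to 0.5: half "active"
after = [sigmoid(b) for b in shifted]    # each below 0.01: almost no detections

print(before)
print(after)
print(prior_bias(0.01))  # roughly -4.6, close to the -5 used for the default arc
```

A bias of -5 corresponds to an initial objectness of about 0.7%, so the network starts by predicting "no object" almost everywhere, which matches the class balance of real images far better than 50%.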

According to the author's experiments, this bias trick improves metrics such as mAP, F1, P, and R, and also makes training smoother.

2.3 yolo_layers

The code is as follows:

def get_yolo_layers(model):
    return [i for i, x in enumerate(model.module_defs) if x['type'] == 'yolo']
    # [82, 94, 106] for yolov3

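The YOLO layers are found by scanning module_defs, the variable that stores the information from the cfg file. Taking yolov3.cfg as an example, the function returns the indices of the YOLO layers within the whole module list: the layers at indices 82, 94, and 106 (counting from 0) are the YOLO layers.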
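The same lookup can be tried on a toy list. Because create_modules removes the leading [net] entry from module_defs with pop(0), the indices found this way line up with positions in module_list. The sketch below uses a hypothetical helper find_layers and a made-up five-layer definition.

```python
def find_layers(module_defs, layer_type):
    """Return the 0-based indices of all layers of the given type."""
    return [i for i, m in enumerate(module_defs) if m['type'] == layer_type]


# Hypothetical definitions, with the [net] section already removed
toy_defs = [
    {'type': 'convolutional'},
    {'type': 'yolo'},
    {'type': 'route'},
    {'type': 'convolutional'},
    {'type': 'yolo'},
]

print(find_layers(toy_defs, 'yolo'))  # [1, 4]
```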

3. The forward function

In YOLO, once you understand the forward pass, the construction of the whole network becomes clear.

    def forward(self, x, var=None):
        img_size = x.shape[-2:]
        layer_outputs = []
        output = []

        for i, (mdef,
                module) in enumerate(zip(self.module_defs, self.module_list)):
            mtype = mdef['type']
            if mtype in ['convolutional', 'upsample', 'maxpool']:
                # convolution, upsample, and pooling layers are simply applied
                x = module(x)
            elif mtype == 'route':
                # route concatenates the outputs of several layers; see the cfg file walkthrough
                layers = [int(x) for x in mdef['layers'].split(',')]
                if len(layers) == 1:
                    x = layer_outputs[layers[0]]
                else:
                    try:
                        x = torch.cat([layer_outputs[i] for i in layers], 1)
                    except:
                        # apply stride 2 for darknet reorg layer
                        layer_outputs[layers[1]] = F.interpolate(
                            layer_outputs[layers[1]], scale_factor=[0.5, 0.5])
                        x = torch.cat([layer_outputs[i] for i in layers], 1)

            elif mtype == 'shortcut':
                x = x + layer_outputs[int(mdef['from'])]
            elif mtype == 'yolo':
                output.append(module(x, img_size))
            # keep the output only for layers that a route or shortcut refers to
            layer_outputs.append(x if i in self.routs else [])

        if self.training:
            # in training mode, return the tensors required by YOLO directly
            # 3*(class+5)
            return output

        elif ONNX_EXPORT:  # this branch handles ONNX export
            x = [torch.cat(x, 0) for x in zip(*output)]
            return x[0], torch.cat(x[1:3], 1)  # scores, boxes: 3780x80, 3780x4
        else:
            # inference (test) mode
            io, p = list(zip(*output))  # inference output, training output
            return torch.cat(io, 1), p

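The forward pass is fairly simple: using module_defs and module_list, a for loop chains the contents of module_list together, and the final result we need is the output of the YOLO layers. (P.S. The next post will walk through the YOLOLayer code.)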
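The caching logic of forward is easier to follow when tensors are replaced by plain lists of numbers. The following is a toy simulation, not the repository's code: run_forward and the layer functions are hypothetical, a "feature map" is just a list whose entries play the role of channels, and routs is written out by hand rather than computed.

```python
def run_forward(module_defs, layers, routs, x):
    """Walk the layer list the way Darknet.forward does, on lists instead of tensors."""
    layer_outputs = []
    outputs = []
    for i, (mdef, fn) in enumerate(zip(module_defs, layers)):
        mtype = mdef['type']
        if mtype == 'convolutional':
            x = fn(x)  # ordinary layers are simply applied
        elif mtype == 'route':
            idxs = [int(v) for v in mdef['layers'].split(',')]
            # concatenate the cached outputs along the "channel" axis
            x = [c for j in idxs for c in layer_outputs[j]]
        elif mtype == 'shortcut':
            # element-wise addition with an earlier cached output
            x = [a + b for a, b in zip(x, layer_outputs[int(mdef['from'])])]
        elif mtype == 'yolo':
            outputs.append(list(x))
        # cache only what a later route or shortcut will need
        layer_outputs.append(x if i in routs else [])
    return outputs


toy_defs = [
    {'type': 'convolutional'},            # layer 0
    {'type': 'convolutional'},            # layer 1
    {'type': 'shortcut', 'from': '-2'},   # layer 2: adds the output of layer 0
    {'type': 'route', 'layers': '-1,1'},  # layer 3: concatenates layers 2 and 1
    {'type': 'yolo'},                     # layer 4
]
toy_layers = [
    lambda v: [2 * c for c in v],  # stand-in for a convolution
    lambda v: [c + 1 for c in v],  # stand-in for another convolution
    None, None, None,              # route, shortcut, and yolo need no function here
]
toy_routs = [0, 2, 1]  # absolute indices of the layers referenced above

result = run_forward(toy_defs, toy_layers, toy_routs, [1, 2])
print(result)
```

Storing an empty list for layers that are never referenced is what keeps memory use down: only the outputs listed in routs are kept alive during the pass.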

References

Model construction: https://blog.csdn.net/happyday_d/article/details/85629119

Focal Loss paper: https://arxiv.org/pdf/1708.02002.pdf