『In Depth』Get Started with MaskRCNN in One Hour: Hands-On with the Keras Open-Source Implementation (Windows & Linux)

October 3, 2019
Notes

0. Introduction

Source code: https://github.com/matterport/Mask_RCNN

Author's homepage: http://www.yansongsong.cn/

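Mask R-CNN, proposed by Kaiming He et al., builds on the earlier Faster R-CNN architecture and handles object instance segmentation in a single framework. It efficiently detects the objects in an image while also generating a high-quality segmentation mask for each instance. The core idea is to extend Faster R-CNN with an extra branch that predicts an object mask in parallel with the existing branch for bounding-box recognition.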
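To make "a mask branch in parallel" concrete: in the Mask R-CNN paper the mask branch outputs one binary mask per class for every RoI. Training minimizes the multi-task loss L = L_cls + L_box + L_mask, where L_mask is the average per-pixel sigmoid binary cross-entropy, computed only on the mask of the RoI's ground-truth class k. The pure-Python sketch below illustrates that loss on toy 2×2 grids. It is not the repository's TensorFlow code. The function names and the data layout (a dict of per-class logit grids) are assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on the ground-truth class's mask.

    mask_logits: dict mapping class id -> m x m grid of raw logits
                 (the mask branch predicts one mask per class for this RoI).
    gt_mask:     m x m grid of 0/1 targets for this RoI.
    gt_class:    ground-truth class id k; only mask k enters the loss, so the
                 masks of different classes do not compete with each other.
    """
    pred = mask_logits[gt_class]
    total, count = 0.0, 0
    for logit_row, target_row in zip(pred, gt_mask):
        for logit, target in zip(logit_row, target_row):
            # Clamp to avoid log(0) for extremely confident predictions.
            p = min(max(sigmoid(logit), 1e-7), 1.0 - 1e-7)
            total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
            count += 1
    return total / count


def total_loss(l_cls, l_box, l_mask):
    """Multi-task loss L = L_cls + L_box + L_mask with equal weights."""
    return l_cls + l_box + l_mask


if __name__ == "__main__":
    # Toy RoI with two object classes and 2x2 masks.
    logits = {
        1: [[0.0, 0.0], [0.0, 0.0]],    # all-zero logits: p = 0.5 everywhere
        2: [[9.0, -9.0], [9.0, -9.0]],  # ignored when gt_class == 1
    }
    target = [[1, 0], [1, 0]]
    print(mask_loss(logits, target, gt_class=1))  # ln 2, about 0.693
```

Because L_mask only looks at mask k, the classification branch alone decides which class an RoI belongs to. The repository's config exposes the weights of these terms as LOSS_WEIGHTS, and all of them are 1.0 in the configuration printed in Section 2.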

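About this open-source code: it is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It is based on a Feature Pyramid Network (FPN) and a ResNet101 backbone.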
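FPN produces one feature map per pyramid level. Anchors are placed on every cell of every level, using one scale per level and several aspect ratios. The sketch below counts them under the COCO settings printed by config.display() in Section 2: a 1024×1024 input, BACKBONE_STRIDES [4, 8, 16, 32, 64], RPN_ANCHOR_SCALES (32, 64, 128, 256, 512), RPN_ANCHOR_RATIOS [0.5, 1, 2], and RPN_ANCHOR_STRIDE 1. It assumes feature-map sizes are ceil(image_size / stride), which is how I read the library's backbone-shape computation. The helper names are my own.

```python
import math


def backbone_shapes(image_size, strides):
    """(height, width) of each FPN feature map, taken as ceil(size / stride)."""
    h, w = image_size
    return [(math.ceil(h / s), math.ceil(w / s)) for s in strides]


def count_anchors(image_size, strides, scales, ratios, anchor_stride=1):
    """Total anchors: len(ratios) anchors per feature-map cell, with one
    anchor scale per pyramid level. anchor_stride > 1 skips cells."""
    assert len(scales) == len(strides), "expected one anchor scale per level"
    total = 0
    for fh, fw in backbone_shapes(image_size, strides):
        cells = math.ceil(fh / anchor_stride) * math.ceil(fw / anchor_stride)
        total += cells * len(ratios)
    return total


if __name__ == "__main__":
    strides = [4, 8, 16, 32, 64]
    print(backbone_shapes((1024, 1024), strides))
    # [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
    print(count_anchors((1024, 1024), strides, (32, 64, 128, 256, 512), [0.5, 1, 2]))
    # 261888
```

The result, 3 × (256² + 128² + 64² + 32² + 16²) = 261888, matches the `anchors shape: (1, 261888, 4)` line that the demo prints when it runs detection.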

The repository includes:

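Source code of Mask R-CNN built on FPN and ResNet101.
Training code for MS COCO
Pre-trained weights for MS COCO
Jupyter notebooks to visualize the detection pipeline at every step
ParallelModel class for multi-GPU training
Evaluation on MS COCO metrics (AP)
Example of training on your own dataset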

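The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (the BibTeX entry is in the repository README). If you work on 3D vision, you might find the recently released Matterport3D dataset useful as well. It was created from 3D-reconstructed spaces captured by Matterport's customers, who agreed to make them publicly available for academic use. More examples are linked from the repository README.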

1. Setting Up the MaskRCNN Environment

First, download the source code from the project repository to your machine: https://github.com/matterport/Mask_RCNN

1.1 Requirements

Python 3.4, TensorFlow 1.3, Keras 2.0.8, and the other common packages listed in requirements.txt.

I tested it with Python 3.6 and it works as well. Python 3.4 or later is recommended.

To install Python, I recommend using Miniconda to install and manage environments.

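For TensorFlow and Keras, I also recommend installing them directly with conda install tensorflow keras. Note that this code was written for TensorFlow 1.x and Keras 2.x (see the requirements above), so make sure conda installs 1.x versions of TensorFlow rather than the latest release.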

1.2 MS COCO Requirements

To train or test on MS COCO, you also need:

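pycocotools (installation instructions below)
The MS COCO dataset
Download the 5K minival and the 35K validation-minus-minival subsets. More details are in the original Faster R-CNN implementation.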

If you use Docker, the code has been verified to work on the Docker container linked in the repository README.

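Why is pycocotools needed? Reading the source shows that training on the COCO dataset uses the pycocotools module. If it is not installed, the code raises an error and will not run.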

1.3 Installation

Clone this repository: https://github.com/matterport/Mask_RCNN
Install the dependencies (cd into the project root first; if pip3 does not work, try pip):
```
pip3 install -r requirements.txt
```
On Linux this method works without problems. It is somewhat slow, though, because there are many packages to install.
On Windows, shapely may fail to install. The fix is:
```
conda install shapely -y
```
Run the setup script from the repository root:
```
python3 setup.py install
```
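If there are no errors, the installation is complete. If there are errors, search online for the error message. If python3 does not work, use python instead. Also note that whichever Python environment you install into is the one you must use to run MaskRCNN later.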
Download the pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.

Here is a direct download link: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
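If you download the weights manually, a failed download can leave an HTML error page or an empty file in place of mask_rcnn_coco.h5, and load_weights will then fail with a confusing error. Keras stores weights in HDF5 files, and HDF5 files written by h5py normally begin with the 8-byte HDF5 signature. The sketch below checks for that signature at offset 0 only. The helper name is my own, and the check does not catch a download that was cut off partway through.

```python
import os

# Standard 8-byte signature at the start of an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is a file that starts with the HDF5 signature.

    Catches HTML error pages and empty files saved under the .h5 name.
    Does NOT detect a truncated download.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


if __name__ == "__main__":
    print(looks_like_hdf5("mask_rcnn_coco.h5"))
```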
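(Optional) To train or test on MS COCO, install pycocotools from one of these repos. (This is the pycocotools requirement from 1.2 MS COCO Requirements.)
- ~~Linux: https://github.com/waleedka/coco~~
- ~~Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)~~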
  
  In my own testing, a simpler installation method works:
  On Linux, just run:
```
pip3 install pycocotools
```
  On Windows, first install Visual C++ 2015 (download link: https://go.microsoft.com/fwlink/?LinkId=691126).
  Then run the following, using the same Python environment in which you installed MaskRCNN:
```
pip3 install git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI
```

Once all of the above is done, the Keras version of MaskRCNN is installed. Let's try it out.
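Before opening the demo, you can confirm that the interpreter you plan to use actually sees everything installed above. The sketch below is a hypothetical helper of mine and not part of the repository. It checks the Python version against the 3.4 minimum from 1.1 and tries to import each module. Packages that do not define __version__ (mrcnn may not) are reported simply as "installed".

```python
import importlib
import sys


def check_environment(modules=("tensorflow", "keras", "mrcnn", "pycocotools"),
                      min_python=(3, 4)):
    """Report whether Python is new enough and which modules import cleanly.

    Returns {"python_ok": bool, "modules": {name: version or status}}.
    Run it with the same interpreter you installed Mask R-CNN into.
    """
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "modules": {},
    }
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except Exception as exc:  # ImportError, or a broken install failing differently
            report["modules"][name] = "missing ({})".format(type(exc).__name__)
        else:
            # Not every package exposes __version__.
            report["modules"][name] = getattr(mod, "__version__", "installed")
    return report
```

For example, call `print(check_environment())` from a Python shell started in the same conda environment. Any entry reported as "missing" points to a step above that needs to be redone in that environment.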

2. Using the Demo

In the Python environment where Mask R-CNN is installed, start Jupyter Notebook from a command prompt or shell:

```
jupyter notebook
```

To change Jupyter Notebook's default directory so the project is easier to open, see this blog post: https://www.cnblogs.com/awakenedy/p/9075712.html

When it starts, a web page should open automatically. If it does not, copy the address from the terminal and open it manually.

Go to the root directory of the downloaded MaskRCNN and open samples/demo.ipynb.

The code is as follows:

Mask R-CNN Demo

A quick intro to using the pre-trained model to detect and segment objects.

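In [1]: Import the required modules, set parameters, and download the model weights. Because the download can be slow, I recommend downloading https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 to the project root manually before running the code below.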

```python
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt

# Root directory of the project
ROOT_DIR = os.path.abspath("../")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco

%matplotlib inline

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
```

Using TensorFlow backend.

Configurations

We’ll be using a model trained on the MS-COCO dataset. The configurations of this model are in the CocoConfig class in coco.py.

For inferencing, modify the configurations a bit to fit the task. To do so, sub-class the CocoConfig class and override the attributes you need to change.

In [2]: Adjust some configuration parameters

```python
class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()
```

```
Configurations:
BACKBONE                       resnet101
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [0.1 0.1 0.2 0.2]
COMPUTE_BACKBONE_SHAPE         None
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
FPN_CLASSIF_FC_LAYERS_SIZE     1024
GPU_COUNT                      1
GRADIENT_CLIP_NORM             5.0
IMAGES_PER_GPU                 1
IMAGE_CHANNEL_COUNT            3
IMAGE_MAX_DIM                  1024
IMAGE_META_SIZE                93
IMAGE_MIN_DIM                  800
IMAGE_MIN_SCALE                0
IMAGE_RESIZE_MODE              square
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.001
LOSS_WEIGHTS                   {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0}
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [123.7 116.8 103.9]
MINI_MASK_SHAPE                (56, 56)
NAME                           coco
NUM_CLASSES                    81
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
PRE_NMS_LIMIT                  6000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256, 512)
RPN_ANCHOR_STRIDE              1
RPN_BBOX_STD_DEV               [0.1 0.1 0.2 0.2]
RPN_NMS_THRESHOLD              0.7
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                1000
TOP_DOWN_PYRAMID_SIZE          256
TRAIN_BN                       False
TRAIN_ROIS_PER_IMAGE           200
USE_MINI_MASK                  True
USE_RPN_ROIS                   True
VALIDATION_STEPS               50
WEIGHT_DECAY                   0.0001
```
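Several of the values above are not set by hand. BATCH_SIZE, IMAGE_SHAPE and IMAGE_META_SIZE are computed from other attributes when the config object is created, in Config.__init__ in mrcnn/config.py as far as I can tell from the source. The plain-Python sketch below re-derives them so the numbers are easier to follow. It is a reading aid that mirrors my understanding of that code, not an import of it, and the function name is hypothetical.

```python
def derived_config(gpu_count, images_per_gpu, image_min_dim, image_max_dim,
                   channel_count, num_classes, resize_mode="square"):
    """Re-derive BATCH_SIZE, IMAGE_SHAPE and IMAGE_META_SIZE from base settings."""
    batch_size = gpu_count * images_per_gpu
    if resize_mode == "crop":
        image_shape = [image_min_dim, image_min_dim, channel_count]
    else:
        # "square" (and other modes) pad images to image_max_dim x image_max_dim.
        image_shape = [image_max_dim, image_max_dim, channel_count]
    # image_id (1) + original image shape (3) + resized image shape (3)
    # + window (4) + scale (1) + one active-class flag per class.
    image_meta_size = 1 + 3 + 3 + 4 + 1 + num_classes
    return {
        "BATCH_SIZE": batch_size,
        "IMAGE_SHAPE": image_shape,
        "IMAGE_META_SIZE": image_meta_size,
    }


if __name__ == "__main__":
    # Values from the InferenceConfig shown above.
    print(derived_config(1, 1, 800, 1024, 3, 81))
    # {'BATCH_SIZE': 1, 'IMAGE_SHAPE': [1024, 1024, 3], 'IMAGE_META_SIZE': 93}
```

With NUM_CLASSES = 81 (80 COCO classes plus background), this gives the meta size of 93. That matches the `image_metas shape: (1, 93)` line printed later when detection runs.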

Create Model and Load Trained Weights

In [3]: Build the network model and load the weights

```python
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)
```

```
WARNING:tensorflow:From c:\datasappsrj\miniconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From c:\datasappsrj\miniconda3\envs\tf_gpu\lib\site-packages\mask_rcnn-2.1-py3.6.egg\mrcnn\model.py:772: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
```

Class Names

The model classifies objects and returns class IDs, which are integer values that identify each class. Some datasets assign integer values to their classes and some don’t. For example, in the MS-COCO dataset, the ‘person’ class is 1 and ‘teddy bear’ is 88. The IDs are often sequential, but not always. The COCO dataset, for example, has classes associated with class IDs 70 and 72, but not 71.

To improve consistency, and to support training on data from multiple sources at the same time, our Dataset class assigns its own sequential integer IDs to each class. For example, if you load the COCO dataset using our Dataset class, the ‘person’ class would get class ID = 1 (just like COCO) and the ‘teddy bear’ class is 78 (different from COCO). Keep that in mind when mapping class IDs to class names.
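The renumbering described above can be reproduced in a few lines. The sketch assumes the standard COCO category IDs, which run from 1 to 90 with ten unused values, leaving 80 classes. It assigns contiguous IDs in ascending source-ID order, with 0 reserved for the background class 'BG'. mrcnn's Dataset.prepare() keeps its own bookkeeping for this. The helper below is a simplified stand-in of mine, not the library code.

```python
# Category IDs in 1..90 that COCO leaves unused (80 classes remain).
COCO_MISSING_IDS = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}


def build_id_map(source_ids):
    """Map possibly non-contiguous source class IDs to contiguous IDs.

    ID 0 is reserved for the background class 'BG', so the smallest
    source ID becomes 1, the next becomes 2, and so on.
    """
    return {src: i + 1 for i, src in enumerate(sorted(source_ids))}


coco_ids = [i for i in range(1, 91) if i not in COCO_MISSING_IDS]
id_map = build_id_map(coco_ids)

if __name__ == "__main__":
    print(len(id_map))  # 80
    print(id_map[1])    # 1  ('person' keeps ID 1)
    print(id_map[88])   # 78 ('teddy bear': 88 in COCO, 78 after renumbering)
```

The gap at 71 also shifts the later classes: COCO's 70 ('toilet') becomes 62 and 72 ('tv') becomes 63, which are their positions in the class_names list used below.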

To get the list of class names, you’d load the dataset and then use the class_names property like this.

```python
# Load COCO dataset
# (COCO_DIR is not defined in this notebook: set it to your local COCO dataset path)
dataset = coco.CocoDataset()
dataset.load_coco(COCO_DIR, "train")
dataset.prepare()

# Print class names
print(dataset.class_names)
```

We don’t want to require you to download the COCO dataset just to run this demo, so we’re including the list of class names below. The index of the class name in the list represents its ID (first class is 0, second is 1, third is 2, …etc.)

In [4]: Configure the class names

```python
# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
```

Run Object Detection

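In [5]: Load a photo and run detection. The original notebook picks a random image from the images folder. I commented out those two lines and load a photo I prepared myself instead, in this case a photo of my alma mater.
To use your own photo, put it in the images folder and replace "ahnu.jpg" in image_file with its file name (or set image_file to the full path of the photo).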

```python
# Load a random image from the images folder
#file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

image_file = os.path.join(IMAGE_DIR, "ahnu.jpg")
image = skimage.io.imread(image_file)

# Run detection
results = model.detect([image], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
```

```
Processing 1 images
image                    shape: (768, 1024, 3)        min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  151.10000  float64
image_metas              shape: (1, 93)               min:    0.00000  max: 1024.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
```
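visualize.display_instances draws the results, but you may also want them as plain data. Each dict in results holds 'rois' (an N×4 array of (y1, x1, y2, x2) boxes in pixels), 'class_ids', 'scores', and 'masks' (an H×W×N array). The helper below is a hypothetical addition of mine, not part of mrcnn. It turns the first three into readable rows sorted by confidence and uses class_names from In [4] to map IDs to labels. The test data in the example is made up.

```python
def summarize_detections(rois, class_ids, scores, class_names, min_score=0.0):
    """Turn Mask R-CNN style outputs into readable rows, highest score first.

    rois:      sequence of (y1, x1, y2, x2) boxes in image pixels
    class_ids: sequence of class IDs that index into class_names (0 = 'BG')
    scores:    sequence of confidences in [0, 1]
    """
    rows = []
    for (y1, x1, y2, x2), cid, score in zip(rois, class_ids, scores):
        if score < min_score:
            continue  # drop low-confidence detections
        rows.append({
            "label": class_names[cid],
            "score": round(float(score), 3),
            "box_hw": (y2 - y1, x2 - x1),  # box height and width in pixels
        })
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows
```

With the notebook's variables this could be called as `summarize_detections(r['rois'].tolist(), r['class_ids'].tolist(), r['scores'].tolist(), class_names, min_score=0.9)`. Converting with .tolist() keeps everything as plain Python numbers. Detections below DETECTION_MIN_CONFIDENCE (0.7 here) are already removed by the model, so min_score only matters above that threshold.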

3. Training the Model

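I am still preparing to train a model and have not started yet, so for now here is the official guide. I will update this post once my training is done. If you run into any problems, feel free to leave a comment or send me a message.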

3.1 Training on MS COCO

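We provide pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in a Jupyter notebook (see the provided notebooks for examples), or you can run it directly from the command line: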

```
# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco

# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet

# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last
```

You can also run the COCO evaluation code with:

```
# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last
```

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.

3.2 Training on Your Own Dataset

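Start by reading the blog post about the balloon color splash sample (linked from the repository README). It covers the whole process, from annotating images to training to using the results in a sample application.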

In summary, to train the model on your own dataset you need to extend two classes:

Config: This class contains the default configuration. Subclass it and modify the attributes you need to change.

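Dataset: This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.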

See the examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.
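As a starting point for the Config half, here is a minimal sketch of a subclass for a hypothetical one-class dataset. It is modeled loosely on the config in samples/balloon/balloon.py as I remember it, so check that file for the exact values it uses. The class name, NAME, and all numbers below are illustrative assumptions, and the snippet needs the mrcnn package installed in section 1. Any attribute you do not override keeps the default shown by config.display() in Section 2.

```python
from mrcnn.config import Config


class MyDatasetConfig(Config):
    """Hypothetical training config for a one-class custom dataset."""
    NAME = "my_dataset"             # used when naming the log/checkpoint folder
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2              # BATCH_SIZE = GPU_COUNT * IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1             # background + 1 object class
    STEPS_PER_EPOCH = 100           # training steps per epoch
    DETECTION_MIN_CONFIDENCE = 0.9  # drop detections below 90% confidence


config = MyDatasetConfig()
config.display()
```

The Dataset half then has to load your images and annotations, and provide each image's instance masks and class IDs. The balloon and nucleus samples listed above show complete versions.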

3.3 Differences from the Official Paper

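This implementation follows the Mask R-CNN paper for the most part, but in a few cases we deviated in favor of code simplicity and generalization. These are the differences we are aware of. If you encounter others, please let us know.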

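Image resizing: To support training multiple images per batch, we resize all images to the same size, for example 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper, resizing is done so that the shortest side is 800px and the longest side is capped at 1000px.
Bounding boxes: Some datasets provide bounding boxes and some provide only masks. To support training on multiple datasets, we chose to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We use the smallest box that encloses all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.

To validate this approach, we compared our computed bounding boxes with those provided by the COCO dataset. About 2% of the boxes differed by 1px or more, about 0.05% differed by 5px or more, and only 0.01% differed by 10px or more.
Learning rate: The paper uses a learning rate of 0.02, but we found that to be too high; it often causes the weights to explode, especially with a small batch size. This might be related to differences in how Caffe and TensorFlow compute gradients (sum vs. mean across batches and GPUs). Or perhaps the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but we don't set it too aggressively. We found that smaller learning rates converge faster anyway, so we went with that.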
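The resizing and bounding-box points above can be made concrete with two small helpers. They mirror, to the best of my reading, mrcnn.utils.resize_image in "square" mode and mrcnn.utils.extract_bboxes. Square mode does two things: it scales up (never down) so the short side reaches IMAGE_MIN_DIM, unless that would push the long side past IMAGE_MAX_DIM. It then zero-pads the result to a square. The function names and the pure-Python form are my own. Neither helper actually resamples pixels.

```python
def square_resize_params(h, w, min_dim=800, max_dim=1024):
    """Scale, resized (h, w), padding and image window for square resizing."""
    # Scale up (never down) so the short side reaches min_dim...
    scale = max(1, min_dim / min(h, w))
    # ...but cap it so the long side fits inside max_dim.
    if round(max(h, w) * scale) > max_dim:
        scale = max_dim / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Zero-pad to max_dim x max_dim, centering the image.
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = ((top, max_dim - new_h - top), (left, max_dim - new_w - left))
    window = (top, left, top + new_h, left + new_w)  # (y1, x1, y2, x2) of real pixels
    return scale, (new_h, new_w), padding, window


def mask_to_bbox(mask):
    """Smallest (y1, x1, y2, x2) box covering all nonzero mask pixels.

    y2 and x2 are exclusive. Returns (0, 0, 0, 0) for an empty mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)] if mask else []
    if not rows or not cols:
        return (0, 0, 0, 0)
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


if __name__ == "__main__":
    # The 768x1024 photo from the demo: scale stays 1.0 and 128px of padding
    # is added above and below, giving the 1024x1024 molded image.
    print(square_resize_params(768, 1024))
    print(mask_to_bbox([[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]]))  # (1, 1, 3, 3)
```

For the demo photo, the short side would need a scale of 800/768 ≈ 1.04, but that would make the long side about 1067px, so the scale is capped at 1.0. This is consistent with the `(768, 1024, 3)` to `(1, 1024, 1024, 3)` shapes in the detection log.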

4. Summary

It took me several hours to put this getting-started tutorial together. I hope it helps anyone interested in MaskRCNN.

If you found it useful, feel free to like and bookmark it, and take a look at my earlier posts.
