Getting Started with Bounding Box Detection in TensorFlow [Bounding Box Regression]

October 4, 2019
Notes

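Want to learn object detection algorithms? Every ML learner wants to be able to draw a neat box around the objects in an image. In this article we will learn a fundamental concept in object detection: bounding box regression. Bounding box regression is not complicated, yet even state-of-the-art object detectors such as YOLO use this technique!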

We will implement a bounding box regression model with TensorFlow's Keras API. Let's get started! If you can access Google Colab, you can open the notebook here.

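1. Preparing the Dataset

We will use this image localization dataset from Kaggle.com. It contains 373 images of 3 classes (cucumber, eggplant, and mushroom), each annotated with the bounding box of its target object. Our goal is to load and normalize the images, and to parse the four bounding box coordinates (xmin, ymin, xmax, ymax) of the target object from the XML annotation files.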

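If you want to create your own annotated dataset, that works too! You can use LabelImg, which lets you quickly draw bounding boxes around target objects and save them in PASCAL-VOC format.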
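For reference, a PASCAL-VOC annotation for a single object looks roughly like the XML below. This sketch parses it with the standard library's xml.etree.ElementTree instead of xmltodict so that it runs without extra packages; the file name and coordinate values are made up for illustration, and the helper name `parse_voc` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up PASCAL-VOC annotation with one object (values are illustrative only)
SAMPLE_XML = """
<annotation>
  <filename>cucumber_1.jpg</filename>
  <size><width>227</width><height>227</height><depth>3</depth></size>
  <object>
    <name>cucumber</name>
    <bndbox><xmin>50</xmin><ymin>30</ymin><xmax>180</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""

def parse_voc(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax), (width, height)) for the first object."""
    root = ET.fromstring(xml_text.strip())
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    obj = root.find('object')  # first <object> element only
    box = obj.find('bndbox')
    coords = tuple(int(float(box.find(tag).text)) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
    return obj.find('name').text, coords, (width, height)

label, coords, size = parse_voc(SAMPLE_XML)
print(label, coords, size)  # cucumber (50, 30, 180, 200) (227, 227)
```

The `<size>` element is worth reading alongside `<bndbox>`: it records the original image dimensions, which are needed to normalize the box coordinates correctly.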

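2. Processing the Data

First we need to process the images. With the glob package we can list all files with a .jpg extension and process them one by one: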

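import glob

import numpy as np
from PIL import Image, ImageDraw

input_dim = 228

images = []
# Sort the paths so that images and annotations are processed in the same order
image_paths = sorted(glob.glob('training_images/*.jpg'))
for imagefile in image_paths:
    # Force 3 channels and resize every image to input_dim x input_dim
    image = Image.open(imagefile).convert('RGB').resize((input_dim, input_dim))
    # Scale pixel values to [0, 1]
    image = np.asarray(image) / 255.0
    images.append(image)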
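glob makes no guarantee about the order in which it returns files, so the list of images and the list of annotations can end up out of step. A safer option is to pair each image with the annotation that shares its base name. The sketch below does this in plain Python; the function name `pair_by_stem` and the file names are hypothetical.

```python
from pathlib import Path

def pair_by_stem(image_paths, xml_paths):
    """Match each image to the annotation file with the same base name.

    Returns (image_path, xml_path) tuples in sorted image order; images
    without an annotation are skipped.
    """
    xml_by_stem = {Path(p).stem: p for p in xml_paths}
    pairs = []
    for img in sorted(image_paths):
        stem = Path(img).stem
        if stem in xml_by_stem:
            pairs.append((img, xml_by_stem[stem]))
    return pairs

# The two lists may come back from glob in different orders
images = ['training_images/mushroom_2.jpg', 'training_images/cucumber_1.jpg']
xmls = ['training_images/cucumber_1.xml', 'training_images/mushroom_2.xml']
print(pair_by_stem(images, xmls))
```

Iterating over the returned pairs, rather than over two independent glob results, guarantees that each box and label belongs to the image it is trained with.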

Next we need to process the XML annotations, which are in PASCAL-VOC format. We use the xmltodict package to convert each XML file into a Python dictionary:

import glob

import numpy as np
import xmltodict

bboxes = []
classes_raw = []
# Sorted to match the order of image_paths above
annotations_paths = sorted(glob.glob('training_images/*.xml'))
for xmlfile in annotations_paths:
    with open(xmlfile, 'rb') as f:
        x = xmltodict.parse(f)
    obj = x['annotation']['object']
    if isinstance(obj, list):
        # xmltodict returns a list when a file has several objects; keep the first
        obj = obj[0]
    size = x['annotation']['size']
    width, height = float(size['width']), float(size['height'])
    bndbox = obj['bndbox']
    # Normalize (xmin, ymin, xmax, ymax) to [0, 1] by the original image size,
    # which stays valid after the images are resized to input_dim x input_dim
    bbox = np.array([
        float(bndbox['xmin']) / width,
        float(bndbox['ymin']) / height,
        float(bndbox['xmax']) / width,
        float(bndbox['ymax']) / height,
    ])
    bboxes.append(bbox)
    classes_raw.append(obj['name'])
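Box coordinates in a PASCAL-VOC file are in pixels of the original image. Dividing by input_dim only gives values in [0, 1] when the original image is already input_dim pixels wide and high; dividing by the original width and height works for any size, and because resizing scales both axes proportionally, the normalized box remains valid for the resized image. The sketch below shows the round trip; the helper names and the 200x400 example image are hypothetical.

```python
def normalize_box(box, width, height):
    """Scale pixel coordinates (xmin, ymin, xmax, ymax) to the [0, 1] range."""
    xmin, ymin, xmax, ymax = box
    return (xmin / width, ymin / height, xmax / width, ymax / height)

def to_input_pixels(norm_box, input_dim):
    """Map a normalized box onto an input_dim x input_dim resized image."""
    return tuple(v * input_dim for v in norm_box)

# A 200x400 (width x height) image with an object at (50, 100)-(150, 200)
norm = normalize_box((50, 100, 150, 200), width=200, height=400)
print(norm)                        # (0.25, 0.25, 0.75, 0.5)
print(to_input_pixels(norm, 228))  # (57.0, 57.0, 171.0, 114.0)
```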

Now we prepare the training and test sets:

import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

boxes = np.array(bboxes)
# One-hot encode the class names
encoder = LabelBinarizer()
classes_onehot = encoder.fit_transform(classes_raw)

# Each target row: [xmin, ymin, xmax, ymax, one-hot class...]
Y = np.concatenate([boxes, classes_onehot], axis=1)
X = np.array(images)

# Hold out 10% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1)
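Each row of Y therefore holds 4 box coordinates followed by one column per class. The plain-Python sketch below builds one such row by hand to make the layout concrete; the helper name `build_target` is hypothetical. It sorts the class names because LabelBinarizer orders its `classes_` attribute the same way.

```python
def build_target(norm_box, label, classes):
    """Concatenate 4 box coordinates with a one-hot class vector.

    Classes are sorted, mirroring the column order LabelBinarizer uses.
    """
    classes = sorted(classes)
    one_hot = [1 if c == label else 0 for c in classes]
    return list(norm_box) + one_hot

y = build_target((0.25, 0.25, 0.75, 0.5), 'eggplant',
                 ['mushroom', 'cucumber', 'eggplant'])
print(y)  # [0.25, 0.25, 0.75, 0.5, 0, 1, 0]
```

With exactly two classes LabelBinarizer returns a single column instead of two, so the length of the prediction vector would need to change accordingly.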

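3. Creating the Keras Model

We first define a loss function and a metric for the model. The loss combines mean squared error (MSE) with intersection over union (IoU), and the metric reports the IoU score as a measure of how accurately the boxes are localized.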

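IoU is the ratio of the area where two boxes intersect to the area of their union: IoU = area(A ∩ B) / area(A ∪ B) = area(A ∩ B) / (area(A) + area(B) - area(A ∩ B)). It is 1 for identical boxes and 0 for boxes that do not overlap.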
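As a worked example, the plain-Python sketch below computes IoU for two 2x2 boxes offset by one unit in each direction: they overlap in a 1x1 square, so the union is 4 + 4 - 1 = 7 and the IoU is 1/7. The function name `iou` is illustrative only.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle
    x_a = max(box_a[0], box_b[0])
    y_a = max(box_a[1], box_b[1])
    x_b = min(box_a[2], box_b[2])
    y_b = min(box_a[3], box_b[3])
    # Clamp at zero so that non-overlapping boxes have no intersection
    inter = max(0.0, x_b - x_a) * max(0.0, y_b - y_a)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.14285714285714285 (= 1/7)
```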

The Python implementation is as follows:

import tensorflow as tf
from tensorflow import keras

input_shape = (input_dim, input_dim, 3)
dropout_rate = 0.5
alpha = 0.2

def calculate_iou(target_boxes, pred_boxes):
    # Corners of the intersection rectangle (columns 0-3 are xmin, ymin, xmax, ymax)
    xA = tf.maximum(target_boxes[..., 0], pred_boxes[..., 0])
    yA = tf.maximum(target_boxes[..., 1], pred_boxes[..., 1])
    xB = tf.minimum(target_boxes[..., 2], pred_boxes[..., 2])
    yB = tf.minimum(target_boxes[..., 3], pred_boxes[..., 3])
    interArea = tf.maximum(0.0, xB - xA) * tf.maximum(0.0, yB - yA)
    boxAArea = (target_boxes[..., 2] - target_boxes[..., 0]) * (target_boxes[..., 3] - target_boxes[..., 1])
    boxBArea = (pred_boxes[..., 2] - pred_boxes[..., 0]) * (pred_boxes[..., 3] - pred_boxes[..., 1])
    # A small epsilon avoids division by zero for degenerate boxes
    iou = interArea / (boxAArea + boxBArea - interArea + keras.backend.epsilon())
    return iou

def custom_loss(y_true, y_pred):
    # MSE over the whole vector (box coordinates and one-hot class) plus 1 - IoU
    mse = keras.losses.mean_squared_error(y_true, y_pred)
    iou = calculate_iou(y_true, y_pred)
    return mse + (1 - iou)

def iou_metric(y_true, y_pred):
    return calculate_iou(y_true, y_pred)
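To see what the combined loss rewards, here is a plain-Python sketch of the same formula for a single sample; the function names are hypothetical and it is not the Keras code itself. Note that the MSE term covers all 7 outputs, so the class scores are also trained with squared error, while the 1 - IoU term only looks at the first four.

```python
def box_iou(a, b):
    # Intersection rectangle, clamped at zero for non-overlapping boxes
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def combined_loss(y_true, y_pred):
    """MSE over the whole vector (box + one-hot class) plus (1 - IoU) of the boxes."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse + (1.0 - box_iou(y_true[:4], y_pred[:4]))

y_true = [0.2, 0.2, 0.6, 0.6, 0, 1, 0]
print(combined_loss(y_true, y_true))  # 0.0: perfect box and class
y_far = [0.7, 0.7, 0.9, 0.9, 0, 1, 0]
print(combined_loss(y_true, y_far))   # above 1.0: the boxes do not overlap
```

The IoU term stays at its maximum penalty of 1 until the predicted box starts to overlap the target, which is why the MSE term is kept alongside it to pull non-overlapping boxes closer.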

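Next we create the CNN model. We stack several Conv2D layers, flatten their output, and feed it into the fully connected (Dense) layers that follow. We use LeakyReLU activations throughout and apply Dropout to the fully connected layers to reduce overfitting: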

num_classes = 3
pred_vector_length = 4 + num_classes  # 4 box coordinates + one score per class

model_layers = [
    # Five blocks of two 3x3 convolutions followed by 2x2 max pooling
    keras.layers.Conv2D(16, kernel_size=(3, 3), strides=1, input_shape=input_shape),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(16, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(32, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(64, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(128, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Conv2D(256, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Conv2D(256, kernel_size=(3, 3), strides=1),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),

    keras.layers.Flatten(),

    # Fully connected head; Dropout reduces overfitting
    keras.layers.Dense(1240),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dropout(dropout_rate),
    keras.layers.Dense(640),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dropout(dropout_rate),
    keras.layers.Dense(480),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(120),
    keras.layers.LeakyReLU(alpha=alpha),
    keras.layers.Dense(62),
    keras.layers.LeakyReLU(alpha=alpha),

    # Output: [xmin, ymin, xmax, ymax, class scores...]
    keras.layers.Dense(pred_vector_length),
    keras.layers.LeakyReLU(alpha=alpha),
]

model = keras.Sequential(model_layers)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.0001),
    loss=custom_loss,
    metrics=[iou_metric],
)
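It helps to know how large the flattened feature vector is before it reaches the first Dense layer. Conv2D defaults to 'valid' padding, so each 3x3 convolution with stride 1 trims 2 pixels, and MaxPooling2D with a 2x2 pool (stride defaults to the pool size) halves the size, rounding down. The sketch below traces a 228-pixel input through the five blocks; the function name and its parameters are hypothetical, and `model.summary()` would report the same shapes.

```python
def feature_map_size(input_dim, num_blocks=5, convs_per_block=2, kernel=3, pool=2):
    """Spatial size after the conv/pool stack, assuming 'valid' padding and stride 1."""
    size = input_dim
    for _ in range(num_blocks):
        for _ in range(convs_per_block):
            size = size - kernel + 1  # a 'valid' 3x3 convolution trims 2 pixels
        size = size // pool           # 2x2 max pooling halves the size (floor)
    return size

side = feature_map_size(228)
print(side, side * side * 256)  # 3 2304
```

So Flatten turns a 3x3x256 feature map into 2304 values, which Dense(1240) then compresses.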

4. Training the Model

Now we can start training:

model.fit(
    x_train,
    y_train,
    validation_data=(x_test, y_test),
    epochs=100,
    batch_size=3,
)
model.save('model.h5')
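A quick calculation shows how much work one epoch is. Assuming all 373 images load and pair with their annotations, train_test_split with a float test_size rounds the test share up, leaving 335 training images; with batch_size=3 that is 112 gradient steps per epoch, or 11,200 over 100 epochs. The sketch below only does the arithmetic.

```python
import math

n_images = 373   # dataset size quoted in section 1 (assumes every file loads)
test_size = 0.1
batch_size = 3
epochs = 100

# train_test_split rounds the test share up when test_size is a float
n_test = math.ceil(test_size * n_images)
n_train = n_images - n_test
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_test, steps_per_epoch, steps_per_epoch * epochs)  # 335 38 112 11200
```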

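5. Drawing Bounding Boxes on Images

Now that the model is trained, we can run it on the test images, draw the predicted bounding boxes, and save the resulting images.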

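import os

os.makedirs('inference_images', exist_ok=True)

preds = model.predict(x_test)
for i in range(preds.shape[0]):
    # Scale the normalized box back to pixels of the input_dim x input_dim image
    x0, y0, x1, y1 = preds[i, 0:4] * input_dim
    # Order the corners so that x0 <= x1 and y0 <= y1, as ImageDraw.rectangle expects
    b = [float(v) for v in (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))]
    img = x_test[i] * 255
    source_img = Image.fromarray(img.astype(np.uint8))
    draw = ImageDraw.Draw(source_img)
    draw.rectangle(b, outline='black')
    source_img.save('inference_images/image_{}.png'.format(i + 1), 'png')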
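The network's raw outputs are not constrained to [0, 1], nor is xmin guaranteed to be smaller than xmax, and recent Pillow versions may reject a rectangle whose second corner lies left of or above the first. The sketch below shows one way to clip and reorder a prediction before drawing; the helper name `to_pixel_rect` is hypothetical.

```python
def to_pixel_rect(pred_box, input_dim):
    """Convert a normalized (xmin, ymin, xmax, ymax) prediction to a valid pixel rectangle.

    Coordinates are clipped to [0, 1] before scaling, and each pair is
    reordered so that x0 <= x1 and y0 <= y1.
    """
    x0, y0, x1, y1 = (min(max(v, 0.0), 1.0) * input_dim for v in pred_box)
    return [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)]

print(to_pixel_rect([0.5, -0.1, 0.25, 1.2], 228))  # [57.0, 0.0, 114.0, 228.0]
```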

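Here are some example detection results:

To compute the mean IoU score on the test set, along with the classification accuracy, we use the following code: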

import numpy as np

def calculate_avg_iou(target_boxes, pred_boxes):
    # Per-sample IoU for arrays of boxes in (xmin, ymin, xmax, ymax) order
    xA = np.maximum(target_boxes[..., 0], pred_boxes[..., 0])
    yA = np.maximum(target_boxes[..., 1], pred_boxes[..., 1])
    xB = np.minimum(target_boxes[..., 2], pred_boxes[..., 2])
    yB = np.minimum(target_boxes[..., 3], pred_boxes[..., 3])
    interArea = np.maximum(0.0, xB - xA) * np.maximum(0.0, yB - yA)
    boxAArea = (target_boxes[..., 2] - target_boxes[..., 0]) * (target_boxes[..., 3] - target_boxes[..., 1])
    boxBArea = (pred_boxes[..., 2] - pred_boxes[..., 0]) * (pred_boxes[..., 3] - pred_boxes[..., 1])
    # Small epsilon guards against division by zero for degenerate boxes
    iou = interArea / (boxAArea + boxBArea - interArea + 1e-7)
    return iou

def class_accuracy(target_classes, pred_classes):
    # A prediction is correct when its highest score is at the true class index
    target_classes = np.argmax(target_classes, axis=1)
    pred_classes = np.argmax(pred_classes, axis=1)
    return (target_classes == pred_classes).mean()

# Only the first four columns of y_test are box coordinates
target_boxes = y_test[..., 0:4] * input_dim
pred = model.predict(x_test)
pred_boxes = pred[..., 0:4] * input_dim
pred_classes = pred[..., 4:]

iou_scores = calculate_avg_iou(target_boxes, pred_boxes)
print('Mean IOU score {}'.format(iou_scores.mean()))

print('Class Accuracy is {} %'.format(class_accuracy(y_test[..., 4:], pred_classes) * 100))

Original article: Tensorflow目標檢測之邊框回歸入門 (Getting Started with Bounding Box Regression for TensorFlow Object Detection), 匯智網