Deep Learning from Scratch (11): Object Detection, Implementing the YOLOv3 Algorithm (Part 2)
- February 19, 2020
- Notes
01
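Introduction
This is Baidu's official introductory deep learning course. It is aimed mainly at students with little or no deep learning background and is meant to take them from 0 to 1 and beyond in the field. In this series you will learn:
- Building a neural network and the gradient descent algorithm with numpy
- Deep learning fundamentals
- Principles and practice of the main areas of computer vision
- Principles and practice of the main areas of natural language processing
- Principles and practice of personalized recommendation algorithms
In the previous lecture, Mr. Sun, a senior R&D engineer in Baidu's Deep Learning Technology Platform Department, covered how YOLOv3 generates candidate regions and extracts features with a convolutional neural network. This lecture covers building the loss function, multi-level detection, and producing the prediction output.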
02
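Loss function
In the previous lecture we connected the pixels of the output feature map to prediction boxes conceptually. To train the network, we also need to connect the network output to the prediction boxes mathematically, that is, to express the loss function in terms of the network output. This section shows how to build the YOLOv3 loss function.
For each prediction box, the YOLOv3 model builds three kinds of loss:
- A loss for whether the box contains an object, computed from pred_objectness and label_objectness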
loss_obj = fluid.layers.sigmoid_cross_entropy_with_logits(pred_objectness, label_objectness)
- A loss for the object's location, computed from pred_location and label_location
pred_location_x = pred_location[:, :, 0, :, :]
pred_location_y = pred_location[:, :, 1, :, :]
pred_location_w = pred_location[:, :, 2, :, :]
pred_location_h = pred_location[:, :, 3, :, :]
loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_x, label_location_x)
loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(pred_location_y, label_location_y)
loss_location_w = fluid.layers.abs(pred_location_w - label_location_w)
loss_location_h = fluid.layers.abs(pred_location_h - label_location_h)
loss_location = loss_location_x + loss_location_y + loss_location_w + loss_location_h
- A loss for the object's class, computed from pred_classification and label_classification
loss_classification = fluid.layers.sigmoid_cross_entropy_with_logits(pred_classification, label_classification)
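To make the location loss concrete, here is a minimal pure-Python sketch of the element-wise quantities behind it. It assumes the usual numerically stable form of sigmoid cross-entropy on logits, max(x, 0) - x*z + log(1 + exp(-|x|)). The helper names sigmoid_ce_with_logits and l1_loss and all numeric values are illustrative, not part of Paddle or the course code.

```python
import math

def sigmoid_ce_with_logits(x, z):
    """Element-wise sigmoid cross-entropy for a logit x and a label z in [0, 1]."""
    # Stable rewrite of -z*log(sigmoid(x)) - (1 - z)*log(1 - sigmoid(x))
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

def l1_loss(pred, label):
    """Absolute error, as used for the width and height terms."""
    return abs(pred - label)

# Hypothetical raw outputs and labels for a single prediction box
tx, ty, tw, th = 0.2, -0.4, 0.1, 0.3
dx_label, dy_label, tw_label, th_label = 0.6, 0.3, 0.0, 0.5

# x and y use cross-entropy, w and h use absolute error
loss_location = (sigmoid_ce_with_logits(tx, dx_label)
                 + sigmoid_ce_with_logits(ty, dy_label)
                 + l1_loss(tw, tw_label)
                 + l1_loss(th, th_label))
print(loss_location)
```

The objectness and classification losses use the same cross-entropy helper, applied to the objectness logit and to each class logit respectively.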
In the previous sections we saw how to compute these predictions and labels, but one issue was left open: we have not yet marked which anchors should get objectness = -1. To do this, we compute the IoU between every prediction box and every ground-truth box and pick out the prediction boxes whose IoU exceeds the threshold. The code is as follows:
# Pick out the prediction boxes whose IoU with a ground-truth box exceeds the threshold
def get_iou_above_thresh_inds(pred_box, gt_boxes, iou_threshold):
    batchsize = pred_box.shape[0]
    num_rows = pred_box.shape[1]
    num_cols = pred_box.shape[2]
    num_anchors = pred_box.shape[3]
    ret_inds = np.zeros([batchsize, num_rows, num_cols, num_anchors])
    for i in range(batchsize):
        pred_box_i = pred_box[i]
        gt_boxes_i = gt_boxes[i]
        for k in range(len(gt_boxes_i)):
            # Ground-truth boxes are stored as [center_x, center_y, w, h]
            gt = gt_boxes_i[k]
            gtx_min = gt[0] - gt[2] / 2.
            gty_min = gt[1] - gt[3] / 2.
            gtx_max = gt[0] + gt[2] / 2.
            gty_max = gt[1] + gt[3] / 2.
            # Skip empty (padding) ground-truth boxes
            if (gtx_max - gtx_min < 1e-3) or (gty_max - gty_min < 1e-3):
                continue
            x1 = np.maximum(pred_box_i[:, :, :, 0], gtx_min)
            y1 = np.maximum(pred_box_i[:, :, :, 1], gty_min)
            x2 = np.minimum(pred_box_i[:, :, :, 2], gtx_max)
            y2 = np.minimum(pred_box_i[:, :, :, 3], gty_max)
            intersection = np.maximum(x2 - x1, 0.) * np.maximum(y2 - y1, 0.)
            s1 = (gty_max - gty_min) * (gtx_max - gtx_min)
            s2 = (pred_box_i[:, :, :, 2] - pred_box_i[:, :, :, 0]) * (pred_box_i[:, :, :, 3] - pred_box_i[:, :, :, 1])
            union = s2 + s1 - intersection
            iou = intersection / union
            above_inds = np.where(iou > iou_threshold)
            ret_inds[i][above_inds] = 1
    ret_inds = np.transpose(ret_inds, (0,3,1,2))
    return ret_inds.astype('bool')
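The function above tells us which anchors need objectness = -1. The program below then processes label_objectness, setting to -1 those anchors whose IoU exceeds the threshold but which are not positive samples.
def label_objectness_ignore(label_objectness, iou_above_thresh_indices):
    # Note: we cannot simply write label_objectness[iou_above_thresh_indices] = -1,
    # because that could overwrite points whose label_objectness is 1 with -1.
    # Only prediction boxes labeled 0 whose IoU with a ground-truth box exceeds
    # the threshold are set to -1.
    negative_indices = (label_objectness < 0.5)
    ignore_indices = negative_indices * iou_above_thresh_indices
    label_objectness[ignore_indices] = -1
    return label_objectness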
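The masking rule can be checked on a tiny example. The sketch below restates the same rule on flat Python lists instead of numpy arrays; mark_ignored and the sample values are hypothetical and only illustrate which entries change.

```python
def mark_ignored(label_objectness, iou_above_thresh):
    """Return a copy in which negatives with a high IoU are relabeled as -1."""
    out = []
    for label, above in zip(label_objectness, iou_above_thresh):
        if label < 0.5 and above:
            out.append(-1)      # negative that overlaps a ground-truth box: ignore it
        else:
            out.append(label)   # positives and ordinary negatives stay unchanged
    return out

labels = [1, 0, 0, 0]              # one positive, three negatives
above = [True, True, False, True]  # IoU above the threshold?
print(mark_ignored(labels, above)) # prints [1, -1, 0, -1]
```

The first entry stays 1 even though its IoU is above the threshold, which is exactly the case the comment in label_objectness_ignore warns about.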
The code below calls these two functions to set label_objectness to -1 for some of the prediction boxes.
# Read the data
reader = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchors
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels,
    iou_threshold = 0.7,
    anchors = [116, 90, 156, 198, 373, 326],
    num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters=NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)
    # anchors holds the preset anchor sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)
label_objectness.shape
(2, 3, 10, 10)
In this way, samples that were not labeled positive but have a large IoU with a ground-truth box get an objectness label of -1 and contribute to none of the loss terms.
The code for the total loss is as follows:
def get_loss(output, label_objectness, label_location, label_classification, scales, num_anchors=3, num_classes=7):
    # Reshape output from [N, C, H, W] to [N, NUM_ANCHORS, NUM_CLASSES + 5, H, W]
    reshaped_output = fluid.layers.reshape(output, [-1, num_anchors, num_classes + 5, output.shape[2], output.shape[3]])

    # Take the objectness predictions from output
    pred_objectness = reshaped_output[:, :, 4, :, :]
    loss_objectness = fluid.layers.sigmoid_cross_entropy_with_logits(pred_objectness, label_objectness, ignore_index=-1)
    ## Sum over dimensions 1, 2, 3
    #loss_objectness = fluid.layers.reduce_sum(loss_objectness, dim=[1,2,3], keep_dim=False)

    # pos_samples is 1. at positive samples and 0. everywhere else
    pos_objectness = label_objectness > 0
    pos_samples = fluid.layers.cast(pos_objectness, 'float32')
    pos_samples.stop_gradient=True

    # Take all location predictions from output
    tx = reshaped_output[:, :, 0, :, :]
    ty = reshaped_output[:, :, 1, :, :]
    tw = reshaped_output[:, :, 2, :, :]
    th = reshaped_output[:, :, 3, :, :]

    # Take the label for each location coordinate from label_location
    dx_label = label_location[:, :, 0, :, :]
    dy_label = label_location[:, :, 1, :, :]
    tw_label = label_location[:, :, 2, :, :]
    th_label = label_location[:, :, 3, :, :]
    # Build the location losses
    loss_location_x = fluid.layers.sigmoid_cross_entropy_with_logits(tx, dx_label)
    loss_location_y = fluid.layers.sigmoid_cross_entropy_with_logits(ty, dy_label)
    loss_location_w = fluid.layers.abs(tw - tw_label)
    loss_location_h = fluid.layers.abs(th - th_label)

    # Total location loss
    loss_location = loss_location_x + loss_location_y + loss_location_h + loss_location_w

    # Multiply by scales
    loss_location = loss_location * scales
    # Only positive samples contribute to the location loss
    loss_location = loss_location * pos_samples

    # Take all class predictions from output
    pred_classification = reshaped_output[:, :, 5:5+num_classes, :, :]
    # Classification loss
    loss_classification = fluid.layers.sigmoid_cross_entropy_with_logits(pred_classification, label_classification)
    # Sum over dimension 2 (the class dimension)
    loss_classification = fluid.layers.reduce_sum(loss_classification, dim=2, keep_dim=False)

    # Only samples with positive objectness contribute to the classification loss
    loss_classification = loss_classification * pos_samples
    total_loss = loss_objectness + loss_location + loss_classification
    # Sum the loss over all prediction boxes
    total_loss = fluid.layers.reduce_sum(total_loss, dim=[1,2,3], keep_dim=False)
    # Average over all samples
    total_loss = fluid.layers.reduce_mean(total_loss)

    return total_loss
# Compute the loss
# Read the data
reader = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(reader())
# Compute the labels for the anchors
label_objectness, label_location, label_classification, scale_location = get_objectness_label(
    img, gt_boxes, gt_labels,
    iou_threshold = 0.7,
    anchors = [116, 90, 156, 198, 373, 326],
    num_classes=7, downsample=32)

NUM_ANCHORS = 3
NUM_CLASSES = 7
num_filters=NUM_ANCHORS * (NUM_CLASSES + 5)
with fluid.dygraph.guard():
    backbone = DarkNet53_conv_body('yolov3_backbone', is_test=False)
    detection = YoloDetectionBlock('detection', channel=512, is_test=False)
    conv2d_pred = Conv2D('out_pred', num_filters=num_filters, filter_size=1)

    x = to_variable(img)
    C0, C1, C2 = backbone(x)
    route, tip = detection(C0)
    P0 = conv2d_pred(tip)
    # anchors holds the preset anchor sizes
    anchors = [116, 90, 156, 198, 373, 326]
    # downsample is the stride of feature map P0
    pred_boxes = get_yolo_box_xxyy(P0.numpy(), anchors, num_classes=7, downsample=32)
    iou_above_thresh_indices = get_iou_above_thresh_inds(pred_boxes, gt_boxes, iou_threshold=0.7)
    label_objectness = label_objectness_ignore(label_objectness, iou_above_thresh_indices)

    label_objectness = to_variable(label_objectness)
    label_location = to_variable(label_location)
    label_classification = to_variable(label_classification)
    scales = to_variable(scale_location)
    label_objectness.stop_gradient=True
    label_location.stop_gradient=True
    label_classification.stop_gradient=True
    scales.stop_gradient=True

    total_loss = get_loss(P0, label_objectness, label_location, label_classification, scales,
                          num_anchors=NUM_ANCHORS, num_classes=NUM_CLASSES)
    total_loss_data = total_loss.numpy()
total_loss_data
array([623.6282], dtype=float32)
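The program above computes the total loss. At this point you have seen most of the YOLOv3 algorithm: how to generate anchors, label them, extract features with a convolutional neural network, associate the output feature map with prediction boxes, and build the loss function.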
03
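Multi-scale detection
So far we have computed the loss on feature map P0, whose stride is 32. This feature map is small and has few pixels; each pixel has a large receptive field and carries rich high-level semantic information, so it tends to detect large objects more easily. To detect smaller objects, predictions need to be made on larger feature maps. If we produced predictions directly on feature maps at the level of C2 or C1, we might run into a new problem: these maps have not gone through enough feature extraction, their pixels carry less semantic information, and it may be hard to extract useful feature patterns. In object detection, the usual remedy is to upsample the high-level feature map and fuse it with the lower-level feature map. The resulting feature map both contains rich semantic information and has more pixels, so it can describe finer structures.
The concrete network structure is shown in Figure 19: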

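Figure 19: Generating the multi-level output feature maps P0, P1, and P2
YOLOv3 generates 3 anchors at the center of each region. The anchor sizes on the three levels of feature maps are P2 [(10×13), (16×30), (33×23)], P1 [(30×61), (62×45), (59×119)], and P0 [(116×90), (156×198), (373×326)]. Feature maps further back in the network use larger anchors and capture information about large objects; feature maps further forward use smaller anchors and capture information about small objects.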
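The per-level anchor sizes can be read out of the flat anchor list and the anchor masks that this section uses with the Paddle API. The sketch below assumes the list is laid out as [w0, h0, w1, h1, ...]; anchors_for_level is an illustrative helper, not part of Paddle.

```python
ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # masks for P0, P1, P2

def anchors_for_level(anchors, mask):
    """Pick the (width, height) pairs selected by one level's mask."""
    return [(anchors[2 * m], anchors[2 * m + 1]) for m in mask]

for level, mask in enumerate(ANCHOR_MASKS):
    print('P%d' % level, anchors_for_level(ANCHORS, mask))
```

P0, the coarsest feature map, receives the three largest anchors, and P2 receives the three smallest.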
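Multi-scale detection requires substantial changes to the code above, and the implementation is somewhat tedious, so we recommend using the API that Paddle provides, fluid.layers.yolov3_loss, described below:
- fluid.layers.yolov3_loss(x, gt_box, gt_label, anchors, anchor_mask, class_num, ignore_thresh, downsample_ratio, gt_score=None, use_label_smooth=True, name=None)
- x: the network's output feature map at one level (P0, P1, or P2)
- gt_box: ground-truth boxes
- gt_label: ground-truth box labels
- anchors: sizes of the anchors in use, e.g. [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
- anchor_mask: mask selecting the anchors used at this level; across the three levels the masks are [[6, 7, 8], [3, 4, 5], [0, 1, 2]], and one of them is passed per call
- class_num: number of object classes, 7 for the AI insect recognition dataset
- ignore_thresh: when the IoU between a prediction box and a ground-truth box exceeds ignore_thresh, the box is not treated as a negative sample; set to 0.7 in the YOLOv3 model
- downsample_ratio: downsampling ratio of the feature map; for P0 with the Darknet53 backbone it is 32
- gt_score: confidence of the ground-truth boxes, used with the mixup technique
- use_label_smooth: a training technique; set to False when not used
- name: name of the layer, e.g. 'yolov3_loss'; optional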
The implementation that produces prediction boxes from multi-level feature maps is as follows:
# Define the upsampling module
class Upsample(fluid.dygraph.Layer):
    def __init__(self, name_scope, scale=2):
        super(Upsample,self).__init__(name_scope)
        self.scale = scale

    def forward(self, inputs):
        # get dynamic upsample output shape
        shape_nchw = fluid.layers.shape(inputs)
        shape_hw = fluid.layers.slice(shape_nchw, axes=[0], starts=[2], ends=[4])
        shape_hw.stop_gradient = True
        in_shape = fluid.layers.cast(shape_hw, dtype='int32')
        out_shape = in_shape * self.scale
        out_shape.stop_gradient = True

        # resize by actual_shape
        out = fluid.layers.resize_nearest(
            input=inputs, scale=self.scale, actual_shape=out_shape)
        return out

# Define the YOLOv3 model
class YOLOv3(fluid.dygraph.Layer):
    def __init__(self,name_scope, num_classes=7, is_train=True):
        super(YOLOv3,self).__init__(name_scope)

        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(),
                                         is_test = not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        # Generate the feature maps P0, P1, P2 at three levels
        for i in range(3):
            # Add the module that generates ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(self.full_name(),
                                   channel = 512//(2**i),
                                   is_test = not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)

            # Add the module that generates pi from ti: a Conv2D
            # with 3 * (num_classes + 5) output channels
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolve ri
                route = self.add_sublayer("route2_%d"%i,
                                          ConvBNLayer(self.full_name(),
                                                      ch_out=256//(2**i),
                                                      filter_size=1,
                                                      stride=1,
                                                      padding=0,
                                                      is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Upsample ri so that it has the same size as c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate r_{i-1}, after convolution and upsampling, with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Generate ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Generate pi from ti
            block_out = self.block_outputs[i](tip)
            # Append pi to the list
            outputs.append(block_out)

            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Upsample ri so that its size matches c_{i+1}
                route = self.upsample(route)

        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):
        """
        Compute the loss directly with fluid.layers.yolov3_loss; this is simpler and faster
        """
        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs): # compute the loss for each of the three levels
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                    x=out,  # out is one of P0, P1, P2
                    gt_box=gtbox,  # ground-truth box coordinates
                    gt_label=gtlabel,  # ground-truth box classes
                    gt_score=gtscore,  # ground-truth box scores, needed for mixup; set to 1 otherwise, same shape as gtlabel
                    anchors=anchors,   # anchor sizes [w0, h0, w1, h1, ..., w8, h8] for all 9 anchors
                    anchor_mask=anchor_mask_i, # selects anchors for this level, e.g. [3, 4, 5] picks anchors 3, 4, 5
                    class_num=self.num_classes, # number of classes
                    ignore_thresh=ignore_thresh, # prediction boxes with IoU > ignore_thresh get objectness = -1
                    downsample_ratio=downsample, # downsampling factor relative to the input: 32 for P0, 16 for P1, 8 for P2
                    use_label_smooth=False)      # label smoothing is not used here, so this is fixed to False
            self.losses.append(fluid.layers.reduce_mean(loss))  # reduce_mean averages the loss over the images in the batch
            downsample = downsample // 2 # the next level's downsampling factor is halved
        return sum(self.losses) # sum over the levels
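Starting end-to-end training
The training flow is shown in the figure below. The input image goes through feature extraction to produce output feature maps at three levels, P0 (stride=32), P1 (stride=16), and P2 (stride=8). Each level uses square blocks of a different size to generate its anchors and prediction boxes, and these anchors are then labeled.
- The P0 feature map uses 32×32 blocks and generates three anchors of sizes [116, 90], [156, 198], and [373, 326] at the center of each region.
- The P1 feature map uses 16×16 blocks and generates three anchors of sizes [30, 61], [62, 45], and [59, 119] at the center of each region.
- The P2 feature map uses 8×8 blocks and generates three anchors of sizes [10, 13], [16, 30], and [33, 23] at the center of each region.
The feature maps at the three levels are associated with the labels of their anchors to build the loss function; the total loss is the sum of the losses at the three levels. Minimizing this loss starts the end-to-end training process.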
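As a rough check on the sizes involved, the sketch below counts the grid cells and prediction boxes at each level. It assumes a square input whose side is divisible by 32, 3 anchors per cell, and 7 classes; the 608 input matches the default test size used later in this lecture, and level_summary is an illustrative helper.

```python
def level_summary(input_size, strides=(32, 16, 8), num_anchors=3, num_classes=7):
    """Grid size, box count and output channels for P0, P1, P2."""
    rows = []
    for stride in strides:
        grid = input_size // stride              # cells along each side
        rows.append({
            'stride': stride,
            'grid': grid,
            'boxes': grid * grid * num_anchors,  # one box per anchor per cell
            'channels': num_anchors * (num_classes + 5),
        })
    return rows

summary = level_summary(608)
for row in summary:
    print(row)
print('total boxes:', sum(row['boxes'] for row in summary))  # prints total boxes: 22743
```

For a 320×320 input the P0 grid is 10×10, which agrees with the label_objectness shape (2, 3, 10, 10) printed earlier.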

Figure 20: End-to-end training flow
The training process is implemented as follows:
############# Caution: running this code on a local machine may cause it to freeze #######################
import time
import os
import numpy as np
import paddle
import paddle.fluid as fluid

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]

ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

IGNORE_THRESH = .7
NUM_CLASSES = 7

def get_lr(base_lr = 0.0001, lr_decay = 0.1):
    bd = [10000, 20000]
    lr = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
    learning_rate = fluid.layers.piecewise_decay(boundaries=bd, values=lr)
    return learning_rate

if __name__ == '__main__':

    TRAINDIR = '/home/aistudio/work/insects/train'
    TESTDIR = '/home/aistudio/work/insects/test'
    VALIDDIR = '/home/aistudio/work/insects/val'

    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes = NUM_CLASSES, is_train=True)  # create the model
        learning_rate = get_lr()
        opt = fluid.optimizer.Momentum(
                     learning_rate=learning_rate,
                     momentum=0.9,
                     regularization=fluid.regularizer.L2Decay(0.0005))  # create the optimizer

        train_loader = multithread_loader(TRAINDIR, batch_size= 10, mode='train')  # training data reader
        valid_loader = multithread_loader(VALIDDIR, batch_size= 10, mode='valid')  # validation data reader

        MAX_EPOCH = 200
        for epoch in range(MAX_EPOCH):
            for i, data in enumerate(train_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)  # forward pass, outputs [P0, P1, P2]
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors = ANCHORS,
                                      anchor_masks = ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)        # compute the loss

                loss.backward()     # backward pass to compute gradients
                opt.minimize(loss)  # update the parameters
                model.clear_gradients()
                if i % 1 == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S",time.localtime(time.time()))
                    print('{}[TRAIN]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))

            # save params of model
            if (epoch % 5 == 0) or (epoch == MAX_EPOCH -1):
                fluid.save_dygraph(model.state_dict(), 'yolo_epoch{}'.format(epoch))

            # Evaluate on the validation set at the end of each epoch
            model.eval()
            for i, data in enumerate(valid_loader()):
                img, gt_boxes, gt_labels, img_scale = data
                gt_scores = np.ones(gt_labels.shape).astype('float32')
                gt_scores = to_variable(gt_scores)
                img = to_variable(img)
                gt_boxes = to_variable(gt_boxes)
                gt_labels = to_variable(gt_labels)
                outputs = model(img)
                loss = model.get_loss(outputs, gt_boxes, gt_labels, gtscore=gt_scores,
                                      anchors = ANCHORS,
                                      anchor_masks = ANCHOR_MASKS,
                                      ignore_thresh=IGNORE_THRESH,
                                      use_label_smooth=False)
                if i % 1 == 0:
                    timestring = time.strftime("%Y-%m-%d %H:%M:%S",time.localtime(time.time()))
                    print('{}[VALID]epoch {}, iter {}, output loss: {}'.format(timestring, epoch, i, loss.numpy()))
            model.train()
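The learning-rate schedule built by get_lr is a step function of the global training step. The sketch below reproduces that schedule in plain Python, assuming the usual piecewise-constant rule (the first value before the first boundary, the second value before the second boundary, the last value afterwards); piecewise_lr is an illustrative stand-in, not a Paddle function.

```python
def piecewise_lr(step, boundaries=(10000, 20000), base_lr=0.0001, lr_decay=0.1):
    """Learning rate at a given global step under a piecewise-constant schedule."""
    values = [base_lr, base_lr * lr_decay, base_lr * lr_decay * lr_decay]
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]  # past the last boundary

for step in (0, 9999, 10000, 20000):
    print(step, piecewise_lr(step))
```

With these settings the rate starts at 1e-4, drops to about 1e-5 at step 10000, and to about 1e-6 at step 20000.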
04
Prediction
The prediction flow is shown in Figure 21:

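Figure 21: Prediction flow
Prediction can be divided into two steps:
- Compute the positions of the prediction boxes and their class scores from the network output.
- Use non-maximum suppression to remove prediction boxes that overlap heavily.
For step 1, we have already covered how to compute pred_objectness_probability, pred_boxes, and pred_classification_probability from the network output. Here we recommend using fluid.layers.yolo_box directly. Its usage is:
- fluid.layers.yolo_box(x, img_size, anchors, class_num, conf_thresh, downsample_ratio, name=None)
- x: the network's output feature map, e.g. P0, P1, or P2 above
- img_size: size of the input image
- anchors: sizes of the anchors used at this level, e.g. [116, 90, 156, 198, 373, 326] for P0
- yolo_box takes no anchor_mask argument; the anchors for each level are selected beforehand from the full list [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326] with the masks [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
- class_num: number of object classes
- conf_thresh: confidence threshold; for prediction boxes scoring below it, the box coordinates are not computed and are set to 0.0
- downsample_ratio: downsampling ratio of the feature map, e.g. 32 for P0, 16 for P1, 8 for P2
- name=None: a name, e.g. 'yolo_box'
- The return value has two items, boxes and scores: boxes holds the coordinates of all prediction boxes, and scores holds the scores of all prediction boxes.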
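The score of a prediction box is defined as the probability of its class multiplied by the objectness probability that the box contains an object, i.e.
$$score = P_{obj} \cdot P_{classification}$$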
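The score definition can be sketched in a few lines of plain Python. This assumes that both the objectness and the class probabilities are obtained by applying a sigmoid to the raw network outputs, consistent with the sigmoid cross-entropy losses used in training; sigmoid and box_scores are illustrative helpers, and the logits are made-up values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def box_scores(objectness_logit, class_logits):
    """Per-class score of one prediction box: P_obj * P_classification."""
    p_obj = sigmoid(objectness_logit)
    return [p_obj * sigmoid(c) for c in class_logits]

# Hypothetical raw outputs for one prediction box with 3 classes
print(box_scores(2.0, [3.0, -1.0, 0.0]))
```

A box therefore gets a high score for a class only when it is both likely to contain an object and likely to belong to that class.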
Add a function get_pred to the YOLOv3 class defined above. It calls fluid.layers.yolo_box to obtain the prediction boxes and scores for the three feature maps P0, P1, and P2 and concatenates them, giving all prediction boxes and their scores for each class.
class YOLOv3(fluid.dygraph.Layer):
    def __init__(self,name_scope, num_classes=7, is_train=True):
        super(YOLOv3,self).__init__(name_scope)

        self.is_train = is_train
        self.num_classes = num_classes
        # Backbone that extracts image features
        self.block = DarkNet53_conv_body(self.full_name(),
                                         is_test = not self.is_train)
        self.block_outputs = []
        self.yolo_blocks = []
        self.route_blocks_2 = []
        for i in range(3):
            # Add the module that generates ri and ti from ci
            yolo_block = self.add_sublayer(
                "yolo_detecton_block_%d" % (i),
                YoloDetectionBlock(self.full_name(),
                                   channel = 512//(2**i),
                                   is_test = not self.is_train))
            self.yolo_blocks.append(yolo_block)

            num_filters = 3 * (self.num_classes + 5)

            # Add the module that generates pi from ti: a Conv2D
            # with 3 * (num_classes + 5) output channels
            block_out = self.add_sublayer(
                "block_out_%d" % (i),
                Conv2D(self.full_name(),
                       num_filters=num_filters,
                       filter_size=1,
                       stride=1,
                       padding=0,
                       act=None,
                       param_attr=ParamAttr(
                           initializer=fluid.initializer.Normal(0., 0.02)),
                       bias_attr=ParamAttr(
                           initializer=fluid.initializer.Constant(0.0),
                           regularizer=L2Decay(0.))))
            self.block_outputs.append(block_out)
            if i < 2:
                # Convolve ri
                route = self.add_sublayer("route2_%d"%i,
                                          ConvBNLayer(self.full_name(),
                                                      ch_out=256//(2**i),
                                                      filter_size=1,
                                                      stride=1,
                                                      padding=0,
                                                      is_test=(not self.is_train)))
                self.route_blocks_2.append(route)
            # Upsample ri so that it has the same size as c_{i+1}
            self.upsample = Upsample(self.full_name())

    def forward(self, inputs):
        outputs = []
        blocks = self.block(inputs)
        for i, block in enumerate(blocks):
            if i > 0:
                # Concatenate r_{i-1}, after convolution and upsampling, with this level's ci
                block = fluid.layers.concat(input=[route, block], axis=1)
            # Generate ti and ri from ci
            route, tip = self.yolo_blocks[i](block)
            # Generate pi from ti
            block_out = self.block_outputs[i](tip)
            # Append pi to the list
            outputs.append(block_out)

            if i < 2:
                # Convolve ri to adjust the number of channels
                route = self.route_blocks_2[i](route)
                # Upsample ri so that its size matches c_{i+1}
                route = self.upsample(route)

        return outputs

    def get_loss(self, outputs, gtbox, gtlabel, gtscore=None,
                 anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 ignore_thresh=0.7,
                 use_label_smooth=False):

        self.losses = []
        downsample = 32
        for i, out in enumerate(outputs):
            anchor_mask_i = anchor_masks[i]
            loss = fluid.layers.yolov3_loss(
                    x=out,
                    gt_box=gtbox,
                    gt_label=gtlabel,
                    gt_score=gtscore,
                    anchors=anchors,
                    anchor_mask=anchor_mask_i,
                    class_num=self.num_classes,
                    ignore_thresh=ignore_thresh,
                    downsample_ratio=downsample,
                    use_label_smooth=False)
            self.losses.append(fluid.layers.reduce_mean(loss))
            downsample = downsample // 2
        return sum(self.losses)

    def get_pred(self,
                 outputs,
                 im_shape=None,
                 anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326],
                 anchor_masks = [[6, 7, 8], [3, 4, 5], [0, 1, 2]],
                 valid_thresh = 0.01):
        downsample = 32
        total_boxes = []
        total_scores = []
        for i, out in enumerate(outputs):
            # Select the anchors that belong to this level
            anchor_mask = anchor_masks[i]
            anchors_this_level = []
            for m in anchor_mask:
                anchors_this_level.append(anchors[2 * m])
                anchors_this_level.append(anchors[2 * m + 1])

            boxes, scores = fluid.layers.yolo_box(
                   x=out,
                   img_size=im_shape,
                   anchors=anchors_this_level,
                   class_num=self.num_classes,
                   conf_thresh=valid_thresh,
                   downsample_ratio=downsample,
                   name="yolo_box" + str(i))
            total_boxes.append(boxes)
            total_scores.append(
                        fluid.layers.transpose(
                        scores, perm=[0, 2, 1]))
            downsample = downsample // 2

        # Concatenate the boxes and scores of the three levels
        yolo_boxes = fluid.layers.concat(total_boxes, axis=1)
        yolo_scores = fluid.layers.concat(total_scores, axis=2)
        return yolo_boxes, yolo_scores
Step 1 produces several prediction boxes in every block region, and many of the output boxes overlap heavily, so the redundant, heavily overlapping boxes need to be removed.
The prediction boxes in the example code below were output by a model run on an image. Eleven prediction boxes were selected here and are drawn on the image as shown below. Several prediction boxes appear around each person, and the redundant ones must be removed to get the final result.
# Plot the bounding boxes of the target objects
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor = 'k', facecolor = 'y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner, the width and the height of the rectangle
    rect=patches.Rectangle((bbox[0], bbox[1]), bbox[2]-bbox[0]+1, bbox[3]-bbox[1]+1, linewidth=1,
                           edgecolor=edgecolor,facecolor=facecolor,fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))

filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)

currentAxis=plt.gca()

# Positions of the prediction boxes
boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
       [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
       [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
       [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
       [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
       [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
       [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
       [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
       [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
       [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
       [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

# Scores of the prediction boxes
scores = np.array([0.5247661 , 0.51759845, 0.86075854, 0.9910175 , 0.39170712,
       0.9297706 , 0.5115228 , 0.270992  , 0.19087596, 0.64201415, 0.879036])

# Draw all prediction boxes
for box in boxes:
    draw_rectangle(currentAxis, box)

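Here we use non-maximum suppression (NMS) to remove the redundant boxes. The basic idea is that if several prediction boxes correspond to the same object, only the one with the highest score is kept and the rest are discarded. How do we decide that two prediction boxes correspond to the same object? If two prediction boxes have the same class and their positions overlap heavily, we can take them to be predicting the same object. Non-maximum suppression picks the highest-scoring prediction box of a class, checks which other prediction boxes have an IoU with it above a threshold, and discards those boxes. The IoU threshold is a hyperparameter that must be set in advance; YOLOv3 uses 0.5 (the test program later in this section sets NMS_THRESH = 0.45).
For example, in the program above, boxes holds 11 prediction boxes and scores gives their scores for the class "person".
- Step0: create the list of kept boxes, keep_list = []
- Step1: sort by score, remain_list = [3, 5, 10, 2, 9, 0, 1, 6, 4, 7, 8]
- Step2: pick boxes[3]. keep_list is empty, so no IoU needs to be computed and it goes straight into keep_list. keep_list = [3], remain_list = [5, 10, 2, 9, 0, 1, 6, 4, 7, 8]
- Step3: pick boxes[5]. keep_list already contains boxes[3]; IoU(boxes[3], boxes[5]) = 0.0, clearly below the threshold, so keep_list = [3, 5], remain_list = [10, 2, 9, 0, 1, 6, 4, 7, 8]
- Step4: pick boxes[10]. keep_list = [3, 5]; IoU(boxes[3], boxes[10]) = 0.0268 and IoU(boxes[5], boxes[10]) = 0.24, both below the threshold, so keep_list = [3, 5, 10], remain_list = [2, 9, 0, 1, 6, 4, 7, 8]
- Step5: pick boxes[2]. keep_list = [3, 5, 10]; IoU(boxes[3], boxes[2]) = 0.88 exceeds the threshold, so boxes[2] is discarded. keep_list = [3, 5, 10], remain_list = [9, 0, 1, 6, 4, 7, 8]
- Step6: pick boxes[9]. keep_list = [3, 5, 10]; IoU(boxes[3], boxes[9]) = 0.0577, IoU(boxes[5], boxes[9]) = 0.205, and IoU(boxes[10], boxes[9]) = 0.88, which exceeds the threshold, so boxes[9] is discarded. keep_list = [3, 5, 10], remain_list = [0, 1, 6, 4, 7, 8]
- Step7: repeat Step6 until remain_list is empty
The final result is keep_list = [3, 5, 10]: prediction boxes 3, 5, and 10 are the ones selected, as shown in the figure below.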
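The walk-through above can be reproduced with a short pure-Python sketch. The box coordinates are the eleven boxes from the example rounded to two decimals, the threshold is the 0.5 quoted in the text, and the IoU helper uses the +1 pixel convention, which is an assumption chosen because it reproduces the 0.0268 quoted in Step4. Here box_iou_xyxy is a stand-in for the helper of the same name used by the nms function later in this section, and simple_nms is an illustrative name.

```python
def box_iou_xyxy(a, b):
    """IoU of two [x1, y1, x2, y2] boxes, using the +1 pixel convention."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1.0
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1.0
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    area_a = (a[2] - a[0] + 1.0) * (a[3] - a[1] + 1.0)
    area_b = (b[2] - b[0] + 1.0) * (b[3] - b[1] + 1.0)
    return inter / (area_a + area_b - inter)

def simple_nms(boxes, scores, iou_thresh=0.5):
    """Greedy single-class NMS; returns kept indices in descending score order."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    keep = []
    for idx in order:
        # Keep a box only if it does not overlap too much with any kept box
        if all(box_iou_xyxy(boxes[idx], boxes[k]) <= iou_thresh for k in keep):
            keep.append(idx)
    return keep

boxes = [[42.17, 128.23, 226.55, 600.43], [318.56, 123.17, 479.00, 605.69],
         [26.27, 139.43, 220.59, 638.96], [42.50, 142.71, 225.96, 635.67],
         [237.46, 135.73, 479.00, 631.45], [319.39, 129.30, 479.00, 633.00],
         [328.93, 122.74, 479.00, 639.00], [44.43, 170.44, 226.84, 639.00],
         [217.99, 302.47, 406.06, 629.11], [200.24, 323.76, 396.93, 636.39],
         [214.31, 323.44, 406.73, 635.78]]
scores = [0.5247661, 0.51759845, 0.86075854, 0.9910175, 0.39170712,
          0.9297706, 0.5115228, 0.270992, 0.19087596, 0.64201415, 0.879036]

keep = simple_nms(boxes, scores)
print(keep)  # prints [3, 5, 10]
```

The sketch compares each candidate with every kept box, whereas the walk-through stops at the first kept box whose IoU exceeds the threshold; the resulting keep list is the same.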
# Plot the bounding boxes of the target objects
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.image import imread
import math

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor = 'k', facecolor = 'y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner, the width and the height of the rectangle
    rect=patches.Rectangle((bbox[0], bbox[1]), bbox[2]-bbox[0]+1, bbox[3]-bbox[1]+1, linewidth=1,
                           edgecolor=edgecolor,facecolor=facecolor,fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

plt.figure(figsize=(10, 10))

filename = '/home/aistudio/work/images/section3/000000086956.jpg'
im = imread(filename)
plt.imshow(im)

currentAxis=plt.gca()

boxes = np.array([[4.21716537e+01, 1.28230896e+02, 2.26547668e+02, 6.00434631e+02],
       [3.18562988e+02, 1.23168472e+02, 4.79000000e+02, 6.05688416e+02],
       [2.62704697e+01, 1.39430557e+02, 2.20587097e+02, 6.38959656e+02],
       [4.24965363e+01, 1.42706665e+02, 2.25955185e+02, 6.35671204e+02],
       [2.37462646e+02, 1.35731537e+02, 4.79000000e+02, 6.31451294e+02],
       [3.19390472e+02, 1.29295090e+02, 4.79000000e+02, 6.33003845e+02],
       [3.28933838e+02, 1.22736115e+02, 4.79000000e+02, 6.39000000e+02],
       [4.44292603e+01, 1.70438187e+02, 2.26841858e+02, 6.39000000e+02],
       [2.17988785e+02, 3.02472412e+02, 4.06062927e+02, 6.29106628e+02],
       [2.00241089e+02, 3.23755096e+02, 3.96929321e+02, 6.36386108e+02],
       [2.14310303e+02, 3.23443665e+02, 4.06732849e+02, 6.35775269e+02]])

scores = np.array([0.5247661 , 0.51759845, 0.86075854, 0.9910175 , 0.39170712,
       0.9297706 , 0.5115228 , 0.270992  , 0.19087596, 0.64201415, 0.879036])

left_ind = np.where((boxes[:, 0]<60) * (boxes[:, 0]>20))
left_boxes = boxes[left_ind]
left_scores = scores[left_ind]

colors = ['r', 'g', 'b', 'k']

# Draw the prediction boxes that are finally kept
inds = [3, 5, 10]
for i in range(3):
    box = boxes[inds[i]]
    draw_rectangle(currentAxis, box, edgecolor=colors[i])

The nms function below implements non-maximum suppression. Because the dataset contains objects of several classes, multi-class non-maximum suppression is needed. It works on the same principle; the difference is that non-maximum suppression is run separately for each class, as implemented in multiclass_nms below.
# Non-maximum suppression
def nms(bboxes, scores, score_thresh, nms_thresh, pre_nms_topk, i=0, c=0):
    """
    nms
    """
    inds = np.argsort(scores)
    inds = inds[::-1]
    keep_inds = []
    while(len(inds) > 0):
        cur_ind = inds[0]
        cur_score = scores[cur_ind]
        # if score of the box is less than score_thresh, just drop it
        if cur_score < score_thresh:
            break

        keep = True
        for ind in keep_inds:
            current_box = bboxes[cur_ind]
            remain_box = bboxes[ind]
            iou = box_iou_xyxy(current_box, remain_box)
            if iou > nms_thresh:
                keep = False
                break
        if keep:
            keep_inds.append(cur_ind)
        inds = inds[1:]

    return np.array(keep_inds)

# Multi-class non-maximum suppression
def multiclass_nms(bboxes, scores, score_thresh=0.01, nms_thresh=0.45, pre_nms_topk=1000, pos_nms_topk=100):
    """
    This is for multiclass_nms
    """
    batch_size = bboxes.shape[0]
    class_num = scores.shape[1]
    rets = []
    for i in range(batch_size):
        bboxes_i = bboxes[i]
        scores_i = scores[i]
        ret = []
        # Run NMS separately for each class
        for c in range(class_num):
            scores_i_c = scores_i[c]
            keep_inds = nms(bboxes_i, scores_i_c, score_thresh, nms_thresh, pre_nms_topk, i=i, c=c)
            if len(keep_inds) < 1:
                continue
            keep_bboxes = bboxes_i[keep_inds]
            keep_scores = scores_i_c[keep_inds]
            # Each row is [class, score, x1, y1, x2, y2]
            keep_results = np.zeros([keep_scores.shape[0], 6])
            keep_results[:, 0] = c
            keep_results[:, 1] = keep_scores[:]
            keep_results[:, 2:6] = keep_bboxes[:, :]
            ret.append(keep_results)
        if len(ret) < 1:
            rets.append(ret)
            continue
        ret_i = np.concatenate(ret, axis=0)
        scores_i = ret_i[:, 1]
        # Keep at most pos_nms_topk boxes per image
        if len(scores_i) > pos_nms_topk:
            inds = np.argsort(scores_i)[::-1]
            inds = inds[:pos_nms_topk]
            ret_i = ret_i[inds]

        rets.append(ret_i)

    return rets
Below is the complete test program. Its output on the test dataset is saved to the file pred_results.json.
import json

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45

NUM_CLASSES = 7
if __name__ == '__main__':
    TRAINDIR = '/home/aistudio/work/insects/train/images'
    TESTDIR = '/home/aistudio/work/insects/test/images'
    VALIDDIR = '/home/aistudio/work/insects/val'
    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        params_file_path = '/home/aistudio/work/yolo_epoch50'
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = test_data_loader(TESTDIR, batch_size= 1, mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                     im_shape=img_scale,
                                     anchors=ANCHORS,
                                     anchor_masks=ANCHOR_MASKS,
                                     valid_thresh = VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            result = multiclass_nms(bboxes_data, scores_data,
                          score_thresh=VALID_THRESH,
                          nms_thresh=NMS_THRESH,
                          pre_nms_topk=NMS_TOPK,
                          pos_nms_topk=NMS_POSK)
            for j in range(len(result)):
                result_j = result[j]
                img_name_j = img_name[j]
                # np.asarray also handles the empty list returned when no box is kept
                total_results.append([img_name_j, np.asarray(result_j).tolist()])
            print('processed {} pictures'.format(len(total_results)))

        print('')
        # Write the results and close the file
        with open('pred_results.json', 'w') as f:
            json.dump(total_results, f)
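The JSON file holds the test results as a list containing the predictions for all images, structured as follows:
[[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]],
 ...
 [img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]]
Each element of the list is the prediction result for one image, and the length of the list equals the number of images. The result for each image has the format:
[img_name, [[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]]
The first element is the image name img_name, and the second is a list of all prediction boxes for that image. The prediction box list is:
[[label, score, x1, y1, x2, y2], ..., [label, score, x1, y1, x2, y2]]
Each element [label, score, x1, y1, x2, y2] of the prediction box list describes one prediction box: label is the class label of the box, score is its score, and x1, y1, x2, y2 give the top-left corner (x1, y1) and the bottom-right corner (x2, y2), in the column order written by multiclass_nms. An image may have many prediction boxes; all of them go into the prediction box list.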
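A file in this layout can be consumed with the standard json module. The sketch below assumes each record is laid out as multiclass_nms writes it (class index, score, then the four box coordinates x1, y1, x2, y2); the sample content, image names, and the helper boxes_above are made up for illustration.

```python
import json

# Hypothetical content in the same layout as pred_results.json
sample = json.dumps([
    ["2599", [[0.0, 0.91, 10.0, 20.0, 110.0, 220.0],
              [3.0, 0.30, 5.0, 5.0, 50.0, 60.0]]],
    ["2600", []],   # an image with no prediction boxes
])

def boxes_above(results, score_thresh):
    """Map each image name to its prediction boxes scoring above score_thresh."""
    out = {}
    for img_name, boxes in results:
        out[img_name] = [b for b in boxes if b[1] > score_thresh]
    return out

filtered = boxes_above(json.loads(sample), 0.5)
for name, kept in filtered.items():
    print(name, kept)
```

To read the real file, replace json.loads(sample) with json.load on an opened pred_results.json.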
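In the baseline version of the AI insect recognition competition, the instructor provides code for computing the mAP metric; the final evaluation metric can be computed from this pred_results.json file.
Model results and visualization
The program above showed how to read the images of the test dataset and save the final results in a JSON file. To show the model's results more directly, the program below reads a single image and draws the prediction boxes produced for it.
- Create a data reader that reads a single image
import os
import cv2
import numpy as np

# Read a single test image
def single_image_data_loader(filename, test_image_size=608, mode='test'):
    """
    Load an image for testing; test data has no ground-truth labels
    """
    batch_size= 1
    def reader():
        batch_data = []
        img_size = test_image_size
        file_path = os.path.join(filename)
        img = cv2.imread(file_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        H = img.shape[0]
        W = img.shape[1]
        img = cv2.resize(img, (img_size, img_size))

        # Normalize with the channel means and standard deviations
        mean = [0.485, 0.456, 0.406]
        std = [0.229, 0.224, 0.225]
        mean = np.array(mean).reshape((1, 1, -1))
        std = np.array(std).reshape((1, 1, -1))
        out_img = (img / 255.0 - mean) / std
        # HWC -> CHW
        out_img = out_img.astype('float32').transpose((2, 0, 1))
        img = out_img
        im_shape = [H, W]

        # Use the filename argument rather than a global variable
        batch_data.append((filename.split('.')[0], img, im_shape))
        if len(batch_data) == batch_size:
            yield make_test_array(batch_data)
            batch_data = []

    return reader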
- Define the plotting function that draws the prediction boxes, as follows.
# Define the plotting functions
INSECT_NAMES = ['Boerner', 'Leconte', 'Linnaeus',
                'acuminatus', 'armandi', 'coleoptera', 'linnaeus']

# Helper that draws a rectangle
def draw_rectangle(currentAxis, bbox, edgecolor = 'k', facecolor = 'y', fill=False, linestyle='-'):
    # currentAxis: the axes, obtained with plt.gca()
    # bbox: bounding box, a list of four values [x1, y1, x2, y2]
    # edgecolor: color of the border line
    # facecolor: fill color
    # fill: whether to fill
    # linestyle: border line style
    # patches.Rectangle takes the top-left corner, the width and the height of the rectangle
    rect=patches.Rectangle((bbox[0], bbox[1]), bbox[2]-bbox[0]+1, bbox[3]-bbox[1]+1, linewidth=1,
                           edgecolor=edgecolor,facecolor=facecolor,fill=fill, linestyle=linestyle)
    currentAxis.add_patch(rect)

# Function that draws the prediction results
def draw_results(result, filename, draw_thresh=0.5):
    plt.figure(figsize=(10, 10))
    im = imread(filename)
    plt.imshow(im)
    currentAxis=plt.gca()
    colors = ['r', 'g', 'b', 'k', 'y', 'c', 'purple']
    for item in result:
        box = item[2:6]
        label = int(item[0])
        name = INSECT_NAMES[label]
        # Only draw boxes whose score exceeds draw_thresh
        if item[1] > draw_thresh:
            draw_rectangle(currentAxis, box, edgecolor = colors[label])
            plt.text(box[0], box[1], name, fontsize=12, color=colors[label])
- Use the single_image_data_loader function defined above to read the specified image, feed it to the network to compute prediction boxes and scores, then remove redundant boxes with multi-class non-maximum suppression and plot the final result.
import json

import paddle
import paddle.fluid as fluid

ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
ANCHOR_MASKS = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
VALID_THRESH = 0.01
NMS_TOPK = 400
NMS_POSK = 100
NMS_THRESH = 0.45

NUM_CLASSES = 7
if __name__ == '__main__':
    image_name = '/home/aistudio/work/insects/test/images/2599.jpeg'
    params_file_path = '/home/aistudio/work/yolo_epoch50'
    with fluid.dygraph.guard():
        model = YOLOv3('yolov3', num_classes=NUM_CLASSES, is_train=False)
        model_state_dict, _ = fluid.load_dygraph(params_file_path)
        model.load_dict(model_state_dict)
        model.eval()

        total_results = []
        test_loader = single_image_data_loader(image_name, mode='test')
        for i, data in enumerate(test_loader()):
            img_name, img_data, img_scale_data = data
            img = to_variable(img_data)
            img_scale = to_variable(img_scale_data)

            outputs = model.forward(img)
            bboxes, scores = model.get_pred(outputs,
                                     im_shape=img_scale,
                                     anchors=ANCHORS,
                                     anchor_masks=ANCHOR_MASKS,
                                     valid_thresh = VALID_THRESH)

            bboxes_data = bboxes.numpy()
            scores_data = scores.numpy()
            results = multiclass_nms(bboxes_data, scores_data,
                          score_thresh=VALID_THRESH,
                          nms_thresh=NMS_THRESH,
                          pre_nms_topk=NMS_TOPK,
                          pos_nms_topk=NMS_POSK)

    result = results[0]
    draw_results(result, image_name, draw_thresh=0.5)

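The program above shows how to use the trained weights to run prediction on an image and visualize the result. In the final output image, each insect is detected and marked with its bounding box and class.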
05
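Summary
Over the past four lectures, Mr. Sun has explained the design ideas of YOLOv3 and its implementation in detail, and completed a concrete AI insect recognition task using a pest and disease dataset as the example. Later lectures will bring richer content to help students master deep learning methods quickly.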