Hands-On Object Detection with Deep Learning (Part 4): Using the Model

  • October 3, 2019
  • Notes

The previous article, "Hands-On Object Detection with Deep Learning (Part 3): Model Training", covered how to train our own object-detection model with yolov3. This article focuses on how to use that trained model to detect objects in images and videos.

  If you read the previous article, you know we are using the AlexeyAB/darknet project. It does ship with object-detection programs, written in both C++ and Python, but they have a few problems:

  • Neither supports displaying Chinese text.
  • Neither shows the confidence score.
  • The detection-box styling is not very reader-friendly.
  • The Python detection code keeps throwing type-related errors, presumably caused by the underlying C++ code.

  The garbled Chinese text, in particular, is an opencv issue. Plenty of articles online cover workarounds, but they are all quite fiddly. So I rewrote the detection program in Python, borrowing code from the qqwweee/keras-yolo3 project. The main idea is to use Python's PIL library instead of opencv to draw the detection info onto the image. There are a few other small changes as well, which I won't go through one by one; let's get straight to the code.
  In darknet.py, the main change is to the detect_image method:

def detect_image(class_names, net, meta, im, thresh=.5, hier_thresh=.5, nms=.45, debug=False):
    num = c_int(0)
    if debug: print("Assigned num")
    pnum = pointer(num)
    if debug: print("Assigned pnum")
    predict_image(net, im)
    if debug: print("did prediction")
    dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, None, 0, pnum, 0)
    if debug: print("Got dets")
    num = pnum[0]
    if debug: print("got zeroth index of pnum")
    if nms:
        do_nms_sort(dets, num, meta.classes, nms)
    if debug: print("did sort")
    res = []
    if debug: print("about to range")
    for j in range(num):
        if debug: print("Ranging on " + str(j) + " of " + str(num))
        if debug: print("Classes: " + str(meta), meta.classes, meta.names)
        for i in range(meta.classes):
            if debug: print("Class-ranging on " + str(i) + " of " + str(meta.classes) + "= " + str(dets[j].prob[i]))
            if dets[j].prob[i] > 0.0:
                b = dets[j].bbox
                if altNames is None:
                    # nameTag = meta.names[i] causes a segmentation fault here, most
                    # likely on the C++ side, so we pass in the class-name list as a
                    # parameter instead to work around the problem.
                    nameTag = class_names[i]
                    print(nameTag)
                else:
                    nameTag = altNames[i]
                    print(nameTag)
                if debug:
                    print("Got bbox", b)
                    print(nameTag)
                    print(dets[j].prob[i])
                    print((b.x, b.y, b.w, b.h))
                res.append((nameTag, dets[j].prob[i], (b.x, b.y, b.w, b.h)))
    if debug: print("did range")
    res = sorted(res, key=lambda x: -x[1])
    if debug: print("did sort")
    free_detections(dets, num)
    if debug: print("freed detections")
    return res
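  To make the interface concrete, here is a minimal sketch of how the return value of this modified detect_image can be consumed. It assumes detections already holds the result of a call; the centre-to-corner conversion mirrors what _convertBack does in the module below.

# Minimal sketch (assumed context: `detections` holds the return value of the
# modified detect_image above). Each entry is a
# (class_name, confidence, (center_x, center_y, width, height)) tuple, and the
# list is sorted by confidence in descending order.
for name, prob, (x, y, w, h) in detections:
    # Convert the centre-based box to corner coordinates, as _convertBack does.
    xmin, ymin = int(round(x - w / 2)), int(round(y - h / 2))
    xmax, ymax = int(round(x + w / 2)), int(round(y + h / 2))
    print('{}: {:.2f} box=({}, {}, {}, {})'.format(name, prob, xmin, ymin, xmax, ymax))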

  Then add darknet_video_custom.py with the following content:

# -*- coding: utf-8 -*-
"""
This module uses a yolov3 model to locate targets in images or videos.
"""
__author__ = '程式設計師一一滌生'

import colorsys
import os
from timeit import default_timer as timer

import cv2
import numpy as np
from PIL import ImageDraw, ImageFont, Image

import darknet


def _convertBack(x, y, w, h):
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax


def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image


class YOLO(object):
    _defaults = {
        "configPath": "names-data/yolo-obj.cfg",
        "weightPath": "names-data/backup/yolo-obj_3000.weights",
        "metaPath": "names-data/voc.data",
        "classes_path": "names-data/voc.names",
        "thresh": 0.3,
        "iou_thresh": 0.5,
        # "model_image_size": (416, 416),
        # "model_image_size": (608, 608),
        "model_image_size": (800, 800),
        "gpu_num": 1,
    }

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)  # set up default values
        self.__dict__.update(kwargs)  # and update with user overrides
        self.class_names = self._get_class()
        self.colors = self._get_colors()
        self.netMain = darknet.load_net_custom(self.configPath.encode("ascii"),
                                               self.weightPath.encode("ascii"), 0, 1)  # batch size = 1
        self.metaMain = darknet.load_meta(self.metaPath.encode("ascii"))
        self.altNames = self._get_alt_names()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path, encoding="utf-8") as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_colors(self):
        class_names = self._get_class()
        # Generate colors for drawing bounding boxes.
        hsv_tuples = [(x / len(class_names), 1., 1.)
                      for x in range(len(class_names))]
        colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        colors = list(
            map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
        np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        np.random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
        np.random.seed(None)  # Reset seed to default.
        return colors

    def _get_alt_names(self):
        altNames = None  # initialize so we can still return if parsing fails
        try:
            with open(self.metaPath) as metaFH:
                metaContents = metaFH.read()
                import re
                match = re.search("names *= *(.*)$", metaContents, re.IGNORECASE | re.MULTILINE)
                if match:
                    result = match.group(1)
                else:
                    result = None
                try:
                    if os.path.exists(result):
                        with open(result) as namesFH:
                            namesList = namesFH.read().strip().split("\n")
                            altNames = [x.strip() for x in namesList]
                except TypeError:
                    pass
        except Exception:
            pass
        return altNames

    def cvDrawBoxes(self, detections, image):
        # Font settings: the font file path and a size that scales with image height.
        font = ImageFont.truetype(font='font/simfang.ttf',
                                  size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        # Border thickness of the detection box; this formula lets the thickness
        # adjust automatically to the image size.
        thickness = (image.size[0] + image.size[1]) // 300
        # Iterate over every detected target: (classname, probability, (x, y, w, h)).
        for c, detection in enumerate(detections):
            # Class name and confidence score of the current target.
            classname = detection[0]
            # score = round(detection[1] * 100, 2)
            score = round(detection[1], 2)
            label = '{} {:.2f}'.format(classname, score)
            # Compute the top-left (xmin, ymin) and bottom-right (xmax, ymax) corners.
            x, y, w, h = detection[2][0], \
                         detection[2][1], \
                         detection[2][2], \
                         detection[2][3]
            xmin, ymin, xmax, ymax = _convertBack(
                float(x), float(y), float(w), float(h))
            # Get a drawing handle.
            draw = ImageDraw.Draw(image)
            # Size of the text that will be displayed.
            label_size = draw.textsize(label, font)
            # Map the coordinates to top, left, bottom, right; be careful not to mix
            # up the order.
            top, left, bottom, right = (ymin, xmin, ymax, xmax)
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])
            if c > len(self.class_names) - 1:
                c = 1
            # Draw the border at the desired thickness.
            for i in range(thickness):
                draw.rectangle(
                    [left + i, top + i, right - i, bottom - i],
                    outline=self.colors[c])
            # Draw the filled background behind the label text.
            draw.rectangle(
                [tuple(text_origin), tuple(text_origin + label_size)],
                fill=self.colors[c])
            # Draw the label text.
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw
        return image

    def detect_video(self, video_path, output_path="", show=True):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        vid = cv2.VideoCapture(video_path)
        if not vid.isOpened():
            raise IOError("Couldn't open webcam or video")
        video_FourCC = cv2.VideoWriter_fourcc(*"mp4v")
        video_fps = vid.get(cv2.CAP_PROP_FPS)
        video_size = (nw, nh)
        isOutput = True if output_path != "" else False
        if isOutput:
            print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size))
            out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size)
        accum_time = 0
        curr_fps = 0
        fps = "FPS: ??"
        prev_time = timer()

        # Create an image we reuse for each detection.
        darknet_image = darknet.make_image(nw, nh, 3)
        while True:
            return_value, frame = vid.read()
            if return_value:
                # Convert to RGB: opencv reads frames as BGR by default, while PIL uses RGB.
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image = Image.fromarray(frame_rgb)
                image_resized = image.resize(video_size, Image.LINEAR)
                darknet.copy_image_from_bytes(darknet_image, np.asarray(image_resized).tobytes())
                detections = darknet.detect_image(self.class_names, self.netMain, self.metaMain, darknet_image,
                                                  thresh=self.thresh, debug=True)
                image_resized = self.cvDrawBoxes(detections, image_resized)
                result = np.asarray(image_resized)
                # Convert back to BGR so opencv can handle the frame.
                result = cv2.cvtColor(result, cv2.COLOR_RGB2BGR)
                curr_time = timer()
                exec_time = curr_time - prev_time
                prev_time = curr_time
                accum_time = accum_time + exec_time
                curr_fps = curr_fps + 1
                if accum_time > 1:
                    accum_time = accum_time - 1
                    fps = "FPS: " + str(curr_fps)
                    curr_fps = 0
                cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                            fontScale=0.50, color=(255, 0, 0), thickness=2)
                if show:
                    cv2.imshow("Object Detect", result)
                if isOutput:
                    print("start write...==========================================")
                    out.write(result)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            else:
                break
        if isOutput:  # only release the writer if it was created
            out.release()
        vid.release()
        cv2.destroyAllWindows()

    def detect_image(self, image_path, save_path):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        try:
            image = Image.open(image_path)
        except:
            print('Open Error! Try again!')
        else:
            image_resized = image.resize((nw, nh), Image.LINEAR)
            darknet_image = darknet.make_image(nw, nh, 3)
            darknet.copy_image_from_bytes(darknet_image, np.asarray(image_resized).tobytes())
            # Run detection on the image to get each target's class, confidence,
            # centre point, and bounding-box width/height.
            detections = darknet.detect_image(self.class_names, self.netMain, self.metaMain, darknet_image,
                                              thresh=0.25, debug=True)
            # Draw the detection info onto the image.
            image_resized = self.cvDrawBoxes(detections, image_resized)
            # Show and save the annotated image.
            image_resized.show()
            image_resized.save(save_path)


if __name__ == "__main__":
    _yolo = YOLO()
    _yolo.detect_image("names-data/images/food.JPG", "names-data/images/food_detect.JPG")
    # _yolo.detect_video("names-data/videos/food.mp4", "names-data/videos/food_detect.mp4", show=False)
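  One detail worth calling out: because __init__ applies _defaults first and the keyword arguments second, any default can be overridden per instance without editing the class. A minimal usage sketch (the file paths follow this project's layout; adjust them to yours):

# Minimal usage sketch: any key in YOLO._defaults can be overridden via
# keyword arguments when constructing the instance.
_yolo = YOLO(thresh=0.25,                    # lower the detection threshold
             model_image_size=(608, 608))    # must be a multiple of 32
# Detect objects in a single image and save the annotated copy.
_yolo.detect_image("names-data/images/food.JPG", "names-data/images/food_detect.JPG")
# Detect in a video without a preview window, writing the result to disk.
_yolo.detect_video("names-data/videos/food.mp4", "names-data/videos/food_detect.mp4", show=False)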

  The key parts of the code above carry comments, so I won't walk through them one by one. You will also need the Chinese font file; just drop it into the project's font directory.

  Download link: https://github.com/Halfish/lstm-ctc-ocr/blob/master/fonts/simfang.ttf

Below are some other fonts I have collected; pick whichever you like.

Link: https://pan.baidu.com/s/1PWS7Hw1z3dkDyq7feZxqEQ
Extraction code: xu8q
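  If you want to verify that a font file actually renders Chinese before wiring it into cvDrawBoxes, here is a minimal standalone sketch of the PIL round-trip this article relies on; the image path is illustrative. cv2.putText only supports the built-in Hershey fonts, which cannot draw Chinese characters, so the text is drawn with PIL and a TrueType font, then the pixels are handed back to opencv.

# Minimal standalone check (assumes font/simfang.ttf exists; the image path is
# illustrative). Draw a Chinese label with PIL, then convert back for opencv.
import cv2
import numpy as np
from PIL import Image, ImageDraw, ImageFont

frame = cv2.imread("names-data/images/food.JPG")  # BGR, as opencv reads it
image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # to RGB for PIL
draw = ImageDraw.Draw(image)
font = ImageFont.truetype("font/simfang.ttf", size=32)
draw.text((10, 10), u"好吃的食物 0.95", fill=(255, 0, 0), font=font)  # renders correctly
result = cv2.cvtColor(np.asarray(image), cv2.COLOR_RGB2BGR)  # back to BGR
cv2.imwrite("font_check.JPG", result)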

  Now let's see how to display the confidence score. Open src/images.c and replace the draw_detections_cv_v3 function with the code below; note that you must re-run make on the project after the replacement:

void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)
{
    int i, j;
    if (!show_img) return;
    static int frame_id = 0;
    frame_id++;
    for (i = 0; i < num; ++i) {
        char labelstr[4096] = { 0 };
        int class_id = -1;
        for (j = 0; j < classes; ++j) {
            int show = strncmp(names[j], "dont_show", 9);
            if (dets[i].prob[j] > thresh && show) {
                float score = dets[i].prob[j];  // append the confidence score to the label
                if (class_id < 0) {
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                    class_id = j;
                }
                else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                }
                printf("%s: %.0f%% ", names[j], score * 100);
            }
        }
        if (class_id >= 0) {
            int width = show_img->height * .006;
            int offset = class_id * 123457 % classes;
            float red = get_color(2, offset, classes);
            float green = get_color(1, offset, classes);
            float blue = get_color(0, offset, classes);
            float rgb[3];
            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = dets[i].bbox;
            b.w = (b.w < 1) ? b.w : 1;
            b.h = (b.h < 1) ? b.h : 1;
            b.x = (b.x < 1) ? b.x : 1;
            b.y = (b.y < 1) ? b.y : 1;
            int left = (b.x - b.w / 2.) * show_img->width;
            int right = (b.x + b.w / 2.) * show_img->width;
            int top = (b.y - b.h / 2.) * show_img->height;
            int bot = (b.y + b.h / 2.) * show_img->height;
            if (left < 0) left = 0;
            if (right > show_img->width - 1) right = show_img->width - 1;
            if (top < 0) top = 0;
            if (bot > show_img->height - 1) bot = show_img->height - 1;
            float const font_size = show_img->height / 1000.F;
            CvPoint pt1, pt2, pt_text, pt_text_bg1, pt_text_bg2;
            pt1.x = left;
            pt1.y = top;
            pt2.x = right;
            pt2.y = bot;
            pt_text.x = left;
            pt_text.y = top - 12;
            pt_text_bg1.x = left;
            pt_text_bg1.y = top - (10 + 25 * font_size);
            pt_text_bg2.x = right;
            pt_text_bg2.y = top;
            CvScalar color;
            color.val[0] = red * 256;
            color.val[1] = green * 256;
            color.val[2] = blue * 256;
            cvRectangle(show_img, pt1, pt2, color, width, 8, 0);
            if (ext_output)
                printf("\t(left_x: %4.0f   top_y: %4.0f   width: %4.0f   height: %4.0f)\n",
                    (float)left, (float)top, b.w * show_img->width, b.h * show_img->height);
            else
                printf("\n");
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, width, 8, 0);
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, CV_FILLED, 8, 0);    // filled
            CvScalar black_color;
            black_color.val[0] = 0;
            CvFont font;
            cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, font_size, font_size, 0, font_size * 3, 8);
            cvPutText(show_img, labelstr, pt_text, &font, black_color);
        }
    }
    if (ext_output) {
        fflush(stdout);
    }
}

  Once everything above is in place, run python darknet_video_custom.py to start detecting objects in an image or video. The result looks like this:

[Screenshot: the annotated detection result, with Chinese class labels and confidence scores drawn on the image.]

  Pretty cool, right? O(∩_∩)O~ This series now has four articles: "A Quick Taste of How Cool Object Detection Is", "Data Annotation", "Model Training", and "Using the Model". We have now walked through the entire object-detection workflow and have a good feel for how the pieces fit together. The next article, "Hands-On Object Detection with Deep Learning (Part 5): YOLO", will cover the YOLO algorithm itself, so we can see how object detection works under the hood.

  OK, that's all for this article~ Thanks for reading O(∩_∩)O, bye for now~

 
Quote to Share

 Travel is not only a matter of rushing outward; sitting quietly in thought is also travel. Anything that explores, pursues, or touches the unknown, whether in landscape or in the mind, is a kind of travel.

— Lin Qingxuan