Deep Learning for Beginners (9): Common Data Preprocessing and Augmentation Methods for Object Detection

  • February 19, 2020
  • Notes

01

Overview

This course is Baidu's official introductory deep learning course, aimed mainly at learners with no background in deep learning or only a weak foundation, helping you make the leap from 0 to 1+ in the field. From this course you will learn:

  1. Fundamentals of deep learning
  2. Building neural networks and implementing gradient descent with numpy
  3. Principles and practice of the main directions in computer vision
  4. Principles and practice of the main directions in natural language processing
  5. Principles and practice of personalized recommendation algorithms

In the previous lesson, Sun Gaofeng, senior R&D engineer at Baidu's Deep Learning Technology Platform Department, introduced the basic concepts of object detection. In this lesson he continues with the data preprocessing and augmentation methods commonly used in object detection, using the forestry pest dataset as the running example.

02

The Forestry Pest Dataset and Data Preprocessing Methods

This lesson uses the insect dataset from the forestry pest control project jointly developed by Baidu and a forestry university. For more information about the project and the dataset, see the related press coverage. This section introduces the dataset, along with the data preprocessing methods commonly used in computer vision tasks.

Reading the annotations of the AI insect-recognition dataset

The AI insect-recognition dataset is organized as follows:

  • It provides 2183 images: 1693 for training, 245 for validation, and 245 for testing.
  • It covers 7 insect categories: Boerner, Leconte, Linnaeus, acuminatus, armandi, coleoptera, and linnaeus.
  • It contains both images and annotations. Please unpack the data first and place it under the insects directory.
# Script to unpack the data. Uncomment on first run to extract the files into the work directory
# !unzip -d /home/aistudio/work /home/aistudio/data/data19638/insects.zip

After unpacking the data, the structure of the insects directory is as follows.


insects contains three folders: train, val, and test. The annotations of the images are stored under train/annotations/xmls. Each xml file describes one image, including its dimensions, the names of the insects it contains, and the positions where they appear.

<annotation>
        <folder>劉霏霏</folder>
        <filename>100.jpeg</filename>
        <path>/home/fion/桌面/劉霏霏/100.jpeg</path>
        <source>
                <database>Unknown</database>
        </source>
        <size>
                <width>1336</width>
                <height>1336</height>
                <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
                <name>Boerner</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>500</xmin>
                        <ymin>893</ymin>
                        <xmax>656</xmax>
                        <ymax>966</ymax>
                </bndbox>
        </object>
        <object>
                <name>Leconte</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>622</xmin>
                        <ymin>490</ymin>
                        <xmax>756</xmax>
                        <ymax>610</ymax>
                </bndbox>
        </object>
        <object>
                <name>armandi</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>432</xmin>
                        <ymin>663</ymin>
                        <xmax>517</xmax>
                        <ymax>729</ymax>
                </bndbox>
        </object>
        <object>
                <name>coleoptera</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>624</xmin>
                        <ymin>685</ymin>
                        <xmax>697</xmax>
                        <ymax>771</ymax>
                </bndbox>
        </object>
        <object>
                <name>linnaeus</name>
                <pose>Unspecified</pose>
                <truncated>0</truncated>
                <difficult>0</difficult>
                <bndbox>
                        <xmin>783</xmin>
                        <ymin>700</ymin>
                        <xmax>856</xmax>
                        <ymax>802</ymax>
                </bndbox>
        </object>
</annotation>

The main fields of the xml file listed above are:

- size: the image dimensions

- object: an object contained in the image; a single image may contain multiple objects

  • name: insect name
  • bndbox: the ground-truth box of the object
  • difficult: whether the object is difficult to recognize

Next, we read the xml files from the dataset and extract the annotation information of each image. Before reading the actual annotation files, one thing must be done first: convert the insect category names (strings) into numeric labels. Neural network computations require numeric inputs, so the string categories need to be mapped to concrete numbers. The list of insect category names is ['Boerner', 'Leconte', 'Linnaeus', 'acuminatus', 'armandi', 'coleoptera', 'linnaeus'], and we adopt the convention that in this list 'Boerner' corresponds to class 0, 'Leconte' to class 1, ..., and 'linnaeus' to class 6. The following program builds the dictionary that maps name strings to numeric classes.

INSECT_NAMES = ['Boerner', 'Leconte', 'Linnaeus',
                'acuminatus', 'armandi', 'coleoptera', 'linnaeus']

def get_insect_names():
    """
    return a dict, as following,
        {'Boerner': 0,
         'Leconte': 1,
         'Linnaeus': 2,
         'acuminatus': 3,
         'armandi': 4,
         'coleoptera': 5,
         'linnaeus': 6
        }
    It can map the insect name into an integer label.
    """
    insect_category2id = {}
    for i, item in enumerate(INSECT_NAMES):
        insect_category2id[item] = i

    return insect_category2id
cname2cid = get_insect_names()
cname2cid
{'Boerner': 0,
 'Leconte': 1,
 'Linnaeus': 2,
 'acuminatus': 3,
 'armandi': 4,
 'coleoptera': 5,
 'linnaeus': 6}

Calling get_insect_names returns a dict whose key-value pairs describe the mapping between insect names and numeric classes.

The following program reads the annotation information of all files under the annotations/xmls directory.

import os
import numpy as np
import xml.etree.ElementTree as ET

def get_annotations(cname2cid, datadir):
    filenames = os.listdir(os.path.join(datadir, 'annotations', 'xmls'))
    records = []
    ct = 0
    for fname in filenames:
        fid = fname.split('.')[0]
        fpath = os.path.join(datadir, 'annotations', 'xmls', fname)
        img_file = os.path.join(datadir, 'images', fid + '.jpeg')
        tree = ET.parse(fpath)

        if tree.find('id') is None:
            im_id = np.array([ct])
        else:
            im_id = np.array([int(tree.find('id').text)])

        objs = tree.findall('object')
        im_w = float(tree.find('size').find('width').text)
        im_h = float(tree.find('size').find('height').text)
        gt_bbox = np.zeros((len(objs), 4), dtype=np.float32)
        gt_class = np.zeros((len(objs), ), dtype=np.int32)
        is_crowd = np.zeros((len(objs), ), dtype=np.int32)
        difficult = np.zeros((len(objs), ), dtype=np.int32)
        for i, obj in enumerate(objs):
            cname = obj.find('name').text
            gt_class[i] = cname2cid[cname]
            _difficult = int(obj.find('difficult').text)
            x1 = float(obj.find('bndbox').find('xmin').text)
            y1 = float(obj.find('bndbox').find('ymin').text)
            x2 = float(obj.find('bndbox').find('xmax').text)
            y2 = float(obj.find('bndbox').find('ymax').text)
            x1 = max(0, x1)
            y1 = max(0, y1)
            x2 = min(im_w - 1, x2)
            y2 = min(im_h - 1, y2)
            # the ground-truth box is stored in xywh format (center x, center y, width, height)
            gt_bbox[i] = [(x1+x2)/2.0, (y1+y2)/2.0, x2-x1+1., y2-y1+1.]
            is_crowd[i] = 0
            difficult[i] = _difficult

        voc_rec = {
            'im_file': img_file,
            'im_id': im_id,
            'h': im_h,
            'w': im_w,
            'is_crowd': is_crowd,
            'gt_class': gt_class,
            'gt_bbox': gt_bbox,
            'gt_poly': [],
            'difficult': difficult
            }
        if len(objs) != 0:
            records.append(voc_rec)
        ct += 1
    return records
TRAINDIR = '/home/aistudio/work/insects/train'
TESTDIR = '/home/aistudio/work/insects/test'
VALIDDIR = '/home/aistudio/work/insects/val'
cname2cid = get_insect_names()
records = get_annotations(cname2cid, TRAINDIR)
len(records)
1693
records[0]
{'difficult': array([0, 0, 0, 0, 0], dtype=int32),
 'gt_bbox': array([[600. , 344.5, 135. , 172. ],
        [540.5, 705. ,  56. , 129. ],
        [661. , 831. ,  81. ,  71. ],
        [782.5, 545.5,  48. ,  82. ],
        [823. , 678. ,  59. ,  75. ]], dtype=float32),
 'gt_class': array([1, 0, 4, 2, 5], dtype=int32),
 'gt_poly': [],
 'h': 1224.0,
 'im_file': '/home/aistudio/work/insects/train/images/693.jpeg',
 'im_id': array([0]),
 'is_crowd': array([0, 0, 0, 0, 0], dtype=int32),
 'w': 1224.0}

With the program above, the annotation data of the entire training set has been read into the records list. Each element holds the annotation data of one image: the file path, the image id, the image height and width, and the categories and positions of the objects it contains.
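As a quick sanity check, it can help to count how many labeled boxes each class contributes. The snippet below is an illustrative addition (not part of the original course code) that only assumes the records and cname2cid built above:

# Illustrative check: count ground-truth boxes per class over all records
from collections import Counter

box_counter = Counter()
for rec in records:
    box_counter.update(rec['gt_class'].tolist())

cid2cname = {v: k for k, v in cname2cid.items()}
for cid in sorted(box_counter):
    print(cid2cname[cid], box_counter[cid])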

Data reading and preprocessing

Data preprocessing is a crucial step in training neural networks. Appropriate preprocessing helps the model converge better and guards against overfitting. We first read the data from disk and then preprocess it; to keep the network running fast, the preprocessing stage usually also needs to be accelerated.

Data reading

All the image descriptions have already been stored in records, with each element describing one image. The following program shows how to read an image and its annotations from a record.

### Data reading
import cv2

def get_bbox(gt_bbox, gt_class):
    # In a typical detection task, an image often contains multiple objects.
    # We set MAX_NUM = 50, i.e. at most 50 ground-truth boxes are kept per image;
    # if there are fewer than 50 boxes, the remaining entries of gt_bbox and
    # gt_class are all filled with 0.
    MAX_NUM = 50
    gt_bbox2 = np.zeros((MAX_NUM, 4))
    gt_class2 = np.zeros((MAX_NUM,))
    for i in range(len(gt_bbox)):
        if i >= MAX_NUM:
            break
        gt_bbox2[i, :] = gt_bbox[i, :]
        gt_class2[i] = gt_class[i]
    return gt_bbox2, gt_class2

def get_img_data_from_file(record):
    """
    record is a dict as following,
      record = {
            'im_file': img_file,
            'im_id': im_id,
            'h': im_h,
            'w': im_w,
            'is_crowd': is_crowd,
            'gt_class': gt_class,
            'gt_bbox': gt_bbox,
            'gt_poly': [],
            'difficult': difficult
            }
    """
    im_file = record['im_file']
    h = record['h']
    w = record['w']
    is_crowd = record['is_crowd']
    gt_class = record['gt_class']
    gt_bbox = record['gt_bbox']
    difficult = record['difficult']

    img = cv2.imread(im_file)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # check if h and w in record equals that read from img
    assert img.shape[0] == int(h), \
            "image height of {} inconsistent in record({}) and img file({})".format(
             im_file, h, img.shape[0])

    assert img.shape[1] == int(w), \
            "image width of {} inconsistent in record({}) and img file({})".format(
             im_file, w, img.shape[1])

    gt_boxes, gt_labels = get_bbox(gt_bbox, gt_class)

    # gt_bbox uses relative coordinates
    gt_boxes[:, 0] = gt_boxes[:, 0] / float(w)
    gt_boxes[:, 1] = gt_boxes[:, 1] / float(h)
    gt_boxes[:, 2] = gt_boxes[:, 2] / float(w)
    gt_boxes[:, 3] = gt_boxes[:, 3] / float(h)

    return img, gt_boxes, gt_labels, (h, w)
record = records[0]
img, gt_boxes, gt_labels, scales = get_img_data_from_file(record)
img.shape
(1224, 1224, 3)
gt_boxes.shape
(50, 4)
gt_labels
array([1., 0., 4., 2., 5., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
scales
(1224.0, 1224.0)

The get_img_data_from_file() function returns the image data img, the ground-truth box coordinates gt_boxes, the class labels of the objects in those boxes gt_labels, and the image size scales.
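To verify that the reading code behaves as expected, one option is to draw the ground-truth boxes back onto the image. The sketch below is illustrative only (it assumes the img, gt_boxes, and scales variables produced above, with boxes in relative xywh format):

# Illustrative visualization: convert relative xywh boxes back to pixel xyxy and draw them
h, w = scales
vis = img.copy()
for (cx, cy, bw, bh) in gt_boxes:
    if bw == 0. and bh == 0.:   # skip the zero-padded entries
        continue
    x1, y1 = int((cx - bw / 2) * w), int((cy - bh / 2) * h)
    x2, y2 = int((cx + bw / 2) * w), int((cy + bh / 2) * h)
    cv2.rectangle(vis, (x1, y1), (x2, y2), (255, 0, 0), 2)
# img was converted to RGB earlier, so convert back to BGR before writing with OpenCV
cv2.imwrite('check_annotations.jpg', cv2.cvtColor(vis, cv2.COLOR_RGB2BGR))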

Data preprocessing

In computer vision, images are usually subjected to random transformations to produce samples that are similar but not identical. The main purpose is to enlarge the training set, suppress overfitting, and improve the model's generalization; the commonly used methods are shown in the programs below.

Randomly adjust brightness, contrast, and color

import numpy as np
import cv2
from PIL import Image, ImageEnhance
import random

# Randomly adjust brightness, contrast, and color
def random_distort(img):
    # randomly adjust brightness
    def random_brightness(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Brightness(img).enhance(e)
    # randomly adjust contrast
    def random_contrast(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Contrast(img).enhance(e)
    # randomly adjust color
    def random_color(img, lower=0.5, upper=1.5):
        e = np.random.uniform(lower, upper)
        return ImageEnhance.Color(img).enhance(e)

    ops = [random_brightness, random_contrast, random_color]
    np.random.shuffle(ops)

    img = Image.fromarray(img)
    img = ops[0](img)
    img = ops[1](img)
    img = ops[2](img)
    img = np.asarray(img)

    return img

Random expansion

# Random expansion: paste the image onto a larger canvas at a random offset
def random_expand(img,
                  gtboxes,
                  max_ratio=4.,
                  fill=None,
                  keep_ratio=True,
                  thresh=0.5):
    if random.random() > thresh:
        return img, gtboxes

    if max_ratio < 1.0:
        return img, gtboxes

    h, w, c = img.shape
    ratio_x = random.uniform(1, max_ratio)
    if keep_ratio:
        ratio_y = ratio_x
    else:
        ratio_y = random.uniform(1, max_ratio)
    oh = int(h * ratio_y)
    ow = int(w * ratio_x)
    off_x = random.randint(0, ow - w)
    off_y = random.randint(0, oh - h)

    out_img = np.zeros((oh, ow, c))
    if fill and len(fill) == c:
        for i in range(c):
            out_img[:, :, i] = fill[i] * 255.0

    out_img[off_y:off_y + h, off_x:off_x + w, :] = img
    gtboxes[:, 0] = ((gtboxes[:, 0] * w) + off_x) / float(ow)
    gtboxes[:, 1] = ((gtboxes[:, 1] * h) + off_y) / float(oh)
    gtboxes[:, 2] = gtboxes[:, 2] / ratio_x
    gtboxes[:, 3] = gtboxes[:, 3] / ratio_y

    return out_img.astype('uint8'), gtboxes
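Note how the ground-truth boxes are updated: since gtboxes holds relative xywh coordinates, the new center becomes (x * w + off_x) / ow and the relative width shrinks by the expansion ratio. A quick way to verify this behavior (an illustrative snippet that reuses the img and gt_boxes loaded earlier; thresh=1.1 simply forces the expansion branch to run):

# Illustrative check: force an expansion and confirm the boxes stay normalized
expanded_img, expanded_boxes = random_expand(img.copy(), gt_boxes.copy(), thresh=1.1)
print(expanded_img.shape)   # canvas is at least as large as the original image
print(expanded_boxes.min() >= 0.0, expanded_boxes.max() <= 1.0)   # still relative coordinates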

Random cropping

Random cropping relies on two helper functions defined first: multi_box_iou_xywh and box_crop, both of which are saved in the box_utils.py file.

import numpy as np

def multi_box_iou_xywh(box1, box2):
    """
    In this case, box1 or box2 can contain multi boxes.
    Only two cases can be processed in this method:
       1, box1 and box2 have the same shape, box1.shape == box2.shape
       2, either box1 or box2 contains only one box, len(box1) == 1 or len(box2) == 1
    If the shape of box1 and box2 does not match, and both of them contain multi boxes, it will be wrong.
    """
    assert box1.shape[-1] == 4, "Box1 shape[-1] should be 4."
    assert box2.shape[-1] == 4, "Box2 shape[-1] should be 4."

    b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
    b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
    b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
    b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2

    inter_x1 = np.maximum(b1_x1, b2_x1)
    inter_x2 = np.minimum(b1_x2, b2_x2)
    inter_y1 = np.maximum(b1_y1, b2_y1)
    inter_y2 = np.minimum(b1_y2, b2_y2)
    inter_w = inter_x2 - inter_x1
    inter_h = inter_y2 - inter_y1
    inter_w = np.clip(inter_w, a_min=0., a_max=None)
    inter_h = np.clip(inter_h, a_min=0., a_max=None)

    inter_area = inter_w * inter_h
    b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1)
    b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1)

    return inter_area / (b1_area + b2_area - inter_area)

def box_crop(boxes, labels, crop, img_shape):
    x, y, w, h = map(float, crop)
    im_w, im_h = map(float, img_shape)

    boxes = boxes.copy()
    boxes[:, 0], boxes[:, 2] = (boxes[:, 0] - boxes[:, 2] / 2) * im_w, (
        boxes[:, 0] + boxes[:, 2] / 2) * im_w
    boxes[:, 1], boxes[:, 3] = (boxes[:, 1] - boxes[:, 3] / 2) * im_h, (
        boxes[:, 1] + boxes[:, 3] / 2) * im_h

    crop_box = np.array([x, y, x + w, y + h])
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    mask = np.logical_and(crop_box[:2] <= centers, centers <= crop_box[2:]).all(
        axis=1)

    boxes[:, :2] = np.maximum(boxes[:, :2], crop_box[:2])
    boxes[:, 2:] = np.minimum(boxes[:, 2:], crop_box[2:])
    boxes[:, :2] -= crop_box[:2]
    boxes[:, 2:] -= crop_box[:2]

    mask = np.logical_and(mask, (boxes[:, :2] < boxes[:, 2:]).all(axis=1))
    boxes = boxes * np.expand_dims(mask.astype('float32'), axis=1)
    labels = labels * mask.astype('float32')
    boxes[:, 0], boxes[:, 2] = (boxes[:, 0] + boxes[:, 2]) / 2 / w, (
        boxes[:, 2] - boxes[:, 0]) / w
    boxes[:, 1], boxes[:, 3] = (boxes[:, 1] + boxes[:, 3]) / 2 / h, (
        boxes[:, 3] - boxes[:, 1]) / h

    return boxes, labels, mask.sum()
# Random cropping
def random_crop(img,
                boxes,
                labels,
                scales=[0.3, 1.0],
                max_ratio=2.0,
                constraints=None,
                max_trial=50):
    if len(boxes) == 0:
        return img, boxes, labels

    if not constraints:
        constraints = [(0.1, 1.0), (0.3, 1.0), (0.5, 1.0), (0.7, 1.0),
                       (0.9, 1.0), (0.0, 1.0)]

    img = Image.fromarray(img)
    w, h = img.size
    crops = [(0, 0, w, h)]
    for min_iou, max_iou in constraints:
        for _ in range(max_trial):
            scale = random.uniform(scales[0], scales[1])
            aspect_ratio = random.uniform(max(1 / max_ratio, scale * scale),
                                          min(max_ratio, 1 / scale / scale))
            crop_h = int(h * scale / np.sqrt(aspect_ratio))
            crop_w = int(w * scale * np.sqrt(aspect_ratio))
            crop_x = random.randrange(w - crop_w)
            crop_y = random.randrange(h - crop_h)
            crop_box = np.array([[(crop_x + crop_w / 2.0) / w,
                                  (crop_y + crop_h / 2.0) / h,
                                  crop_w / float(w), crop_h / float(h)]])

            iou = multi_box_iou_xywh(crop_box, boxes)
            if min_iou <= iou.min() and max_iou >= iou.max():
                crops.append((crop_x, crop_y, crop_w, crop_h))
                break

    while crops:
        crop = crops.pop(np.random.randint(0, len(crops)))
        crop_boxes, crop_labels, box_num = box_crop(boxes, labels, crop, (w, h))
        if box_num < 1:
            continue
        img = img.crop((crop[0], crop[1], crop[0] + crop[2],
                        crop[1] + crop[3])).resize(img.size, Image.LANCZOS)
        img = np.asarray(img)
        return img, crop_boxes, crop_labels
    img = np.asarray(img)
    return img, boxes, labels

Random scaling

# Random scaling with a randomly chosen interpolation method
def random_interp(img, size, interp=None):
    interp_method = [
        cv2.INTER_NEAREST,
        cv2.INTER_LINEAR,
        cv2.INTER_AREA,
        cv2.INTER_CUBIC,
        cv2.INTER_LANCZOS4,
    ]
    if interp is None or interp not in interp_method:
        interp = interp_method[random.randint(0, len(interp_method) - 1)]
    h, w, _ = img.shape
    im_scale_x = size / float(w)
    im_scale_y = size / float(h)
    img = cv2.resize(
        img, None, None, fx=im_scale_x, fy=im_scale_y, interpolation=interp)
    return img

Random flipping

# Random horizontal flip
def random_flip(img, gtboxes, thresh=0.5):
    if random.random() > thresh:
        img = img[:, ::-1, :]
        gtboxes[:, 0] = 1.0 - gtboxes[:, 0]
    return img, gtboxes

Randomly shuffle the order of the ground-truth boxes

# Randomly shuffle the order of the ground-truth boxes
def shuffle_gtbox(gtbox, gtlabel):
    gt = np.concatenate(
        [gtbox, gtlabel[:, np.newaxis]], axis=1)
    idx = np.arange(gt.shape[0])
    np.random.shuffle(idx)
    gt = gt[idx, :]
    return gt[:, :4], gt[:, 4]

Image augmentation pipeline

# Putting all the image augmentation methods together
def image_augment(img, gtboxes, gtlabels, size, means=None):
    # randomly adjust brightness, contrast, and color
    img = random_distort(img)
    # random expansion
    img, gtboxes = random_expand(img, gtboxes, fill=means)
    # random cropping
    img, gtboxes, gtlabels = random_crop(img, gtboxes, gtlabels)
    # random scaling
    img = random_interp(img, size)
    # random flipping
    img, gtboxes = random_flip(img, gtboxes)
    # randomly shuffle the order of the ground-truth boxes
    gtboxes, gtlabels = shuffle_gtbox(gtboxes, gtlabels)

    return img.astype('float32'), gtboxes.astype('float32'), gtlabels.astype('int32')
img, gt_boxes, gt_labels, scales = get_img_data_from_file(record)
size = 512
img, gt_boxes, gt_labels = image_augment(img, gt_boxes, gt_labels, size)
img.shape
(512, 512, 3)
gt_boxes.shape
(50, 4)
gt_labels.shape
(50,)

The pixel values of the resulting img still need to be adjusted: divide by 255., subtract the mean and divide by the standard deviation, then transpose the layout from [H, W, C] to [C, H, W].

img, gt_boxes, gt_labels, scales = get_img_data_from_file(record)
size = 512
img, gt_boxes, gt_labels = image_augment(img, gt_boxes, gt_labels, size)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
mean = np.array(mean).reshape((1, 1, -1))
std = np.array(std).reshape((1, 1, -1))
img = (img / 255.0 - mean) / std
img = img.astype('float32').transpose((2, 0, 1))
img

The steps above are wrapped into a single function, get_img_data.

def get_img_data(record, size=640):
    img, gt_boxes, gt_labels, scales = get_img_data_from_file(record)
    img, gt_boxes, gt_labels = image_augment(img, gt_boxes, gt_labels, size)
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    mean = np.array(mean).reshape((1, 1, -1))
    std = np.array(std).reshape((1, 1, -1))
    img = (img / 255.0 - mean) / std
    img = img.astype('float32').transpose((2, 0, 1))
    return img, gt_boxes, gt_labels, scales
TRAINDIR = '/home/aistudio/work/insects/train'
TESTDIR = '/home/aistudio/work/insects/test'
VALIDDIR = '/home/aistudio/work/insects/val'
cname2cid = get_insect_names()
records = get_annotations(cname2cid, TRAINDIR)

record = records[0]
img, gt_boxes, gt_labels, scales = get_img_data(record, size=480)
img.shape
(3, 480, 480)
gt_boxes.shape
(50, 4)
gt_labels
array([0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0,
       5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0], dtype=int32)
scales
(1224.0, 1224.0)

Batch data reading and acceleration

The programs above showed how to read and preprocess the data of a single image; the code below implements reading data in batches.

# Pick a random image size shared by all samples within one batch
def get_img_size(mode):
    if (mode == 'train') or (mode == 'valid'):
        inds = np.array([0,1,2,3,4,5,6,7,8,9])
        ii = np.random.choice(inds)
        img_size = 320 + ii * 32
    else:
        img_size = 608
    return img_size

# Convert a batch given as a list into a tuple of arrays
def make_array(batch_data):
    img_array = np.array([item[0] for item in batch_data], dtype='float32')
    gt_box_array = np.array([item[1] for item in batch_data], dtype='float32')
    gt_labels_array = np.array([item[2] for item in batch_data], dtype='int32')
    img_scale = np.array([item[3] for item in batch_data], dtype='int32')
    return img_array, gt_box_array, gt_labels_array, img_scale

# Batched data reading. All images within one batch must share the same size,
# while the size varies randomly across batches,
# as produced by the get_img_size function defined above.
def data_loader(datadir, batch_size=10, mode='train'):
    cname2cid = get_insect_names()
    records = get_annotations(cname2cid, datadir)

    def reader():
        if mode == 'train':
            np.random.shuffle(records)
        batch_data = []
        img_size = get_img_size(mode)
        for record in records:
            #print(record)
            img, gt_bbox, gt_labels, im_shape = get_img_data(record,
                                                             size=img_size)
            batch_data.append((img, gt_bbox, gt_labels, im_shape))
            if len(batch_data) == batch_size:
                yield make_array(batch_data)
                batch_data = []
                img_size = get_img_size(mode)
        if len(batch_data) > 0:
            yield make_array(batch_data)

    return reader
d = data_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(d())
img.shape, gt_boxes.shape, gt_labels.shape, im_shape.shape
((2, 3, 608, 608), (2, 50, 4), (2, 50), (2, 2))

Because data preprocessing is fairly time-consuming, it can become the bottleneck of network training speed, so the preprocessing stage needs to be optimized. Paddle's paddle.reader.xmap_readers API enables multi-threaded data reading; the implementation is shown below.

import functools
import paddle

# Multi-threaded data reading with paddle.reader.xmap_readers
def multithread_loader(datadir, batch_size=10, mode='train'):
    cname2cid = get_insect_names()
    records = get_annotations(cname2cid, datadir)
    def reader():
        if mode == 'train':
            np.random.shuffle(records)
        img_size = get_img_size(mode)
        batch_data = []
        for record in records:
            batch_data.append((record, img_size))
            if len(batch_data) == batch_size:
                yield batch_data
                batch_data = []
                img_size = get_img_size(mode)
        if len(batch_data) > 0:
            yield batch_data

    def get_data(samples):
        batch_data = []
        for sample in samples:
            record = sample[0]
            img_size = sample[1]
            img, gt_bbox, gt_labels, im_shape = get_img_data(record, size=img_size)
            batch_data.append((img, gt_bbox, gt_labels, im_shape))
        return make_array(batch_data)

    mapper = functools.partial(get_data)

    return paddle.reader.xmap_readers(mapper, reader, 8, 10)
d = multithread_loader('/home/aistudio/work/insects/train', batch_size=2, mode='train')
img, gt_boxes, gt_labels, im_shape = next(d())
img.shape, gt_boxes.shape, gt_labels.shape, im_shape.shape
((2, 3, 480, 480), (2, 50, 4), (2, 50), (2, 2))

At this point we have covered how to inspect the data in the dataset, extract the annotation information, read images and annotations from files, apply data augmentation, and read data in batches with acceleration. multithread_loader returns img, gt_boxes, gt_labels, and im_shape, which can now be fed into a neural network and used in a concrete algorithm.
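As a usage sketch (illustrative only; the detector itself is introduced in the next lesson), a training loop would consume the loader like this:

# Illustrative usage: iterate over the batches produced by the loader
train_loader = multithread_loader(TRAINDIR, batch_size=10, mode='train')
for i, (img, gt_boxes, gt_labels, im_shape) in enumerate(train_loader()):
    # img:       [N, 3, H, W]  normalized image batch
    # gt_boxes:  [N, 50, 4]    relative xywh ground-truth boxes (all-zero rows are padding)
    # gt_labels: [N, 50]       integer class labels
    # im_shape:  [N, 2]        original (h, w) of each image
    # ... feed the batch into the detection network here ...
    if i == 0:
        print(img.shape, gt_boxes.shape, gt_labels.shape, im_shape.shape)
        break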

Before moving on to the algorithm itself, here is the reading code for the test data as a supplement. The test data has no annotations and requires no image augmentation; the code is shown below.

# Test data reading

# Convert a batch given as a list into a tuple of arrays
def make_test_array(batch_data):
    img_name_array = np.array([item[0] for item in batch_data])
    img_data_array = np.array([item[1] for item in batch_data], dtype='float32')
    img_scale_array = np.array([item[2] for item in batch_data], dtype='int32')
    return img_name_array, img_data_array, img_scale_array

# Test data reader
def test_data_loader(datadir, batch_size=10, test_image_size=608, mode='test'):
    """
    Load the test images; the test data has no ground-truth labels.
    """
    image_names = os.listdir(datadir)
    def reader():
        batch_data = []
        img_size = test_image_size
        for image_name in image_names:
            file_path = os.path.join(datadir, image_name)
            img = cv2.imread(file_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            H = img.shape[0]
            W = img.shape[1]
            img = cv2.resize(img, (img_size, img_size))

            mean = [0.485, 0.456, 0.406]
            std = [0.229, 0.224, 0.225]
            mean = np.array(mean).reshape((1, 1, -1))
            std = np.array(std).reshape((1, 1, -1))
            out_img = (img / 255.0 - mean) / std
            out_img = out_img.astype('float32').transpose((2, 0, 1))
            img = out_img
            im_shape = [H, W]

            batch_data.append((image_name.split('.')[0], img, im_shape))
            if len(batch_data) == batch_size:
                yield make_test_array(batch_data)
                batch_data = []
        if len(batch_data) > 0:
            yield make_test_array(batch_data)

    return reader
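A short usage example (illustrative; it assumes the test images live under the insects/test/images directory):

# Illustrative usage of the test reader
test_loader = test_data_loader('/home/aistudio/work/insects/test/images', batch_size=2)
img_name, img_data, img_scale = next(test_loader())
print(img_name.shape, img_data.shape, img_scale.shape)   # e.g. (2,), (2, 3, 608, 608), (2, 2)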