Six Common Data Augmentation Methods (with Code)

October 11, 2019
Notes

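In some object detection tasks the training set is small, and detection accuracy suffers as a result. Data augmentation helps in this situation. This post introduces six commonly used augmentation methods: cropping, translation, brightness adjustment, noise injection, rotation, and mirroring.

The references for each part are listed in the docstring of the corresponding code.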

Cropping (bboxes must be updated): the cropped image must still contain every bounding box. Otherwise the original annotations are destroyed.

import math
import random

def _crop_img_bboxes(self, img, bboxes):
    '''
    The cropped image must still contain every bounding box.
    Input:
        img: image array
        bboxes: list of all bounding boxes in the image; each element is
                [x_min, y_min, x_max, y_max, name], and the coordinates must be numeric
    Output:
        crop_img: cropped image array
        crop_bboxes: list of bounding box coordinates after cropping
    '''
    # ------------------ crop the image ------------------
    w = img.shape[1]
    h = img.shape[0]

    # smallest box that contains every object box
    x_min = w
    x_max = 0
    y_min = h
    y_max = 0
    for bbox in bboxes:
        x_min = min(x_min, bbox[0])
        y_min = min(y_min, bbox[1])
        x_max = max(x_max, bbox[2])
        y_max = max(y_max, bbox[3])

    # distance from that smallest box to each image border
    d_to_left = x_min
    d_to_right = w - x_max
    d_to_top = y_min
    d_to_bottom = h - y_max

    # randomly expand the smallest box toward the borders
    # (floor the top-left corner and ceil the bottom-right corner so no box is cut)
    crop_x_min = int(x_min - random.uniform(0, d_to_left))
    crop_y_min = int(y_min - random.uniform(0, d_to_top))
    crop_x_max = int(math.ceil(x_max + random.uniform(0, d_to_right)))
    crop_y_max = int(math.ceil(y_max + random.uniform(0, d_to_bottom)))

    # make sure the crop window stays inside the image
    crop_x_min = max(0, crop_x_min)
    crop_y_min = max(0, crop_y_min)
    crop_x_max = min(w, crop_x_max)
    crop_y_max = min(h, crop_y_max)

    crop_img = img[crop_y_min:crop_y_max, crop_x_min:crop_x_max]

    # ------------------ crop the bounding boxes ------------------
    # every box moves by the top-left corner of the crop window
    crop_bboxes = list()
    for bbox in bboxes:
        crop_bboxes.append([bbox[0] - crop_x_min, bbox[1] - crop_y_min,
                            bbox[2] - crop_x_min, bbox[3] - crop_y_min, bbox[4]])

    return crop_img, crop_bboxes
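The following is a minimal standalone sketch of the same cropping idea. It is useful for checking that the boxes survive a crop. It assumes a NumPy image and integer boxes of the form [x_min, y_min, x_max, y_max, name] with an exclusive right/bottom edge. The function name `crop_keep_boxes` and its `rng` parameter are hypothetical and exist only for illustration.

```python
import math
import random

import numpy as np


def crop_keep_boxes(img, bboxes, rng=random):
    """Hypothetical helper: random crop that never cuts a box.

    bboxes: list of [x_min, y_min, x_max, y_max, name].
    """
    h, w = img.shape[:2]
    # smallest box enclosing every object box
    x_min = min(b[0] for b in bboxes)
    y_min = min(b[1] for b in bboxes)
    x_max = max(b[2] for b in bboxes)
    y_max = max(b[3] for b in bboxes)

    # grow the enclosing box by a random amount toward each border
    cx0 = max(0, int(x_min - rng.uniform(0, x_min)))
    cy0 = max(0, int(y_min - rng.uniform(0, y_min)))
    cx1 = min(w, math.ceil(x_max + rng.uniform(0, w - x_max)))
    cy1 = min(h, math.ceil(y_max + rng.uniform(0, h - y_max)))

    crop = img[cy0:cy1, cx0:cx1]
    # every box moves by the top-left corner of the crop window
    new_boxes = [[b[0] - cx0, b[1] - cy0, b[2] - cx0, b[3] - cy0, b[4]]
                 for b in bboxes]
    return crop, new_boxes


img = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = [[20, 30, 60, 70, "cat"], [120, 10, 180, 50, "dog"]]
crop, new_boxes = crop_keep_boxes(img, boxes, random.Random(0))
print(crop.shape, new_boxes)
```

Passing a seeded `random.Random` makes the crop reproducible. That is handy when you debug annotations.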

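Translation (bboxes must be updated): the translated image must still contain every bounding box. Otherwise the original annotations are destroyed.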

import random

import cv2
import numpy as np

def _shift_pic_bboxes(self, img, bboxes):
    '''
    The translated image must still contain every bounding box.
    Reference: https://blog.csdn.net/sty945/article/details/79387054
    Input:
        img: image array
        bboxes: list of all bounding boxes in the image; each element is
                [x_min, y_min, x_max, y_max, name], and the coordinates must be numeric
    Output:
        shift_img: translated image array
        shift_bboxes: list of bounding box coordinates after translation
    '''
    # ------------------ translate the image ------------------
    w = img.shape[1]
    h = img.shape[0]

    # smallest box that contains every object box
    x_min = w
    x_max = 0
    y_min = h
    y_max = 0
    for bbox in bboxes:
        x_min = min(x_min, bbox[0])
        y_min = min(y_min, bbox[1])
        x_max = max(x_max, bbox[2])
        y_max = max(y_max, bbox[3])

    # distance from that smallest box to each border, i.e. the largest allowed move in each direction
    d_to_left = x_min
    d_to_right = w - x_max
    d_to_top = y_min
    d_to_bottom = h - y_max

    # The first row of the matrix is [1, 0, x]: x is the horizontal shift; positive moves right, negative moves left.
    # The second row is [0, 1, y]: y is the vertical shift; positive moves down, negative moves up.
    # Only a third of the free margin is used, so every box stays inside the image.
    x = random.uniform(-(d_to_left / 3), d_to_right / 3)
    y = random.uniform(-(d_to_top / 3), d_to_bottom / 3)
    M = np.float32([[1, 0, x], [0, 1, y]])

    # affine warp: the source image, the translation matrix, and the output size (width, height)
    shift_img = cv2.warpAffine(img, M, (w, h))

    # ------------------ translate the bounding boxes ------------------
    shift_bboxes = list()
    for bbox in bboxes:
        shift_bboxes.append([bbox[0] + x, bbox[1] + y, bbox[2] + x, bbox[3] + y, bbox[4]])

    return shift_img, shift_bboxes
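To see why each box is shifted by exactly (x, y), here is a NumPy-only sketch that moves an image by an integer offset. For whole-number x and y, this should match what `cv2.warpAffine` does with the matrix [[1, 0, x], [0, 1, y]]. Pixels shifted in from outside the frame are filled with 0, which matches warpAffine's default constant border. The helper names `shift_image` and `shift_bbox` are hypothetical.

```python
import numpy as np


def shift_image(img, dx, dy):
    """Hypothetical helper: move img right by dx and down by dy (integers).

    Pixel (x, y) of the input lands at (x + dx, y + dy); uncovered pixels are 0.
    """
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # the whole image moved out of the frame
    # source ranges along each axis, clipped so the destination stays in bounds
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    out[src_y0 + dy:src_y1 + dy, src_x0 + dx:src_x1 + dx] = img[src_y0:src_y1, src_x0:src_x1]
    return out


def shift_bbox(bbox, dx, dy):
    # the same offset is added to both corners, as in _shift_pic_bboxes
    x_min, y_min, x_max, y_max, name = bbox
    return [x_min + dx, y_min + dy, x_max + dx, y_max + dy, name]


img = np.zeros((50, 80), dtype=np.uint8)
img[5:20, 10:30] = 255                      # paint an object inside [10, 5, 30, 20]
moved = shift_image(img, 7, -3)
print(shift_bbox([10, 5, 30, 20, "car"], 7, -3))
```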

Brightness: changing the brightness is simple, and the bounding boxes do not need to be changed.

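import random

from skimage import exposure

def _changeLight(self, img):
    '''
    adjust_gamma(image, gamma=1, gain=1):
    gamma > 1 darkens the output image, gamma < 1 brightens it
    Input:
        img: image array
    Output:
        img: image array with adjusted brightness
    '''
    flag = random.uniform(0.5, 1.5)  # flag > 1 darkens, flag < 1 brightens
    return exposure.adjust_gamma(img, flag)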
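The brightness change is a gamma correction. For a uint8 image, the intensities are scaled to [0, 1] and raised to the power γ. Because values in [0, 1] shrink when raised to γ > 1 and grow when γ < 1, a larger γ darkens the image. The sketch below implements that formula with NumPy only, assuming a uint8 input. The name `gamma_uint8` is hypothetical. The final rounding step is a choice made for this sketch and is not claimed to match skimage exactly.

```python
import numpy as np


def gamma_uint8(img, gamma, gain=1.0):
    """Hypothetical helper: gamma correction for a uint8 image."""
    x = img.astype(np.float64) / 255.0          # scale intensities to [0, 1]
    out = gain * 255.0 * x ** gamma             # out = gain * 255 * (in / 255) ** gamma
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
print(gamma_uint8(pixels, 1.5))  # darker: no value exceeds the input
print(gamma_uint8(pixels, 0.5))  # brighter: no value falls below the input
```

Note that pure black (0) and pure white (255) are fixed points for any γ when gain is 1. Only the mid-tones move.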

Noise: adding noise is also simple, and the bounding boxes do not need to be changed.

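import numpy as np
from skimage.util import random_noise

def _addNoise(self, img):
    '''
    Input:
        img: image array
    Output:
        img: image array with added noise. random_noise returns pixels in [0, 1],
             so the result is multiplied by 255 and cast back to uint8.
    '''
    return (random_noise(img, mode='gaussian', clip=True) * 255).astype(np.uint8)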
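With its default settings, `random_noise(..., mode='gaussian')` performs these steps:

1. Convert the image to floats in [0, 1].
2. Add zero-mean Gaussian noise with variance 0.01.
3. Clip back to [0, 1] when clip=True.

That last step is why the result above is multiplied by 255. The sketch below reproduces the same steps with NumPy only, assuming a uint8 input. The helper name `add_gaussian_noise` and its explicit `rng` argument are hypothetical.

```python
import numpy as np


def add_gaussian_noise(img, var=0.01, rng=None):
    """Hypothetical helper: Gaussian noise on a uint8 image, returned as uint8."""
    rng = np.random.default_rng() if rng is None else rng
    x = img.astype(np.float64) / 255.0                          # to [0, 1]
    noisy = x + rng.normal(0.0, np.sqrt(var), size=x.shape)     # std = sqrt(var)
    noisy = np.clip(noisy, 0.0, 1.0)                            # like clip=True
    return (noisy * 255).astype(np.uint8)                       # back to [0, 255]


img = np.full((64, 64, 3), 128, dtype=np.uint8)
noisy = add_gaussian_noise(img, rng=np.random.default_rng(0))
print(noisy.dtype, noisy.shape)
```

A variance of 0.01 in the [0, 1] range corresponds to a standard deviation of about 25 gray levels, which is fairly strong noise. A smaller `var` may suit detection data better.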

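Rotation: the rotated image must still contain every bounding box. Otherwise the original annotations are destroyed. Rotating an image can also cut off its corners, which should be avoided. The code below handles this by enlarging the output canvas until the whole rotated image fits.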
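The enlarged canvas comes from projecting the rotated width and height onto the axes. For a w × h image rotated by θ, the new size is w' = |w cos θ| + |h sin θ| and h' = |w sin θ| + |h cos θ|, multiplied by the scale factor. This is the same formula as the `nw`/`nh` lines in `_rotate_img_bboxes`. The helper name `rotated_canvas_size` and its small tolerance before rounding up are assumptions of this sketch.

```python
import math


def rotated_canvas_size(w, h, angle_deg, scale=1.0):
    """Hypothetical helper: size of the canvas that fits a w x h image rotated by angle_deg."""
    t = math.radians(angle_deg)
    nw = (abs(math.sin(t) * h) + abs(math.cos(t) * w)) * scale
    nh = (abs(math.cos(t) * h) + abs(math.sin(t) * w)) * scale
    # round up so no rotated corner falls outside the canvas; the tiny tolerance
    # stops float error (e.g. cos(90 deg) is about 6e-17, not 0) from adding a pixel
    return math.ceil(nw - 1e-9), math.ceil(nh - 1e-9)


print(rotated_canvas_size(100, 50, 0))   # (100, 50)
print(rotated_canvas_size(100, 50, 90))  # (50, 100)
```

Without the tolerance, a plain `math.ceil` on the 90-degree case would produce a canvas one pixel wider than needed.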

import math

import cv2
import numpy as np

def _rotate_img_bboxes(self, img, bboxes, angle=5, scale=1.):
    '''
    References: https://blog.csdn.net/saltriver/article/details/79680189
                https://www.ctolib.com/topics-44419.html
    On affine transforms: https://www.zhihu.com/question/20666664
    Input:
        img: image array, (h, w, c)
        bboxes: list of all bounding boxes in the image; each element is
                [x_min, y_min, x_max, y_max, name], and the coordinates must be numeric
        angle: rotation angle in degrees (in OpenCV, positive means counter-clockwise)
        scale: scale factor, default 1
    Output:
        rot_img: rotated image array
        rot_bboxes: list of bounding box coordinates after rotation
    '''
    # ---------------------- rotate the image ----------------------
    w = img.shape[1]
    h = img.shape[0]
    # degrees to radians
    rangle = np.deg2rad(angle)
    # width and height of the enlarged canvas: the horizontal and vertical
    # extent of the rotated image (e.g. the height is the vertical distance
    # between the highest and the lowest rotated corner)
    nw = (abs(np.sin(rangle) * h) + abs(np.cos(rangle) * w)) * scale
    nh = (abs(np.cos(rangle) * h) + abs(np.sin(rangle) * w)) * scale
    # rotation matrix about a given point:
    # getRotationMatrix2D(Point2f center, double angle, double scale)
    #     center: the rotation center
    #     angle: the rotation angle in degrees
    #     scale: the image scale factor
    # Reference: https://cloud.tencent.com/developer/article/1425373
    rot_mat = cv2.getRotationMatrix2D((nw * 0.5, nh * 0.5), angle, scale)  # returns a 2x3 matrix
    # offset between the new canvas center and the old image center
    rot_move = np.dot(rot_mat, np.array([(nw - w) * 0.5, (nh - h) * 0.5, 0]))
    # the move only affects the translation, so update the translation
    # part of the transform
    rot_mat[0, 2] += rot_move[0]
    rot_mat[1, 2] += rot_move[1]
    # affine warp onto the enlarged canvas (ceil rounds the size up)
    rot_img = cv2.warpAffine(img, rot_mat, (int(math.ceil(nw)), int(math.ceil(nh))),
                             flags=cv2.INTER_LANCZOS4)

    # ---------------------- correct the bounding boxes ----------------------
    # rot_mat is the final transform: take the midpoints of the four edges of
    # each original box and map them into the rotated coordinate system
    rot_bboxes = list()
    for bbox in bboxes:
        x_min = bbox[0]
        y_min = bbox[1]
        x_max = bbox[2]
        y_max = bbox[3]
        name = bbox[4]
        point1 = np.dot(rot_mat, np.array([(x_min + x_max) / 2, y_min, 1]))  # top edge
        point2 = np.dot(rot_mat, np.array([x_max, (y_min + y_max) / 2, 1]))  # right edge
        point3 = np.dot(rot_mat, np.array([(x_min + x_max) / 2, y_max, 1]))  # bottom edge
        point4 = np.dot(rot_mat, np.array([x_min, (y_min + y_max) / 2, 1]))  # left edge

        # stack the four points vertically into a 4x2 array
        concat = np.vstack((point1, point2, point3, point4))
        # cv2.boundingRect needs integer points
        concat = concat.astype(np.int32)
        # axis-aligned rectangle around the rotated points
        rx, ry, rw, rh = cv2.boundingRect(concat)
        rx_min = rx
        ry_min = ry
        rx_max = rx + rw
        ry_max = ry + rh
        # add the corrected box to the list
        rot_bboxes.append([rx_min, ry_min, rx_max, ry_max, name])

    return rot_img, rot_bboxes
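The box correction above maps the midpoints of the four box edges rather than the four corners. The NumPy-only sketch below compares the two choices. `rotation_matrix` rebuilds the 2x3 matrix documented for `cv2.getRotationMatrix2D`, with alpha = scale·cos θ and beta = scale·sin θ. The helper names are hypothetical.

The two choices trade off differently:

- Mapping the corners gives a box that always contains the rotated rectangle, but it grows quickly with the angle.
- Mapping the midpoints gives a tighter box. That box can end up smaller than the object when the object fills its original box.

Neither choice is exact.

```python
import numpy as np


def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 matrix with the same formula as documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * np.cos(np.deg2rad(angle_deg))
    b = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


def rotate_bbox(bbox, M, use_midpoints=True):
    """Hypothetical helper: axis-aligned box around the mapped points of bbox."""
    x0, y0, x1, y1 = bbox
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    if use_midpoints:
        pts = [(xm, y0), (x1, ym), (xm, y1), (x0, ym)]   # edge midpoints
    else:
        pts = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]   # corners
    # homogeneous coordinates: each row is [x, y, 1]
    P = np.hstack([np.array(pts, dtype=np.float64), np.ones((4, 1))])
    mapped = P @ M.T                                      # shape (4, 2)
    (rx0, ry0), (rx1, ry1) = mapped.min(axis=0), mapped.max(axis=0)
    return [rx0, ry0, rx1, ry1]


M = rotation_matrix((50, 50), 45)
print(rotate_bbox([40, 40, 60, 60], M, use_midpoints=False))  # about [35.86, 35.86, 64.14, 64.14]
print(rotate_bbox([40, 40, 60, 60], M, use_midpoints=True))   # about [42.93, 42.93, 57.07, 57.07]
```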

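Mirroring (bboxes must be updated): the boxes must be mirrored together with the image. Otherwise the original annotations are destroyed. Only two kinds of mirroring are covered here: horizontal flips and vertical flips.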

import random

import cv2

# Mirroring
def _flip_pic_bboxes(self, img, bboxes):
    '''
    Reference: https://blog.csdn.net/jningwei/article/details/78753607
    The mirrored image must still contain every box.
    Input:
        img: image array
        bboxes: list of all bounding boxes in the image; each element is
                [x_min, y_min, x_max, y_max, name], and the coordinates must be numeric
    Output:
        flip_img: mirrored image array
        flip_bboxes: list of bounding box coordinates after mirroring
    '''
    # ---------------------- mirror the image ----------------------
    # randomly choose a horizontal or a vertical flip
    horizon = random.random() < 0.5
    h, w = img.shape[:2]
    # cv2.flip returns a new array, so the input image is left untouched
    if horizon:  # horizontal flip: flipCode 1 mirrors around the vertical axis
        flip_img = cv2.flip(img, 1)
    else:        # vertical flip: flipCode 0 mirrors around the horizontal axis
        flip_img = cv2.flip(img, 0)
    # ---------------------- correct the bounding boxes ----------------------
    flip_bboxes = list()
    for bbox in bboxes:
        x_min = bbox[0]
        y_min = bbox[1]
        x_max = bbox[2]
        y_max = bbox[3]
        name = bbox[4]
        if horizon:
            flip_bboxes.append([w - x_max, y_min, w - x_min, y_max, name])
        else:
            flip_bboxes.append([x_min, h - y_max, x_max, h - y_min, name])

    return flip_img, flip_bboxes
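The sketch below is a quick NumPy-only check of the mirroring box formulas. It assumes boxes use an exclusive right/bottom edge, so a box covers pixels x_min … x_max − 1. `np.fliplr` and `np.flipud` flip the same way as `cv2.flip(img, 1)` and `cv2.flip(img, 0)`. In contrast, `cv2.flip(img, -1)` flips both axes at once, so it does not match the horizontal-only box formula. The helpers `flip_bbox` and `nonzero_bbox` are hypothetical.

```python
import numpy as np


def flip_bbox(bbox, w, h, horizontal):
    """Hypothetical helper: same formulas as _flip_pic_bboxes."""
    x_min, y_min, x_max, y_max, name = bbox
    if horizontal:
        return [w - x_max, y_min, w - x_min, y_max, name]
    return [x_min, h - y_max, x_max, h - y_min, name]


def nonzero_bbox(img):
    # tight box around the non-zero pixels, exclusive right/bottom edge
    ys, xs = np.nonzero(img)
    return [int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1]


h, w = 40, 60
img = np.zeros((h, w), dtype=np.uint8)
box = [5, 10, 25, 18, "person"]
img[10:18, 5:25] = 255          # paint the object inside its box

print(nonzero_bbox(np.fliplr(img)), flip_bbox(box, w, h, True)[:4])   # [35, 10, 55, 18] twice
print(nonzero_bbox(np.flipud(img)), flip_bbox(box, w, h, False)[:4])  # [5, 22, 25, 30] twice
```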
