【連載21】Selective Search-3.11
- 2020 年 3 月 12 日
- 筆記

公眾號後台回復「python「,立刻領取100本機器學習必備Python電子書
目標檢測
目標檢測的發展歷程大致如下:

Selective Search
對於目標識別任務,比如判斷一張圖片中有沒有車、是什麼車,一般需要解決兩個問題:目標檢測、目標識別。而目標檢測任務中通常需要先通過某種方法做圖像分割,事先得到候選框;直觀的做法是:給定窗口,對整張圖片滑動掃描,結束後改變窗口大小重複上面步驟,缺點很明顯:重複勞動耗費資源、精度和質量不高等等。 針對上面的問題,一種解決方案是借鑒啟發式搜索的方法,充分利用人類的先驗知識。J.R.R. Uijlings在《Selective Search for Object Recoginition》提出一種方法:基於數據驅動,與具體類別無關的多種策略融合的啟發式生成方法。圖片包含各種豐富信息,例如:大小、形狀、顏色、紋理、物體重疊關係等,如果只使用一種信息往往不能解決大部分問題,例如:

左邊的兩隻貓可以通過顏色區別而不是通過紋理,右面的變色龍卻只能通過紋理區別而不是顏色。
啟發式生成設計準則
所以概括來說:
- 能夠捕捉到各種尺度物體,大的、小的、邊界清楚的、邊界模糊的等等; 多尺度的例子:

- 策略多樣性,採用多樣的策略集合共同作用;
- 計算快速,由於生成候選框只是檢測第一步,所以計算上它決不能成為瓶頸。
Selective Search
基於以上準則設計Selective Search算法:
- 採用層次分組算法解決尺度問題。 引入圖像分割中的自下而上分組思想,由於整個過程是層次的,在將整個圖合併成一個大的區域的過程中會輸出不同尺度的多個子區域。整個過程如下: 1、利用《Efficient Graph-Based Image Segmentation》(基本思想:將圖像中每個像素表示為圖上的一個節點,用於連接不同節點的無向邊都有一個權重,這個權重表示兩個節點之間的不相似度,通過貪心算法利用最小生成樹做圖像分割)生成初始候選區域; 2、採用貪心算法合併區域,計算任意兩個領域的相似度,把達到閾值的合併,再計算新區域和其所有領域的相似度,循環迭代,直到整個圖變成了一個區域,算法如下:

- 多樣化策略 三個方面:使用多種顏色空間、使用多種相似度計算方法、搜索起始區域不固定。 1、顏色空間有很多種:RGB、HSV、Lab等等,不是論文重點; 2、相似度衡量算法,結合了4重策略: ◆ 顏色相似度 以RGB為例,使用L1-norm歸一化每個圖像通道的色彩直方圖(bins=25),每個區域被表示為25×3維向量:; 顏色相似度定義為: 區域合併後對新的區域計算其色彩直方圖: 新區域的大小為: ◆ 紋理相似度 使用快速生成的類SIFT特徵,對每個顏色通道在8個方向上應用方差為1的高斯濾波器,對每個顏色通道的每個方向提取bins=10的直方圖,所以整個紋理向量維度為:3×8×10=240,表示為:; 紋理相似度定義為: ◆ 大小相似度 該策略希望小的區域能儘早合併,讓合併操作比較平滑,防止出現某個大區域逐步吞併其他小區域的情況。相似度定義為: 其中size(im)為圖像包含像素點數目。 ◆ 區域規則度相似度 能夠框住合併後的兩個區域的矩形大小越小說明兩個區域的合併越規則,如: 區域規則度相似度定義為:
最終相似度為所有策略加權和,文中採用等權方式:
使用Selective Search做目標識別
訓練過程包含:提取候選框、提取特徵、生成正負樣本、訓練模型,圖示如下:

早期圖像特徵提取往往是各種HOG特徵或BoW特徵,現在CNN特徵幾乎一統天下。 檢測定位效果評價採用Average Best Overlap(ABO)和Mean Average Best Overlap(MABO):
其中:為類別標註、為類別下的ground truth,為通過Selective Search生成的候選框。
)
代碼實踐
參見AlpacaDB。
- selectivesearch.py
# -*- coding: utf-8 -*- import skimage.io import skimage.feature import skimage.color import skimage.transform import skimage.util import skimage.segmentation import numpy # "Selective Search for Object Recognition" by J.R.R. Uijlings et al. # # - Modified version with LBP extractor for texture vectorization def _generate_segments(im_orig, scale, sigma, min_size): """ segment smallest regions by the algorithm of Felzenswalb and Huttenlocher """ # open the Image im_mask = skimage.segmentation.felzenszwalb( skimage.util.img_as_float(im_orig), scale=scale, sigma=sigma, min_size=min_size) # merge mask channel to the image as a 4th channel im_orig = numpy.append( im_orig, numpy.zeros(im_orig.shape[:2])[:, :, numpy.newaxis], axis=2) im_orig[:, :, 3] = im_mask return im_orig def _sim_colour(r1, r2): """ calculate the sum of histogram intersection of colour """ return sum([min(a, b) for a, b in zip(r1["hist_c"], r2["hist_c"])]) def _sim_texture(r1, r2): """ calculate the sum of histogram intersection of texture """ return sum([min(a, b) for a, b in zip(r1["hist_t"], r2["hist_t"])]) def _sim_size(r1, r2, imsize): """ calculate the size similarity over the image """ return 1.0 - (r1["size"] + r2["size"]) / imsize def _sim_fill(r1, r2, imsize): """ calculate the fill similarity over the image """ bbsize = ( (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"])) * (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"])) ) return 1.0 - (bbsize - r1["size"] - r2["size"]) / imsize def _calc_sim(r1, r2, imsize): return (_sim_colour(r1, r2) + _sim_texture(r1, r2) + _sim_size(r1, r2, imsize) + _sim_fill(r1, r2, imsize)) def _calc_colour_hist(img): """ calculate colour histogram for each region the size of output histogram will be BINS * COLOUR_CHANNELS(3) number of bins is 25 as same as [uijlings_ijcv2013_draft.pdf] extract HSV """ BINS = 25 hist = numpy.array([]) for colour_channel in (0, 1, 2): # extracting one colour channel c = img[:, colour_channel] # calculate histogram for each colour and join to the result hist = numpy.concatenate( [hist] + [numpy.histogram(c, BINS, (0.0, 255.0))[0]]) # L1 normalize hist = hist / len(img) return hist def _calc_texture_gradient(img): """ calculate texture gradient for entire image The original SelectiveSearch algorithm proposed Gaussian derivative for 8 orientations, but we use LBP instead. output will be [height(*)][width(*)] """ ret = numpy.zeros((img.shape[0], img.shape[1], img.shape[2])) for colour_channel in (0, 1, 2): ret[:, :, colour_channel] = skimage.feature.local_binary_pattern( img[:, :, colour_channel], 8, 1.0) return ret def _calc_texture_hist(img): """ calculate texture histogram for each region calculate the histogram of gradient for each colours the size of output histogram will be BINS * ORIENTATIONS * COLOUR_CHANNELS(3) """ BINS = 10 hist = numpy.array([]) for colour_channel in (0, 1, 2): # mask by the colour channel fd = img[:, colour_channel] # calculate histogram for each orientation and concatenate them all # and join to the result hist = numpy.concatenate( [hist] + [numpy.histogram(fd, BINS, (0.0, 1.0))[0]]) # L1 Normalize hist = hist / len(img) return hist def _extract_regions(img): R = {} # get hsv image hsv = skimage.color.rgb2hsv(img[:, :, :3]) # pass 1: count pixel positions for y, i in enumerate(img): for x, (r, g, b, l) in enumerate(i): # initialize a new region if l not in R: R[l] = { "min_x": 0xffff, "min_y": 0xffff, "max_x": 0, "max_y": 0, "labels": [l]} # bounding box if R[l]["min_x"] > x: R[l]["min_x"] = x if R[l]["min_y"] > y: R[l]["min_y"] = y if R[l]["max_x"] < x: R[l]["max_x"] = x if R[l]["max_y"] < y: R[l]["max_y"] = y # pass 2: calculate texture gradient tex_grad = _calc_texture_gradient(img) # pass 3: calculate colour histogram of each region for k, v in R.items(): # colour histogram masked_pixels = hsv[:, :, :][img[:, :, 3] == k] R[k]["size"] = len(masked_pixels / 4) R[k]["hist_c"] = _calc_colour_hist(masked_pixels) # texture histogram R[k]["hist_t"] = _calc_texture_hist(tex_grad[:, :][img[:, :, 3] == k]) return R def _extract_neighbours(regions): def intersect(a, b): if (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]) or ( a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or ( a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or ( a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]): return True return False R = regions.items() neighbours = [] for cur, a in enumerate(R[:-1]): for b in R[cur + 1:]: if intersect(a[1], b[1]): neighbours.append((a, b)) return neighbours def _merge_regions(r1, r2): new_size = r1["size"] + r2["size"] rt = { "min_x": min(r1["min_x"], r2["min_x"]), "min_y": min(r1["min_y"], r2["min_y"]), "max_x": max(r1["max_x"], r2["max_x"]), "max_y": max(r1["max_y"], r2["max_y"]), "size": new_size, "hist_c": ( r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size, "hist_t": ( r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size, "labels": r1["labels"] + r2["labels"] } return rt def selective_search( im_orig, scale=1.0, sigma=0.8, min_size=50): '''Selective Search Parameters ---------- im_orig : ndarray Input image scale : int Free parameter. Higher means larger clusters in felzenszwalb segmentation. sigma : float Width of Gaussian kernel for felzenszwalb segmentation. min_size : int Minimum component size for felzenszwalb segmentation. Returns ------- img : ndarray image with region label region label is stored in the 4th value of each pixel [r,g,b,(region)] regions : array of dict [ { 'rect': (left, top, right, bottom), 'labels': [...] }, ... ] ''' assert im_orig.shape[2] == 3, "3ch image is expected" # load image and get smallest regions # region label is stored in the 4th value of each pixel [r,g,b,(region)] img = _generate_segments(im_orig, scale, sigma, min_size) if img is None: return None, {} imsize = img.shape[0] * img.shape[1] R = _extract_regions(img) # extract neighbouring information neighbours = _extract_neighbours(R) # calculate initial similarities S = {} for (ai, ar), (bi, br) in neighbours: S[(ai, bi)] = _calc_sim(ar, br, imsize) # hierarchal search while S != {}: # get highest similarity i, j = sorted(S.items(), cmp=lambda a, b: cmp(a[1], b[1]))[-1][0] # merge corresponding regions t = max(R.keys()) + 1.0 R[t] = _merge_regions(R[i], R[j]) # mark similarities for regions to be removed key_to_delete = [] for k, v in S.items(): if (i in k) or (j in k): key_to_delete.append(k) # remove old similarities of related regions for k in key_to_delete: del S[k] # calculate similarity set with the new region for k in filter(lambda a: a != (i, j), key_to_delete): n = k[1] if k[0] in (i, j) else k[0] S[(t, n)] = _calc_sim(R[t], R[n], imsize) regions = [] for k, r in R.items(): regions.append({ 'rect': ( r['min_x'], r['min_y'], r['max_x'] - r['min_x'], r['max_y'] - r['min_y']), 'size': r['size'], 'labels': r['labels'] }) return img, regions
- example.py
# -*- coding: utf-8 -*- import matplotlib matplotlib.use("Agg") import matplotlib.pyplot as plt import skimage.data import skimage.io from skimage.io import use_plugin,imread import matplotlib.patches as mpatches from matplotlib.pyplot import savefig import selectivesearch def main(): # loading astronaut image #img = skimage.data.astronaut() use_plugin('pil') img = imread('car.jpg', as_grey=False) # perform selective search img_lbl, regions = selectivesearch.selective_search( img, scale=500, sigma=0.9, min_size=10) candidates = set() for r in regions: # excluding same rectangle (with different segments) if r['rect'] in candidates: continue # excluding regions smaller than 2000 pixels if r['size'] < 2000: continue # distorted rects x, y, w, h = r['rect'] if w / h > 1.2 or h / w > 1.2: continue candidates.add(r['rect']) # draw rectangles on the original image plt.figure() fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(6, 6)) ax.imshow(img) for x, y, w, h in candidates: print x, y, w, h rect = mpatches.Rectangle( (x, y), w, h, fill=False, edgecolor='red', linewidth=1) ax.add_patch(rect) #plt.show() savefig('MyFig.jpg') if __name__ == "__main__": main()
car.jpg原圖如下:

結果圖如下:
