Paper Reading: "Adaptive Subspaces for Few-Shot Learning"

October 10, 2020
AI

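It's October.
The last holiday of 2020 has come and gone, and I still haven't settled back into proper research. Sigh.
Here is an update on my recent paper reading.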

Paper title: "Adaptive Subspaces for Few-Shot Learning"
Paper link: //openaccess.thecvf.com/content_CVPR_2020/papers/Simon_Adaptive_Subspaces_for_Few-Shot_Learning_CVPR_2020_paper.pdf
Reading references: //blog.csdn.net/qq_36104364/article/details/106984460
//blog.csdn.net/feifeiyaa/article/details/107461643
Source code: //github.com/chrysts/dsn_fewshot
This post only records my personal notes on the paper. It does not go into a full translation or the code; see the links above for details.

Background

Various studies show that many deep learning techniques in computer vision, speech recognition and natural language understanding, to name but a few, will fail to produce reliable models that generalize well if limited annotations are available. Apart from the labor associated with annotating data, precise annotation can become ill-posed in some cases.
In short: with only a handful of labeled samples, many deep learning techniques cannot produce reliable models that generalize well, and annotation is both laborious and sometimes ill-posed.
In contrast to the current trend in deep learning, humans can learn new objects from only a few examples. This in turn provides humans with lifelong learning abilities. Inspired by such learning abilities, several approaches are developed to study learning from limited samples. This type of learning is known as Few-Shot Learning (FSL).
In other words: unlike current deep learning, humans can learn a new object from just a few examples, which is what gives us lifelong learning ability. Few-shot learning (FSL) is the family of methods, inspired by this ability, that study learning from limited samples.

Related Work

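1. Some of the early works use generative models and similarity learning to capture the variation within parts and geometric configurations of objects. These works use hand-crafted features to perform few-shot classification.
2. Deep learning has been very successful in learning discriminative features from images.
3. FSL based on metric learning is the closest direction to our work.
Early work mainly learned features through generative models and similarity learning, relying on hand-crafted features for few-shot classification.
In recent years, deep learning has been very successful at learning discriminative features from images.
The direction closest to the proposed method is metric-based few-shot learning, including Siamese networks, Matching Networks, Prototypical Networks, and so on.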

Some other methods

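As shown in the paper's figure, a, b, and c are three metric-based few-shot learning models.
a. Pair-Wise Classifier:
It is possible to build a classifier directly from samples by calculating the similarity between them.
The pair-wise classifier computes the distance between the query's feature vector and the feature vector of every support sample, then predicts the class in nearest-neighbor fashion (the classifier is built directly from the samples), e.g. Matching Network.
b. Prototype Classifier:
By introducing a simple multi-layer perceptron, the average of feature vectors from the final activation layer is used to perform few-shot classification.
The prototype classifier computes one prototype per class in the support set (the mean of the feature vectors of all samples of that class), then predicts the class from the distance between the query's feature vector and each class prototype, e.g. Prototypical Network.
c. Non-Linear Binary Classifier:
This approach exploits the non-linearity of the decision boundaries.
The non-linear binary classifier exploits non-linear decision boundaries by using a neural network to learn a non-linear distance metric, e.g. Relation Network.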
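
To make the difference between (a) and (b) concrete, here is a small NumPy sketch of my own. It is deliberately simplified and is not code from any of the cited papers: both classifiers use plain Euclidean distance on hand-written 2-D "features", the pair-wise variant uses a hard nearest neighbor (Matching Networks actually use an attention-weighted similarity), and (c) is left out because it needs a trained network. The function names `pairwise_predict` and `prototype_predict` are hypothetical.

```python
import numpy as np

def pairwise_predict(query, support, labels):
    """Pair-wise style: return the label of the single closest support sample."""
    dists = np.linalg.norm(support - query, axis=1)
    return int(labels[np.argmin(dists)])

def prototype_predict(query, support, labels):
    """Prototype style: return the label of the closest class mean."""
    classes = np.unique(labels)
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(protos - query, axis=1)
    return int(classes[np.argmin(dists)])

# Toy 2-way 3-shot support set; the third sample of class 0 is an outlier.
support = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0],   # class 0
                    [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])  # class 1
labels = np.array([0, 0, 0, 1, 1, 1])
query = np.array([3.6, 3.6])

pair_pred = pairwise_predict(query, support, labels)
proto_pred = prototype_predict(query, support, labels)
print(pair_pred, proto_pred)
```

On this toy data the two rules disagree: the nearest single sample is the class-0 outlier, while the nearest class mean belongs to class 1. Averaging makes the prototype less sensitive to a single odd sample, but it also discards everything about how the samples of a class are spread, which is the information a subspace can keep.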

Our work

Contributions:
i. Few-shot learning solutions are formulated within a framework of generating dynamic classifiers.
ii. We propose an extension of existing dynamic classifiers by using subspaces. We rely on a well-established concept stating that a second-order method generalizes better for classification tasks.
iii. We also introduce a discriminative formulation where maximum discrimination between subspaces is encouraged during training. This solution boosts the performance even further.
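iv. We show that our method can make use of unlabeled data and hence it lends itself to the problem of semi-supervised few-shot learning and transductive setting. The robustness of such a variant is assessed in our experiments.
1. The few-shot learning solution in this paper is formulated on top of dynamic classifiers.
Note: the paper frames few-shot learning as two stages: (1) learning a general-purpose feature extractor, and (2) dynamically generating a classifier from limited data.
2. The authors extend existing dynamic classifiers by using subspaces.
3. They also introduce a discriminative formulation (a loss function).
4. The method also extends to semi-supervised learning.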

Our method

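Subspaces for Few-Shot Classification (as shown in the figure above): the subspace classifier computes, for each class, a subspace of the feature space, projects the query's feature vector onto that subspace, measures the distance with respect to it, and predicts the class.
The goal is to learn a feature extractor $f_{\Theta}$ whose feature space is well suited to the subspace classifier.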

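Steps:

1. Take the input images and compute the mean feature vector of each class c:
The feature extractor $f_{\Theta}$ maps each input image $x_i$ into the feature space, giving the feature vector $f_{\Theta}(x_i)$. Averaging the feature vectors of each class c then gives the class mean $\mu_c$.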

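Network structure (backbone): the feature extractor is a 4-layer convolutional network or a ResNet.
2. Apply singular value decomposition:
For each class c we obtain the matrix $\tilde{X}_c=[f_{\Theta}(x_{c,1})-\mu_c,...,f_{\Theta}(x_{c,K})-\mu_c]$.
Applying singular value decomposition (SVD) to $\tilde{X}_c$ gives $\tilde{X}_c=U\Sigma V^T$, and keeping the first n columns of U gives the truncated matrix $P_c$. Obtaining the subspace $P_c$ from $\tilde{X}_c$ is exactly truncated SVD (TSVD), which is very similar to principal component analysis (PCA) and is a dimensionality reduction method.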
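3. Compute the distance from the query vector to each class:
With the subspace $P_c$ of each class in hand, we can compute the distance $d_c(q)$ between the query vector $f_{\Theta}(q)$ and each class c.

4. Use softmax to turn the distances between the query and each class into class probabilities.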
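
Steps 1 to 4 can be sketched in NumPy roughly as follows. This is my own minimal sketch under stated assumptions, not the authors' implementation (see the source code link above for that). Random vectors stand in for the outputs of $f_{\Theta}$. The distance is taken to be the squared norm of what is left of $f_{\Theta}(q)-\mu_c$ after projecting onto $P_c$, i.e. $\|(I-P_cP_c^T)(f_{\Theta}(q)-\mu_c)\|^2$, which is my reading of the method. The softmax is applied to the negated distances, so a smaller distance means a higher probability. The names `class_subspace` and `subspace_probs` are hypothetical.

```python
import numpy as np

def class_subspace(feats, n):
    """feats: (K, D) support features of one class.
    Returns the class mean mu (D,) and an orthonormal basis P (D, n)."""
    mu = feats.mean(axis=0)
    X = (feats - mu).T                        # (D, K), columns are centered samples
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return mu, U[:, :n]                       # truncated SVD: first n left singular vectors

def subspace_probs(query, subspaces):
    """query: (D,). subspaces: list of (mu, P), one per class.
    Returns the per-class distances and the softmax probabilities."""
    dists = []
    for mu, P in subspaces:
        r = query - mu
        residual = r - P @ (P.T @ r)          # part of r that lies outside the subspace
        dists.append(residual @ residual)
    dists = np.array(dists)
    logits = -dists                           # assumption: softmax over negated distances
    logits = logits - logits.max()            # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return dists, probs

# Synthetic 5-way 5-shot episode: each class lives near its own 2-D affine subspace.
rng = np.random.default_rng(0)
C, K, D, n = 5, 5, 64, 2
centers = 3.0 * rng.standard_normal((C, D))
bases = rng.standard_normal((C, D, n))

def sample(c, count):
    """Draw `count` stand-in feature vectors for class c."""
    coeffs = rng.standard_normal((count, n))
    return centers[c] + coeffs @ bases[c].T + 0.01 * rng.standard_normal((count, D))

subspaces = [class_subspace(sample(c, K), n) for c in range(C)]
query = sample(2, 1)[0]                       # a query drawn from class 2
dists, probs = subspace_probs(query, subspaces)
pred = int(np.argmax(probs))
print(pred)
```

One detail worth noting: after subtracting the mean, the K columns of $\tilde{X}_c$ sum to zero, so its rank is at most K-1 and n cannot usefully exceed K-1.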

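Loss function:

The first term of the loss is the classification loss (the negative log of the class probability). The second term is a regularizer that maximizes the distance between the subspaces of different classes, where the distance between subspaces is measured with the projection metric on the Grassmann manifold.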
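
The link between "maximize the distance between subspaces" and a penalty that can be minimized is worth checking numerically. For two orthonormal bases $P_i$ and $P_j$ with n columns each, the squared projection metric satisfies $\|P_iP_i^T-P_jP_j^T\|_F^2 = 2n-2\|P_i^TP_j\|_F^2$, so pushing subspaces apart is the same as shrinking $\|P_i^TP_j\|_F^2$. The sketch below is my own illustration, not the authors' code: `random_basis`, `projection_metric_sq`, `overlap` and `dsn_style_loss` are hypothetical names, and the weight `lam` and the probabilities in `p_true` are arbitrary values chosen only for the example.

```python
import numpy as np

def random_basis(rng, D, n):
    """Random (D, n) matrix with orthonormal columns, standing in for a class subspace P_c."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, n)))
    return Q

def projection_metric_sq(Pi, Pj):
    """Squared projection metric between the subspaces spanned by Pi and Pj."""
    return np.linalg.norm(Pi @ Pi.T - Pj @ Pj.T, "fro") ** 2

def overlap(Pi, Pj):
    """||Pi^T Pj||_F^2: large when the two subspaces are close, 0 when orthogonal."""
    return np.linalg.norm(Pi.T @ Pj, "fro") ** 2

def dsn_style_loss(p_true, bases, lam):
    """Negative log-likelihood of the true classes plus lam times the pairwise overlap."""
    nll = -np.mean(np.log(p_true))
    penalty = sum(overlap(bases[i], bases[j])
                  for i in range(len(bases))
                  for j in range(len(bases)) if i != j)
    return nll + lam * penalty

rng = np.random.default_rng(0)
D, n = 16, 3
Pi, Pj = random_basis(rng, D, n), random_basis(rng, D, n)

lhs = projection_metric_sq(Pi, Pj)
rhs = 2 * n - 2 * overlap(Pi, Pj)
print(np.isclose(lhs, rhs))

p_true = np.array([0.9, 0.7, 0.8])    # made-up probabilities of the correct class for 3 queries
loss = dsn_style_loss(p_true, [Pi, Pj], lam=0.1)
```

The identity follows from expanding the Frobenius norm with traces: each projector has trace n, and the cross term is $\mathrm{tr}(P_iP_i^TP_jP_j^T)=\|P_i^TP_j\|_F^2$.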

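Overall algorithm: in each episode, compute the class means, build each class subspace by truncated SVD, compute the distance from every query to every subspace, turn the distances into probabilities with softmax, and update $\Theta$ with the loss above.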

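Semi-Supervised Learning (extension to the semi-supervised setting)
The method can also be extended to semi-supervised training by refining the class mean $\mu_c$ with unlabeled samples.

Here $m_i$ is the soft-assignment score of the i-th unlabeled sample. To handle distractors, the paper uses a fake class with zero mean.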
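
Since the exact update is not reproduced in these notes, the following is only a rough sketch of how a soft-assignment refinement with a zero-mean fake class could look. It follows the general soft k-means idea rather than the paper's equation, so the precise form of the scores and of the weighting is an assumption on my part. Here the scores are a softmax over negative squared Euclidean distances to the class means plus the fake class, and each class mean counts with weight `k_shot`. The name `refine_means` and all numbers are hypothetical.

```python
import numpy as np

def refine_means(means, unlabeled, k_shot):
    """means: (C, D) class means, unlabeled: (R, D) unlabeled features.
    Returns the refined means (C, D) and the soft-assignment scores (R, C + 1);
    the last column of the scores belongs to the zero-mean fake class."""
    C, D = means.shape
    centers = np.vstack([means, np.zeros((1, D))])             # append the fake class at the origin
    d2 = ((unlabeled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)        # for numerical stability
    m = np.exp(logits)
    m = m / m.sum(axis=1, keepdims=True)                       # each row sums to 1
    m_real = m[:, :C]                                          # drop the fake-class column
    refined = (k_shot * means + m_real.T @ unlabeled) / (k_shot + m_real.sum(axis=0))[:, None]
    return refined, m

means = np.array([[5.0, 0.0], [0.0, 5.0]])      # two class means from the support set
unlabeled = np.array([[5.2, 0.1],               # looks like class 0
                      [0.1, 4.8],               # looks like class 1
                      [0.1, -0.1]])             # distractor near the origin
refined, scores = refine_means(means, unlabeled, k_shot=1)
print(np.round(refined, 3))
```

In this toy example the distractor is absorbed almost entirely by the fake class, so it barely moves either class mean, while the two genuine unlabeled samples pull their class means toward themselves.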

Experiment

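Datasets:
1. mini-ImageNet
2. tiered-ImageNet
(This dataset is also derived from ImageNet, but it covers a broader set of classes than mini-ImageNet.)
3. CIFAR-100
(The evaluation uses the CIFAR-FS split. All images in this dataset are 32×32, with 600 samples per class.)
4. Open MIC
(This dataset contains images from 10 museum exhibition spaces. It has 866 classes, with 1-20 images per class.)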

Backbones:
4-convolutional layers (Conv-4) and ResNet-12

Result:

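The result tables in the paper report the performance of the proposed DSN (Deep Subspace Networks) on the four datasets with different backbones. In those tables the proposed method outperforms the methods it is compared against. The paper also includes experiments in the semi-supervised setting and an ablation on whether the discriminative term is used.

Overall, the proposed method computes a dedicated subspace for each class and measures distance with respect to that subspace, which is why the authors call it Adaptive Subspaces. Because the distance computation is specific to each class, they call it a dynamic classifier. The method is still based on metric learning, just with its own definition of the "metric". The paper also discusses the robustness of the method (i.e. the effect of noise) and its computational complexity. I won't go into those here; see the original paper for details.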