Paper Reading: 《Wasserstein Distance Guided Representation Learning for Domain Adaptation》
- June 22, 2021
- AI
It's been a really, really long time since my last paper-reading post (and the papers keep piling up in my backlog...).
Suddenly it's almost the end of June; this month has flown by.
But I hope this month's efforts will pay off!
Today's paper reading covers a domain adaptation paper.
Paper title:
《Wasserstein Distance Guided Representation Learning for Domain Adaptation》
Paper link: https://arxiv.org/abs/1707.01217v4
Code: https://github.com/RockySJ/WDGRL
Reference reading: https://blog.csdn.net/qq_41076797/article/details/116942752
Background
1. Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations, while the learned representations should also be discriminative in prediction.
2. To effectively transfer a classifier across different domains, different methods have been proposed, including instance reweighting, subsampling, feature mapping, and weight regularization.
Among these methods, feature mapping has recently achieved great success: it projects data from different domains into a common latent space where the feature representations are domain-invariant.
3. On the other hand, generative adversarial nets (GANs) have been heavily studied in recent years; they play a minimax game between two adversarial networks: the discriminator is trained to distinguish real data from generated data, while the generator learns to generate high-quality data to fool the discriminator.
However, when the domain classifier network can perfectly distinguish target representations from source representations, a gradient vanishing problem arises. A more promising solution is to replace the domain discrepancy measure with the Wasserstein distance, which provides more stable gradients even when the two distributions are far apart.
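A classic one-dimensional example from the WGAN paper makes this gradient argument concrete. Take a fixed point mass $\mathbb{P} = \delta_0$ and a learned point mass $\mathbb{Q}_\theta = \delta_\theta$. Then

$$JS(\mathbb{P}, \mathbb{Q}_\theta) = \begin{cases} \log 2, & \theta \neq 0 \\ 0, & \theta = 0 \end{cases} \qquad\text{while}\qquad W_1(\mathbb{P}, \mathbb{Q}_\theta) = |\theta|.$$

The Jensen-Shannon divergence is constant wherever the two supports are disjoint, so its gradient with respect to $\theta$ is zero almost everywhere; the Wasserstein distance instead shrinks smoothly and always provides a gradient pointing $\theta$ toward 0.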
Related Works
This paper's taxonomy of domain adaptation methods is fairly complete, so I'll record it here.
i). Instance-based methods, which reweight/subsample the source samples to match the distribution of the target domain; training on the reweighted source samples thus guarantees classifiers with transferability.
ii). Parameter-based methods, which transfer knowledge through shared or regularized parameters of source and target domain learners, or by combining multiple reweighted source learners to form an improved target learner.
iii). Feature-based methods, which can be further categorized into two groups:
Asymmetric feature-based methods transform the features of one domain to more closely match another domain.
Symmetric feature-based methods map different domains to a common latent space where the feature distributions are close.
Work
In this paper, we propose a domain invariant representation learning approach to reduce domain discrepancy for domain adaptation, namely Wasserstein Distance Guided Representation Learning (WDGRL), inspired by the recently proposed Wasserstein GAN.
This paper cleverly applies the WGAN metric to domain adaptation, proposing the WDGRL approach.
(Note that the GRL in WDGRL is not the GRL (gradient reversal layer) in DANN; be careful to distinguish the two.)
Our WDGRL differs from previous adversarial methods:
i). WDGRL adopts an iterative adversarial training strategy.
ii). WDGRL adopts the Wasserstein distance as the adversarial loss, which has gradient superiority.
Model
Supplementary background: the Wasserstein Metric
The Wasserstein metric is a distance measure between probability distributions on a given metric space $(M, \rho)$, where $\rho(x, y)$ is a distance function for two instances $x$ and $y$ in the set $M$. The p-th Wasserstein distance between two Borel probability measures $\mathbb{P}$ and $\mathbb{Q}$ is defined as

$$W_p(\mathbb{P}, \mathbb{Q}) = \left( \inf_{\mu \in \Gamma(\mathbb{P}, \mathbb{Q})} \int \rho(x, y)^p \, d\mu(x, y) \right)^{1/p},$$

where $\Gamma(\mathbb{P}, \mathbb{Q})$ is the set of all joint distributions (couplings) whose marginals are $\mathbb{P}$ and $\mathbb{Q}$. For reference: https://blog.csdn.net/zkq_1986/article/details/84937388
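Computing the infimum over all couplings directly is intractable, so WDGRL, like WGAN, works with the Kantorovich-Rubinstein dual form of the first Wasserstein distance ($p = 1$), where the supremum ranges over all 1-Lipschitz functions $f$:

$$W_1(\mathbb{P}, \mathbb{Q}) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim \mathbb{P}}[f(x)] - \mathbb{E}_{x \sim \mathbb{Q}}[f(x)].$$

The domain critic network introduced next plays the role of this function $f$, which is why it has to satisfy the Lipschitz condition.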
Now, on to the main content.
WDGRL trains a domain critic network to estimate the empirical Wasserstein distance between the source and target feature representations. The feature extractor network will then be optimized to minimize the estimated Wasserstein distance in an adversarial manner. By iterative adversarial training, we finally learn feature representations invariant to the covariate shift between domains.
WDGRL can easily be adopted into existing domain adaptation frameworks. The full model works as follows: source and target domain data pass through the same feature extraction network simultaneously; the extracted features first go to the Domain Critic Network, whose parameters are adjusted to maximize its objective subject to the Lipschitz constraint, so that the loss the Domain Critic Network produces is a reliable distance estimate. Finally, the features are classified by the Discriminator (the label classifier), yielding the corresponding label classification loss.
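Here is a minimal PyTorch sketch of these three components and the critic's loss, just to fix ideas; the layer sizes, input dimensions, and names are my own illustrative assumptions, not the authors' reference implementation:

```python
import torch
import torch.nn as nn

# Shared feature extractor f_g: maps raw inputs to the latent representation.
feature_extractor = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 32))

# Domain critic f_w: a scalar-valued network playing the role of the
# 1-Lipschitz function in the Kantorovich-Rubinstein dual.
domain_critic = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))

# Label classifier f_c (the "Discriminator" in this post's terminology).
classifier = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))

def empirical_wasserstein(h_s: torch.Tensor, h_t: torch.Tensor) -> torch.Tensor:
    """L_wd: the critic's empirical estimate of the Wasserstein distance
    between source features h_s and target features h_t."""
    return domain_critic(h_s).mean() - domain_critic(h_t).mean()
```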
The classification loss is the standard cross-entropy loss. To avoid gradient vanishing or exploding problems, a gradient penalty is imposed on the domain critic parameters $\theta_w$:

$$L_{grad}(\hat{h}) = \left( \left\| \nabla_{\hat{h}} f_w(\hat{h}) \right\|_2 - 1 \right)^2,$$

where the feature representations $\hat{h}$ at which the gradient is penalized are defined not only at the source and target representations, but also at random points along the straight line between pairs of source and target representations. The Wasserstein distance can then be estimated by solving this maximization problem, giving the total objective function

$$\min_{\theta_g, \theta_c} \left\{ L_c + \lambda \max_{\theta_w} \left[ L_{wd} - \gamma L_{grad} \right] \right\},$$

where $L_c$ is the classification loss, $L_{wd}$ is the empirical Wasserstein distance estimated by the critic, and $\lambda$, $\gamma$ are balancing coefficients.
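Continuing the sketch above, one WDGRL training iteration could then look like the following; the gradient-penalty sampling follows the linear-interpolation scheme just described, and hyperparameters such as n_critic, gamma, and lam are assumed placeholder values:

```python
def gradient_penalty(critic: nn.Module, h_s: torch.Tensor, h_t: torch.Tensor) -> torch.Tensor:
    """L_grad: penalize the critic's gradient norm at random points on the
    straight line between paired source and target representations."""
    alpha = torch.rand(h_s.size(0), 1, device=h_s.device)
    h_hat = (alpha * h_s + (1 - alpha) * h_t).requires_grad_(True)
    grad = torch.autograd.grad(critic(h_hat).sum(), h_hat, create_graph=True)[0]
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

def train_step(x_s, y_s, x_t, critic_opt, main_opt, n_critic=5, gamma=10.0, lam=1.0):
    # Step 1: train the critic to maximize L_wd - gamma * L_grad, with the
    # features detached so only the critic's parameters theta_w move.
    for _ in range(n_critic):
        h_s, h_t = feature_extractor(x_s).detach(), feature_extractor(x_t).detach()
        critic_loss = -(empirical_wasserstein(h_s, h_t)
                        - gamma * gradient_penalty(domain_critic, h_s, h_t))
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

    # Step 2: update the feature extractor and classifier to minimize
    # L_c + lam * L_wd, shrinking the estimated Wasserstein distance.
    h_s, h_t = feature_extractor(x_s), feature_extractor(x_t)
    total_loss = (nn.functional.cross_entropy(classifier(h_s), y_s)
                  + lam * empirical_wasserstein(h_s, h_t))
    main_opt.zero_grad()
    total_loss.backward()
    main_opt.step()
```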
Experiments
ENDing
Hoping for good luck this week!!!!