In Defence of Metric Learning for Speaker Recognition (cs.SD)
- March 27, 2020
- Notes
The goal of this paper is 'open-set' speaker recognition of unseen speakers, where an ideal embedding should condense information into a compact utterance-level representation with small intra-class (same speaker) and large inter-class (different speakers) distances.
It is widely believed that networks trained with classification objectives outperform metric learning methods. The authors present an extensive evaluation of recent loss functions for speaker recognition on the VoxCeleb dataset. They show that even the vanilla triplet loss is competitive with classification-based losses, and that models trained with their angular metric learning objective outperform state-of-the-art methods.
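As context for the note above, the vanilla triplet loss mentioned in the paper can be sketched as follows. This is a minimal illustration with NumPy, not the paper's implementation; the margin value and embedding dimensionality are arbitrary choices for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Vanilla triplet loss on L2-normalised utterance embeddings.

    Encourages d(anchor, positive) + margin < d(anchor, negative),
    i.e. small intra-class (same speaker) and large inter-class
    (different speakers) distance. Margin is illustrative only.
    """
    # L2-normalise the embeddings so distances live on the unit hypersphere
    a = anchor / np.linalg.norm(anchor)
    p = positive / np.linalg.norm(positive)
    n = negative / np.linalg.norm(negative)
    d_ap = np.linalg.norm(a - p)  # intra-class distance (same speaker)
    d_an = np.linalg.norm(a - n)  # inter-class distance (different speaker)
    return max(0.0, d_ap - d_an + margin)
```

An "easy" triplet (positive near the anchor, negative far away) yields zero loss, so only violating triplets contribute gradient during training.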
Original title: In defence of metric learning for speaker recognition
Original abstract: The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-class (same speaker) and large inter-class (different speakers) distance.
A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of most recent loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that even the vanilla triplet loss shows competitive performance compared to classification-based losses, and those trained with our angular metric learning objective outperform state-of-the-art methods.
Authors: Joon Son Chung, Jaesung Huh, Seongkyu Mun, Minjae Lee, Hee Soo Heo, Soyeon Choe, Chiheon Ham, Sunghwan Jung, Bong-Jin Lee, Icksang Han
Link: https://arxiv.org/abs/2003.11982