An empirical study of Conv-TasNet (cs.SD)
- March 17, 2020
- Notes
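Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on this learned space. Various improvements have been proposed, but they mostly focus on the separator and leave the encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an improvement to the encoder/decoder based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate how well the studied models generalize when trained on a much larger dataset. We propose a cross-dataset evaluation that assesses separations on the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that the enhanced encoder/decoder can improve average SI-SNR by more than 1 dB. We also offer insights into the generalization capabilities of Conv-TasNet and the potential value of improving its encoder/decoder.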
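The abstract contrasts Conv-TasNet's shallow linear encoder/decoder with a deep non-linear variant. The following toy sketch illustrates that contrast under stated assumptions. The waveform is cut into overlapping frames of `win` samples every `hop` samples. The linear encoder projects each frame onto an N x `win` basis. The linear decoder maps latents back to frames and overlap-adds them, which amounts to a transposed 1-D convolution. All function names (`frames_of`, `linear_encoder`, `deep_encoder`, `linear_decoder`) are hypothetical. The separator is omitted. `deep_encoder` is only a generic stack of projections with ReLU and is not the architecture proposed in the paper.

```python
from typing import List

Matrix = List[List[float]]

def frames_of(signal: List[float], win: int, hop: int) -> Matrix:
    # Slice the waveform into frames of `win` samples taken every `hop` samples;
    # trailing samples that do not fill a whole frame are dropped for simplicity.
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]

def matvec(w: Matrix, v: List[float]) -> List[float]:
    # Plain matrix-vector product: one output value per row of `w`.
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def linear_encoder(frames: Matrix, basis: Matrix) -> Matrix:
    # Shallow encoder: a single linear projection per frame (basis is N x win).
    return [matvec(basis, f) for f in frames]

def deep_encoder(frames: Matrix, layers: List[Matrix]) -> Matrix:
    # Hypothetical deep non-linear encoder: several projections with ReLU between them.
    out = []
    for f in frames:
        h = f
        for i, w in enumerate(layers):
            h = matvec(w, h)
            if i < len(layers) - 1:
                h = [max(0.0, x) for x in h]  # ReLU between layers, not after the last
        out.append(h)
    return out

def linear_decoder(latents: Matrix, basis: Matrix, win: int, hop: int) -> List[float]:
    # Shallow decoder: map each N-dim latent back to a `win`-sample frame through
    # the (N x win) decoder basis, then overlap-add frames spaced `hop` samples apart.
    out = [0.0] * ((len(latents) - 1) * hop + win)
    for t, z in enumerate(latents):
        for k in range(win):
            out[t * hop + k] += sum(z[n] * basis[n][k] for n in range(len(basis)))
    return out

# Usage: with an identity basis and hop == win, encode -> decode is lossless.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [float(i) for i in range(8)]
z = linear_encoder(frames_of(x, 4, 4), identity)
print(linear_decoder(z, identity, 4, 4) == x)  # True
```

In a trained model the encoder and decoder bases would be learned rather than fixed. The separator would operate on the latent frames between the two stages. The deep variant differs only in that each frame passes through several layers with a non-linearity, so it is no longer a single linear map.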
Original title: An empirical study of Conv-TasNet
Original abstract: Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate the generalization capabilities of the studied models when trained on a much larger dataset. We propose cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.
Authors: Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar
Original URL: http://cn.arxiv.org/abs/2002.08688
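The reported gains are measured in average SI-SNR (scale-invariant signal-to-noise ratio). Below is a minimal sketch of the commonly used definition. Both signals are first made zero-mean. The estimate $\hat{s}$ is projected onto the target $s$ as $s_{\text{target}} = \frac{\langle \hat{s}, s\rangle}{\lVert s\rVert^2} s$, and the residual $e_{\text{noise}} = \hat{s} - s_{\text{target}}$ is treated as noise. The score is $10\log_{10}\big(\lVert s_{\text{target}}\rVert^2 / \lVert e_{\text{noise}}\rVert^2\big)$. The names `si_snr` and `eps` are illustrative and are not taken from the paper.

```python
import math
from typing import List

def si_snr(estimate: List[float], target: List[float], eps: float = 1e-8) -> float:
    # Remove each signal's mean so a DC offset does not affect the score.
    m_e = sum(estimate) / len(estimate)
    m_t = sum(target) / len(target)
    est = [x - m_e for x in estimate]
    tgt = [x - m_t for x in target]
    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt.
    dot = sum(a * b for a, b in zip(est, tgt))
    energy = sum(b * b for b in tgt) + eps
    s_target = [dot / energy * b for b in tgt]
    # Everything not explained by the target direction counts as noise.
    e_noise = [a - s for a, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(e * e for e in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)
```

Because the estimate is projected onto the target before the energy ratio is taken, rescaling the estimate leaves the score unchanged. This property suits separation models whose output gain is arbitrary.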