An Empirical Study of Conv-TasNet (cs.SD)
- March 17, 2020
- Notes
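Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates in the learned space. Many improvements have been proposed, but most target the separator and leave the encoder/decoder as a (shallow) linear operator. This paper presents an empirical study of Conv-TasNet and proposes an enhanced encoder/decoder based on a (deep) non-linear variant. The authors also experiment with the larger and more diverse LibriTTS dataset to investigate how the studied models generalize when trained on much more data, and they propose a cross-dataset evaluation covering separations from the WSJ0-2mix, LibriTTS, and VCTK databases. The results show that the enhanced encoder/decoder can improve average SI-SNR by more than 1 dB, and the paper offers insights into Conv-TasNet's generalization capabilities and the potential value of improving the encoder/decoder.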
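To make the contrast between a shallow linear encoder and a deep non-linear one concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify layer types or sizes, so the window length, hop, filter count, the per-frame linear layers, and the fixed-slope leaky ReLU are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random

def frame(signal, win, hop):
    """Split a 1-D signal into overlapping frames of length `win`."""
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]

def linear(frames, weights):
    """Apply one weight matrix (out_dim x in_dim) to every frame."""
    return [[sum(w * x for w, x in zip(row, f)) for row in weights]
            for f in frames]

def leaky_relu(frames, slope=0.25):
    """Element-wise non-linearity (assumed; slope is fixed, not learned)."""
    return [[x if x > 0 else slope * x for x in f] for f in frames]

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights; real models learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def shallow_encoder(signal, basis, win, hop):
    """Shallow linear encoder: each frame is projected onto a set of filters."""
    return linear(frame(signal, win, hop), basis)

def deep_encoder(signal, basis, extra_layers, win, hop):
    """Hypothetical deep variant: the same projection, then extra
    non-linear layers stacked on top of it."""
    hidden = linear(frame(signal, win, hop), basis)
    for weights in extra_layers:
        hidden = linear(leaky_relu(hidden), weights)
    return hidden

# Illustrative sizes only (assumptions, not values from the paper).
rng = random.Random(0)
win, hop, n_filters = 16, 8, 32
signal = [rng.uniform(-1.0, 1.0) for _ in range(64)]
basis = random_matrix(n_filters, win, rng)
extra_layers = [random_matrix(n_filters, n_filters, rng) for _ in range(2)]

shallow = shallow_encoder(signal, basis, win, hop)
deep = deep_encoder(signal, basis, extra_layers, win, hop)
print(len(shallow), len(shallow[0]))  # frames, features per frame
print(len(deep), len(deep[0]))
```

Both encoders map the waveform to the same frames-by-features layout, so a separator could consume either one. The difference is that the shallow version is a single linear map, while the deep version can represent non-linear transformations of each frame.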
Original title: An empirical study of Conv-TasNet
Original abstract: Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate the generalization capabilities of the studied models when trained on a much larger dataset. We propose cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.
Authors: Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar
Source: http://cn.arxiv.org/abs/2002.08688
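The abstract reports its gains in SI-SNR (scale-invariant signal-to-noise ratio). For readers unfamiliar with the metric, the sketch below computes it with the commonly used definition: remove the mean from both signals, project the estimate onto the reference, and compare the energy of that projection with the energy of the residual. The function name, the `eps` stabilizer, and the toy signals are assumptions for illustration; the paper's own evaluation code may differ in detail.

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    mean_est = sum(estimate) / n
    mean_ref = sum(reference) / n
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Whatever is left after removing the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)

# Toy example: a sine "source" plus a weaker sine at another frequency.
# The two are orthogonal, and the energy ratio is 1 / 0.1**2 = 100.
n = 1000
reference = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
interference = [0.1 * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]
estimate = [r + v for r, v in zip(reference, interference)]

print(round(si_snr(estimate, reference), 2))
# Rescaling the estimate leaves the score unchanged (up to rounding).
print(round(si_snr([3.0 * x for x in estimate], reference), 2))
```

Because the estimate is projected onto the reference before the ratio is taken, a model cannot raise its score simply by making its output louder. On this scale, the reported gain of more than 1 dB corresponds to roughly a 26% increase in the target-to-residual energy ratio, since 10^(1/10) is about 1.26.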