Wavesplit: End-to-End Speech Separation by Speaker Clustering (cs.SD)
- March 17, 2020
- Notes
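This paper presents Wavesplit, an end-to-end speech separation system. Given a single recording of mixed speech, the model infers a representation for each speaker, clusters those representations, and then estimates each source signal conditioned on them. Both tasks are trained jointly, directly on the raw waveform. Because the set of speaker representations is obtained by clustering, the model sidesteps the fundamental permutation problem of speech separation: the order in which the separated sources are output is otherwise arbitrary. The speaker representations are also sequence-wide, which makes separation more robust on long, challenging sequences than previous approaches. Wavesplit outperforms the previous state of the art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberant (WHAMR!) conditions. As an additional contribution, the authors further improve the model by introducing online data augmentation for separation.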
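To make the clustering idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the model emits a small set of speaker vectors at every time step in an arbitrary, possibly changing order, and that plain k-means is used to pool them into one sequence-wide representation per speaker; the abstract does not name the clustering algorithm, so k-means is an assumption. The function name `cluster_speaker_vectors`, the farthest-point initialisation, and the toy 2-D "speaker identities" are all hypothetical.

```python
import numpy as np

def cluster_speaker_vectors(vectors, num_speakers, num_iters=20):
    """Toy k-means over per-time-step speaker vectors (illustrative only).

    vectors: shape (T, N, D). At each of T time steps there are N vectors
             of dimension D, in an arbitrary order that may change over time.
    Returns (centroids, labels) with shapes (num_speakers, D) and (T, N).
    """
    T, N, D = vectors.shape
    points = np.asarray(vectors, dtype=float).reshape(T * N, D)

    def assign(cents):
        # Squared Euclidean distance from every point to every centroid.
        d = np.sum((points[:, None, :] - cents[None, :, :]) ** 2, axis=2)
        return np.argmin(d, axis=1)

    # Deterministic farthest-point initialisation (an assumption of this sketch).
    init = [points[0]]
    for _ in range(1, num_speakers):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in init], axis=0)
        init.append(points[np.argmax(dists)])
    centroids = np.stack(init)

    for _ in range(num_iters):
        labels = assign(centroids)
        for k in range(num_speakers):
            members = points[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)

    # Final assignment against the converged centroids.
    return centroids, assign(centroids).reshape(T, N)

# Toy data: two hypothetical speaker identities, observed with noise and
# with the output order shuffled independently at every time step.
rng = np.random.default_rng(0)
true_speakers = np.array([[1.0, 0.0], [0.0, 1.0]])
T = 50
frames = np.empty((T, 2, 2))
for t in range(T):
    order = rng.permutation(2)
    frames[t] = true_speakers[order] + 0.05 * rng.standard_normal((2, 2))

centroids, labels = cluster_speaker_vectors(frames, num_speakers=2)
```

In this toy setting the centroids play the role of sequence-wide speaker representations: they do not depend on the order in which the vectors appeared at any single time step, which is the sense in which clustering removes the permutation ambiguity.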
Original title: Wavesplit: End-to-End Speech Separation by Speaker Clustering
Original abstract: We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.
Authors: Neil Zeghidour, David Grangier. Original URL: http://cn.arxiv.org/abs/2002.08933