Wavesplit: End-to-End Speech Separation by Speaker Clustering (CS SD)

March 17, 2020
Notes

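The paper introduces Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers a representation for each speaker, clusters these representations, and then estimates each source signal conditioned on them. It is trained on raw waveforms to perform both tasks jointly. Because the set of speaker representations is obtained through clustering, the model addresses the fundamental permutation problem of speech separation. The speaker representations also span the whole sequence, which makes separation of long, challenging recordings more robust than with previous approaches. The authors report that Wavesplit outperforms the previous state of the art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as under noisy (WHAM!) and reverberant (WHAMR!) conditions. As an additional contribution, they further improve the model by introducing online data augmentation for separation.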

Original title: Wavesplit: End-to-End Speech Separation by Speaker Clustering

Original abstract: We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.

Original authors: Neil Zeghidour, David Grangier. Original URL: http://cn.arxiv.org/abs/2002.08933
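
The abstract says that clustering speaker representations addresses the permutation problem. The toy sketch below illustrates that idea only; it is not the authors' implementation. It assumes each frame yields one vector per speaker in an arbitrary order, uses synthetic vectors in place of a learned speaker network, and picks k-means as one simple clustering choice. All names (`kmeans`, `frames`, `true_centroids`) and sizes are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    chosen = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in chosen], axis=0)
        chosen.append(points[np.argmax(d)])
    centroids = np.stack(chosen)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels


# Synthetic stand-in for per-frame speaker vectors (assumed sizes).
rng = np.random.default_rng(0)
T, N, D = 200, 2, 8  # frames, speakers, embedding dimension
true_centroids = np.stack([np.full(D, 1.0), np.full(D, -1.0)])
frames = true_centroids[None, :, :] + 0.1 * rng.normal(size=(T, N, D))

# Shuffle the speaker order independently in every frame: this is the
# permutation ambiguity a separation model has to cope with.
for t in range(T):
    frames[t] = frames[t][rng.permutation(N)]

# Cluster all T*N vectors at once to get one sequence-wide vector per speaker.
centroids, labels = kmeans(frames.reshape(T * N, D), N)
labels = labels.reshape(T, N)  # consistent speaker index for every frame slot
```

In this toy setting each row of `labels` ends up containing every speaker index exactly once, so the per-frame ordering no longer matters: the cluster centroids give a fixed, sequence-wide speaker order that a downstream separation step could be conditioned on.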

