Wavesplit: End-to-End Speech Separation by Speaker Clustering (cs.SD)
- March 17, 2020
- Notes
We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters a representation of each speaker, and then estimates each source signal conditioned on the inferred representations. The model is trained on raw waveforms to perform both tasks jointly. By deriving a set of speaker representations through clustering, the model addresses the fundamental permutation problem of speech separation. Moreover, compared with previous approaches, the sequence-wide speaker representations provide more robust separation of long, challenging sequences. Wavesplit outperforms the previous state of the art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as under noisy (WHAM!) and reverberant (WHAMR!) conditions. As an additional contribution, the authors further improve the model by introducing online data augmentation for separation.
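To make the two-stage design above concrete, here is a minimal PyTorch sketch of the idea: a speaker stack maps the raw mixture to per-frame speaker embeddings, those embeddings are aggregated into one vector per speaker (a plain time average stands in for the clustering step described in the paper), and a separation stack estimates each source waveform conditioned on its speaker vector. All module names, layer sizes, and the conditioning scheme here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two-stage idea (speaker stack + separation stack).
# Layer sizes and conditioning are assumptions for illustration only.
import torch
import torch.nn as nn


class SpeakerStack(nn.Module):
    """Maps a raw-waveform mixture to one embedding per speaker slot per frame."""

    def __init__(self, n_speakers=2, channels=64, emb_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=16, stride=8, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # One embedding head per target speaker slot.
        self.heads = nn.Conv1d(channels, n_speakers * emb_dim, kernel_size=1)
        self.n_speakers, self.emb_dim = n_speakers, emb_dim

    def forward(self, mix):                       # mix: (batch, samples)
        h = self.encoder(mix.unsqueeze(1))        # (batch, channels, frames)
        e = self.heads(h)                         # (batch, n_spk * emb_dim, frames)
        return e.view(mix.size(0), self.n_speakers, self.emb_dim, -1)


class SeparationStack(nn.Module):
    """Estimates each source waveform, conditioned on one vector per speaker."""

    def __init__(self, channels=64, emb_dim=128):
        super().__init__()
        self.encoder = nn.Conv1d(1, channels, kernel_size=16, stride=8, padding=4)
        self.cond = nn.Linear(emb_dim, channels)   # additive (FiLM-like) conditioning
        self.decoder = nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=8, padding=4)

    def forward(self, mix, spk_vectors):           # spk_vectors: (batch, n_spk, emb_dim)
        h = self.encoder(mix.unsqueeze(1))         # (batch, channels, frames)
        estimates = []
        for s in range(spk_vectors.size(1)):
            bias = self.cond(spk_vectors[:, s]).unsqueeze(-1)   # (batch, channels, 1)
            estimates.append(self.decoder(torch.relu(h + bias)).squeeze(1))
        return torch.stack(estimates, dim=1)       # (batch, n_spk, samples)


mix = torch.randn(4, 8000)                         # 4 one-second mixtures at 8 kHz
speaker_stack, separation_stack = SpeakerStack(), SeparationStack()
frame_embeddings = speaker_stack(mix)              # (4, 2, 128, frames)
# The paper aggregates per-frame embeddings over the whole sequence via clustering;
# a simple time average is used here as a stand-in for that step.
speaker_vectors = frame_embeddings.mean(dim=-1)    # (4, 2, 128)
estimated_sources = separation_stack(mix, speaker_vectors)
print(estimated_sources.shape)                     # torch.Size([4, 2, 8000])
```

Because the speaker vectors are computed over the whole sequence, the ordering of the output channels is tied to speaker identity rather than to an arbitrary frame-level assignment, which is how the approach sidesteps the permutation problem.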
Original title: Wavesplit: End-to-End Speech Separation by Speaker Clustering
Original abstract: We introduce Wavesplit, an end-to-end speech separation system. From a single recording of mixed speech, the model infers and clusters representations of each speaker and then estimates each source signal conditioned on the inferred representations. The model is trained on the raw waveform to jointly perform the two tasks. Our model infers a set of speaker representations through clustering, which addresses the fundamental permutation problem of speech separation. Moreover, the sequence-wide speaker representations provide a more robust separation of long, challenging sequences, compared to previous approaches. We show that Wavesplit outperforms the previous state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2mix, WSJ0-3mix), as well as in noisy (WHAM!) and reverberated (WHAMR!) conditions. As an additional contribution, we further improve our model by introducing online data augmentation for separation.
Original authors: Neil Zeghidour, David Grangier. Original link: http://cn.arxiv.org/abs/2002.08933
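The "online data augmentation" mentioned at the end of the abstract amounts to creating new training mixtures on the fly instead of training on a fixed, precomputed set. Below is a minimal sketch of that general idea, assuming single-speaker utterances are drawn at random, cropped or padded to a common length, and summed with random gains; the function name `dynamic_mixture` and all sampling details are hypothetical, not the paper's exact recipe.

```python
# Hypothetical sketch of on-the-fly mixture creation for separation training.
import random
import torch
import torch.nn.functional as F


def dynamic_mixture(utterances, n_speakers=2, length=32000, gain_db_range=(-5.0, 5.0)):
    """Builds a fresh (mixture, sources) training pair from single-speaker recordings."""
    sources = []
    for wav in random.sample(utterances, n_speakers):
        # Crop or zero-pad each source to a common length.
        if wav.numel() >= length:
            start = random.randint(0, wav.numel() - length)
            wav = wav[start:start + length]
        else:
            wav = F.pad(wav, (0, length - wav.numel()))
        # Apply a random gain so the model sees new relative levels each time.
        gain = 10.0 ** (random.uniform(*gain_db_range) / 20.0)
        sources.append(gain * wav)
    sources = torch.stack(sources)           # (n_speakers, length)
    return sources.sum(dim=0), sources       # mixture and its ground-truth sources


# Usage: draw a brand-new pair at every training step instead of reusing fixed mixtures.
bank = [torch.randn(48000), torch.randn(20000), torch.randn(40000)]
mix, srcs = dynamic_mixture(bank)
print(mix.shape, srcs.shape)                 # torch.Size([32000]) torch.Size([2, 32000])
```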