RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition (cs.SD)

  • March 17, 2020
  • Notes

Original title: RNN-Transducer with Language Bias for End-to-End Mandarin-English Code-Switching Speech Recognition

Abstract: Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition. However, previous works use an additional language identification (LID) model as an auxiliary module, which makes the system complex. In this work, we propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem. We use the language identities to bias the model to predict the CS points. This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed. We evaluate the approach on a Mandarin-English CS corpus SEAME. Compared to our RNN-T baseline, the proposed method can achieve 16.2% and 12.9% relative error reduction on two test sets, respectively.

Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Ye Bai
Link: http://cn.arxiv.org/abs/2002.08126
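The abstract says the model learns language identity "directly from transcription" and is biased toward predicting CS points. One simple way to make language identity visible in the training targets is to insert a language tag token wherever the language switches. The sketch below assumes that scheme. It is an illustration, not necessarily the paper's exact method. The tag names `<zh>`/`<en>`, the tokenization (one token per Mandarin character, one per English word) and the helper names are all hypothetical.

```python
import re

# Hypothetical language tag tokens; the paper's actual symbols may differ.
ZH_TAG = "<zh>"
EN_TAG = "<en>"

# One token per CJK ideograph, or one token per run of Latin letters/apostrophes.
# Whitespace and any other characters are simply skipped.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize_mixed(text: str) -> list[tuple[str, str]]:
    """Split a Mandarin-English transcript into (token, language) pairs."""
    pairs = []
    for tok in _TOKEN_RE.findall(text):
        lang = "zh" if "\u4e00" <= tok[0] <= "\u9fff" else "en"
        pairs.append((tok, lang))
    return pairs


def add_language_tags(text: str) -> list[str]:
    """Insert a language tag before the first token and at every CS point."""
    tagged = []
    prev_lang = None
    for tok, lang in tokenize_mixed(text):
        if lang != prev_lang:
            # Language changed (or this is the first token): emit a tag.
            tagged.append(ZH_TAG if lang == "zh" else EN_TAG)
            prev_lang = lang
        tagged.append(tok)
    return tagged


if __name__ == "__main__":
    print(add_language_tags("我们 明天 有 meeting 吗"))
```

For the sample sentence, the tagged target is `<zh> 我 们 明 天 有 <en> meeting <zh> 吗`. Each tag marks a switch point, so a transducer trained on such targets must predict where the language changes without a separate LID model. This matches the motivation in the abstract, under the stated assumption about how the tags are used.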