Improved Robust ASR for Social Robots in Public Spaces (Sound)

  • January 20, 2020
  • Notes

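Social robots deployed in public spaces are a hard case for automatic speech recognition (ASR) for several reasons, one of which is background noise at signal-to-noise ratios (SNR) of 20 down to 5 dB. Existing ASR models do well at the higher SNRs in that range, but their accuracy degrades considerably as the noise increases. This work explores ways to improve ASR performance under those conditions. The evaluations use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit. The authors report exceeding state-of-the-art ASR performance at SNRs below 20 dB, which shows that relatively high-performing ASR is feasible with open-source toolkits and the hundreds of hours of training data that are commonly available.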
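To make the 20 to 5 dB range concrete, here is a minimal sketch of how a noise recording can be scaled and added to clean speech so that the mixture has a chosen SNR, using the standard definition SNR(dB) = 10 * log10(P_speech / P_noise). This is not the authors' pipeline: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and a sine tone and Gaussian noise stand in for real speech and real public-space noise.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power matches `snr_db`, then add it.

    Hypothetical helper for illustration. Returns (mixture, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # From SNR(dB) = 10 * log10(P_speech / P_noise), solve for the noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and the noise that was added to it."""
    return 10 * math.log10(power(speech) / power(noise))


# Synthetic stand-ins (assumptions): one second of a 220 Hz tone and white noise.
random.seed(0)
sample_rate = 16000
speech = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

for target in (20, 15, 10, 5):
    mixture, scaled_noise = mix_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, scaled_noise), 2))
```

Each 10 dB drop in SNR multiplies the noise power by 10 relative to the speech, so going from 20 dB to 5 dB means roughly 32 times more noise power for the same speech level.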

Original title: IMPROVED ROBUST ASR FOR SOCIAL ROBOTS IN PUBLIC SPACES

Original abstract: Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise SNR of 20 to 5 dB. Existing ASR models perform well for higher SNRs in this range, but degrade considerably with more noise. This work explores methods for providing improved ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance with SNR lower than 20 dB, demonstrating the feasibility of achieving relatively high performing ASR with open-source toolkits and hundreds of hours of training data, which is commonly available.

Authors: Charles Jankowski, Vishwas Mruthyunjaya, Ruixi Lin

Original link: https://arxiv.org/abs/2001.04619