Improved Robust ASR for Social Robots in Public Spaces (Sound)
- January 20, 2020
- Notes
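Social robots deployed in public spaces pose a hard problem for automatic speech recognition (ASR) for several reasons, one being background noise at signal-to-noise ratios (SNRs) between 20 dB and 5 dB. Existing ASR models do well at the higher-SNR end of this range, but their accuracy drops considerably as the noise increases. This work explores ways to improve ASR performance under such conditions, using the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluation. The authors report exceeding state-of-the-art ASR performance at SNRs below 20 dB, which shows that relatively high-performing ASR is feasible with open-source toolkits and the few hundred hours of training data that are commonly available.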
Original title: IMPROVED ROBUST ASR FOR SOCIAL ROBOTS IN PUBLIC SPACES
Original abstract: Social robots deployed in public spaces present a challenging task for ASR because of a variety of factors, including noise SNR of 20 to 5 dB. Existing ASR models perform well for higher SNRs in this range, but degrade considerably with more noise. This work explores methods for providing improved ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance with SNR lower than 20 dB, demonstrating the feasibility of achieving relatively high performing ASR with open-source toolkits and hundreds of hours of training data, which is commonly available.
Authors: Charles Jankowski, Vishwas Mruthyunjaya, Ruixi Lin
Link: https://arxiv.org/abs/2001.04619
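
To make the 20 dB to 5 dB SNR range in the abstract concrete, here is a minimal sketch of how a noise signal can be scaled and added to clean speech to reach a target SNR, using the standard definition SNR_dB = 10 * log10(P_speech / P_noise). The abstract does not say how the authors prepared their noisy data, so this is only an illustration; the function names `mix_at_snr` and `measure_snr_db` and the synthetic sine-wave "speech" are assumptions made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a list of float samples."""
    return sum(x * x for x in signal) / len(signal)


def measure_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech-to-noise power ratio equals `snr_db`.

    Returns (mixed, scaled_noise). Hypothetical helper for illustration,
    not the procedure used in the paper.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve 10*log10(p_speech / (gain^2 * p_noise)) = snr_db for gain.
    target_noise_power = p_speech / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    scaled_noise = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


if __name__ == "__main__":
    random.seed(0)
    sample_rate = 16000  # assumed sample rate for the synthetic example
    # One second of a 440 Hz tone stands in for clean speech.
    speech = [math.sin(2 * math.pi * 440 * t / sample_rate)
              for t in range(sample_rate)]
    noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for target in (20, 15, 10, 5):
        mixed, scaled = mix_at_snr(speech, noise, target)
        print(target, round(measure_snr_db(speech, scaled), 3))
```

Each 10 dB drop in SNR corresponds to a tenfold increase in noise power relative to speech, so the 5 dB end of the range carries roughly 32 times more relative noise power than the 20 dB end.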