A Differentiable Perceptual Audio Metric Learned from Just-Noticeable Differences (Sound)
- January 20, 2020
- Notes
Assessment of many audio processing tasks relies on subjective evaluation, which is time-consuming and expensive. Efforts have been made to create objective metrics, but existing ones correlate poorly with human judgment. In this work, the authors construct a differentiable metric by fitting a deep neural network to a newly collected dataset of just-noticeable differences (JND), in which humans annotate whether a pair of audio clips sound identical. By varying the type of differences, including noise, reverb, and compression artifacts, they are able to learn a metric that is well calibrated with human judgments. Furthermore, they evaluate this metric by training a neural network that uses it as a loss function. They find that simply replacing an existing loss with their metric yields a significant improvement in denoising, as measured by subjective pairwise comparison.
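The core idea above — a deep-feature distance supervised by binary "same/different" human labels — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: `perceptual_distance` assumes the metric is a sum of per-layer activation differences (a common form for deep perceptual metrics), and `jnd_bce_loss` assumes the distance is mapped to a probability of "humans hear a difference" through a learned sigmoid and trained with binary cross-entropy. Function names and the parameters `w`, `b` are hypothetical.

```python
import numpy as np

def perceptual_distance(feats_ref, feats_test):
    """Distance between two clips given lists of per-layer network activations.

    Assumed form: sum over layers of the mean absolute activation difference.
    """
    return sum(np.mean(np.abs(a - b)) for a, b in zip(feats_ref, feats_test))

def jnd_bce_loss(dist, human_says_different, w=1.0, b=0.0):
    """Binary cross-entropy between predicted and annotated 'different' labels.

    The distance is squashed to P(different) by a sigmoid with learnable
    scale w and bias b (hypothetical parameterization).
    """
    p = 1.0 / (1.0 + np.exp(-(w * dist + b)))  # predicted P(humans hear a difference)
    y = float(human_says_different)            # 1 if annotators said "different"
    eps = 1e-12                                # numerical safety for log
    return -(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

# Toy usage: identical activations give zero distance; perturbed ones do not.
ref = [np.zeros(8), np.zeros(4)]
same = [np.zeros(8), np.zeros(4)]
diff = [np.ones(8), np.zeros(4)]
print(perceptual_distance(ref, same))  # 0.0
print(perceptual_distance(ref, diff) > 0)
print(jnd_bce_loss(perceptual_distance(ref, diff), human_says_different=True))
```

Training would backpropagate this loss through the feature extractor over many annotated pairs, so that the learned distance separates "same" from "different" pairs the way listeners do.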
Original title: A DIFFERENTIABLE PERCEPTUAL AUDIO METRIC LEARNED FROM JUST NOTICEABLE DIFFERENCES
Original abstract: Assessment of many audio processing tasks relies on subjective evaluation which is time-consuming and expensive. Efforts have been made to create objective metrics but existing ones correlate poorly with human judgment. In this work, we construct a differentiable metric by fitting a deep neural network on a newly collected dataset of just-noticeable differences (JND), in which humans annotate whether a pair of audio clips are identical or not. By varying the type of differences, including noise, reverb, and compression artifacts, we are able to learn a metric that is well-calibrated with human judgments. Furthermore, we evaluate this metric by training a neural network, using the metric as a loss function. We find that simply replacing an existing loss with our metric yields significant improvement in denoising as measured by subjective pairwise comparison.
Authors: Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore
Link: https://arxiv.org/abs/2001.04460