Learning a Differentiable Perceptual Audio Metric from Just-Noticeable Differences (Sound)
- January 20, 2020
- Notes
Evaluation of many audio processing tasks relies on subjective testing, which is time-consuming and expensive. Objective metrics have been proposed, but existing ones correlate poorly with human judgment. In this work, the authors construct a differentiable metric by fitting a deep neural network to a newly collected dataset of just-noticeable differences (JND), in which human listeners annotate whether a pair of audio clips sound identical. By varying the type of degradation, including noise, reverberation, and compression artifacts, they learn a metric that is well calibrated to human judgments. They further evaluate the metric by using it as a loss function to train a neural network, and find that simply replacing an existing loss with their metric yields a significant improvement in denoising, as measured by subjective pairwise comparison.
Original title: A DIFFERENTIABLE PERCEPTUAL AUDIO METRIC LEARNED FROM JUST NOTICEABLE DIFFERENCES
Original abstract: Assessment of many audio processing tasks relies on subjective evaluation which is time-consuming and expensive. Efforts have been made to create objective metrics but existing ones correlate poorly with human judgment. In this work, we construct a differentiable metric by fitting a deep neural network on a newly collected dataset of just-noticeable differences (JND), in which humans annotate whether a pair of audio clips are identical or not. By varying the type of differences, including noise, reverb, and compression artifacts, we are able to learn a metric that is well-calibrated with human judgments. Furthermore, we evaluate this metric by training a neural network, using the metric as a loss function. We find that simply replacing an existing loss with our metric yields significant improvement in denoising as measured by subjective pairwise comparison.
Authors: Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore
Link: https://arxiv.org/abs/2001.04460
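The two-stage idea in the abstract — fit a network so that a deep-feature distance predicts human same/different (JND) labels, then reuse that differentiable distance as a training loss — can be sketched roughly as below. The tiny architecture, tensor sizes, and toy labels here are illustrative assumptions for the sketch, not the authors' actual model or dataset:

```python
import torch
import torch.nn as nn

class MetricNet(nn.Module):
    """Toy perceptual-metric sketch: distance in deep-feature space,
    plus a head that predicts the human JND ("noticeably different") label."""
    def __init__(self):
        super().__init__()
        # Hypothetical small encoder over raw waveforms (1 channel).
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
        )
        self.head = nn.Linear(1, 1)  # scalar distance -> JND logit

    def distance(self, x, y):
        # Differentiable "perceptual" distance: L1 between deep activations.
        fx, fy = self.encoder(x), self.encoder(y)
        return (fx - fy).abs().mean(dim=(1, 2))

    def forward(self, x, y):
        d = self.distance(x, y).unsqueeze(1)
        return self.head(d).squeeze(1)  # logit for "listeners hear a difference"

net = MetricNet()
x = torch.randn(8, 1, 4000)               # batch of reference clips (toy data)
y = x + 0.01 * torch.randn_like(x)        # slightly perturbed versions
labels = torch.zeros(8)                   # 0 = annotated "sounds the same" (toy)
loss = nn.functional.binary_cross_entropy_with_logits(net(x, y), labels)
loss.backward()                           # metric is trainable end-to-end
```

Once fit to JND annotations, the same distance can in principle be dropped in as a loss for a downstream task, e.g. `net.distance(denoiser(noisy), clean).mean()` for denoising, which is the substitution the abstract reports as improving subjective quality.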


