Voice and accompaniment separation in music using a self-attention convolutional neural network (cs.SD)
- March 27, 2020
- Notes
Music source separation has been a popular topic in signal processing for decades, not only because of its technical difficulty but also because of its importance to many commercial applications, such as automatic karaoke and remixing. This paper proposes a novel self-attention network that separates vocals from accompaniment in music. First, a convolutional neural network (CNN) with densely connected CNN blocks is built as the base network. Self-attention subnets are then inserted at different levels of the base CNN to exploit the long-term intra-dependency of music, i.e., repetition. Within the self-attention subnets, repetitions of the same musical pattern inform the reconstruction of other repetitions, yielding better source-separation performance. Results show that the proposed method achieves a 19.5% relative improvement in SDR for vocals separation. The method is also compared with the state-of-the-art MMDenseNet and MMDenseLSTM systems.
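The core architectural idea is that a self-attention subnet lets every time-frequency position of a CNN feature map attend to every other position, so a repeated musical pattern can help reconstruct its other occurrences. Below is a minimal PyTorch sketch of such a block in the style of a non-local/SAGAN attention layer; the class name `SelfAttention2d`, the channel-reduction factor, and the learned residual gate `gamma` are illustrative assumptions, not the authors' implementation.

```python
# A hypothetical self-attention block that could sit between dense CNN blocks
# operating on magnitude-spectrogram feature maps. Not the paper's exact design.
import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Self-attention over all time-frequency positions of a feature map."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, t = x.shape                          # batch, channels, freq bins, frames
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, f*t, c')
        k = self.key(x).flatten(2)                    # (b, c', f*t)
        attn = torch.softmax(q @ k, dim=-1)           # similarity between all positions
        v = self.value(x).flatten(2)                  # (b, c, f*t)
        out = (v @ attn.transpose(1, 2)).view(b, c, f, t)
        # Residual connection: gamma starts at 0, so the block initially
        # passes the plain CNN features through unchanged.
        return x + self.gamma * out


# Usage: attend over a 64-channel feature map from a dense CNN block.
feats = torch.randn(1, 64, 128, 256)       # (batch, channels, freq, time)
print(SelfAttention2d(64)(feats).shape)    # torch.Size([1, 64, 128, 256])
```

Because the attention map compares every position with every other, the block can link a phrase to its repetitions arbitrarily far apart in time, which a plain convolution with a limited receptive field cannot do.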
Original title: Voice and accompaniment separation in music using self-attention convolutional neural network
Original abstract: Music source separation has been a popular topic in signal processing for decades, not only because of its technical difficulty, but also due to its importance to many commercial applications, such as automatic karaoke and remixing. In this work, we propose a novel self-attention network to separate voice and accompaniment in music. First, a convolutional neural network (CNN) with densely-connected CNN blocks is built as our base network. We then insert self-attention subnets at different levels of the base CNN to make use of the long-term intra-dependency of music, i.e., repetition. Within self-attention subnets, repetitions of the same musical patterns inform reconstruction of other repetitions, for better source separation performance. Results show the proposed method leads to 19.5% relative improvement in vocals separation in terms of SDR. We compare our methods with state-of-the-art systems, i.e., MMDenseNet and MMDenseLSTM.
Original authors: Yuzhou Liu (Ohio State University), Balaji Thoshkahna (Amazon Music, Bangalore), Ali Milani and Trausti Kristjansson (Amazon Lab126, CA)
Original link: https://arxiv.org/abs/2003.08954