Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix (cs.SD)
- March 17, 2020
- Notes
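Realistic soundscape recordings often contain multiple sound events occurring at the same time, such as car horns, engines, and human voices. Sound event retrieval is a type of content-based search that aims to find audio samples similar to an audio query based on their acoustic or semantic content. Current sound event retrieval models focus mainly on single-label audio recordings, in which only one sound event occurs, rather than on multi-label audio recordings (i.e., recordings in which multiple sound events occur). To address the latter problem, we propose different deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated on the SONYC-UST dataset, which contains both single- and multi-label soundscape recordings. The performance results show the effectiveness of the proposed model.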
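To make the multi-label retrieval setting concrete, here is a minimal, hypothetical sketch. It is not the authors' model: the abstract does not define the Pairwise Presence Matrix, so this code does not reproduce it. It assumes (1) a Siamese setup in which one shared encoder embeds both the query and every candidate and candidates are ranked by cosine similarity, (2) a toy fixed linear projection standing in for the trained deep network, and (3) relevance defined as sharing at least one label, with Jaccard overlap as a pairwise label-similarity matrix. The names `WEIGHTS`, `embed`, `retrieve`, `pairwise_label_overlap`, `precision_at_k`, and the label strings are all illustrative.

```python
import math
from typing import List, Sequence, Set

# HYPOTHETICAL toy encoder weights (2 x 3). A real Siamese model would use a
# trained deep network; the key property shown here is that the SAME weights
# embed both the query and every candidate.
WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.1, 0.4, -0.3],
]


def embed(features: Sequence[float]) -> List[float]:
    """Project features with the shared weights, then L2-normalise."""
    out = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: Sequence[float], candidates: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k candidates most similar to the query."""
    q = embed(query)
    scored = [(i, cosine(q, embed(c))) for i, c in enumerate(candidates)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [i for i, _ in scored[:k]]


def label_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two multi-label annotations (assumed relevance score)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_label_overlap(labels: List[Set[str]]) -> List[List[float]]:
    """Matrix of label overlap between every pair of recordings."""
    return [[label_overlap(a, b) for b in labels] for a in labels]


def precision_at_k(retrieved: List[int], query_labels: Set[str],
                   candidate_labels: List[Set[str]]) -> float:
    """Fraction of retrieved items sharing at least one label with the query."""
    if not retrieved:
        return 0.0
    hits = sum(1 for i in retrieved if query_labels & candidate_labels[i])
    return hits / len(retrieved)


if __name__ == "__main__":
    candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    labels = [{"car_horn", "engine"}, {"engine"}, {"human_voice"}]
    ranking = retrieve([0.0, 1.0, 0.0], candidates, k=3)
    print(ranking)
    print(pairwise_label_overlap(labels))
    print(precision_at_k(ranking[:2], {"engine"}, labels))
```

The label-overlap matrix makes the multi-label case explicit: two recordings can be partially relevant (here 0.5) rather than simply matching or not, which a single-label setup cannot express.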
Original title: Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix
Original abstract: Realistic recordings of soundscapes often have multiple sound events co-occurring, such as car horns, engine and human voices. Sound event retrieval is a type of content-based search aiming at finding audio samples, similar to an audio query based on their acoustic or semantic content. State of the art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording). To address this latter problem, we propose different Deep Learning architectures with a Siamese-structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset containing both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.
Original authors: Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier
Original URL: http://cn.arxiv.org/abs/2002.09026