Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix (cs.SD)

  • March 17, 2020
  • Notes

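Realistic soundscape recordings often contain several sound events occurring at once, such as car horns, engines, and human voices. Sound event retrieval is a form of content-based search that aims to find audio samples similar to an audio query in acoustic or semantic content. Current sound event retrieval models focus mainly on single-label recordings, in which only one sound event occurs, rather than on multi-label recordings (i.e., recordings in which multiple sound events occur). To address the latter case, the authors propose several deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated on the SONYC-UST dataset, which contains both single-label and multi-label soundscape recordings. The performance results show the effectiveness of the proposed model.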

Original title: Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Original abstract: Realistic recordings of soundscapes often have multiple sound events co-occurring, such as car horns, engine and human voices. Sound event retrieval is a type of content-based search aiming at finding audio samples, similar to an audio query based on their acoustic or semantic content. State of the art sound event retrieval models have focused on single-label audio recordings, with only one sound event occurring, rather than on multi-label audio recordings (i.e., multiple sound events occur in one recording). To address this latter problem, we propose different Deep Learning architectures with a Siamese-structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset containing both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.

Authors: Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Mendez Mendez, Benjamin Elizalde, Philippe Pasquier

Source: http://cn.arxiv.org/abs/2002.09026