Score and Lyrics-Free Singing Voice Generation (Sounds)

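Generative models for singing voice have mostly focused on the task of singing voice synthesis, that is, producing singing voice waveforms from given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: generating singing voice without pre-assigned scores or lyrics, at both training and inference time. Specifically, we propose three singing voice generation schemes that are either unconditioned or weakly conditioned. We describe the associated challenges and propose a pipeline for tackling these new tasks. The pipeline includes source separation and transcription models for data preparation, adversarial networks for audio generation, and customized metrics for evaluation.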

Original title: Sounds: Score and Lyrics-Free Singing Voice Generation

Original abstract: Generative models for singing voice have been mostly concerned with the task of "singing voice synthesis," i.e., to produce singing voice waveforms given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time. In particular, we propose three either unconditioned or weakly conditioned singing voice generation schemes. We outline the associated challenges and propose a pipeline to tackle these new tasks. This involves the development of source separation and transcription models for data preparation, adversarial networks for audio generation, and customized metrics for evaluation.

Authors: Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang

Original link: https://arxiv.org/abs/1912.11747