Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization (Multimedia)

  • February 15, 2020
  • Notes

Neural speech codecs need to be scalable and efficient, supporting a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme that jointly learns the codebook of LPC coefficients and that of the corresponding residuals. Rather than simply shoehorning LPC into a neural network, CQ bridges the computational capacity of advanced neural network models with traditional, efficient, domain-specific digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ scales up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, the CQ model has fewer than 1 million parameters, significantly fewer than many other generative models.
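To make the LPC-plus-residual structure concrete, here is a minimal NumPy sketch (not the authors' code) of the classic decomposition that CQ builds on: estimate LPC coefficients for one frame with the autocorrelation method, filter the frame to obtain the low-energy residual, and quantize that residual against a small codebook. The frame length, LPC order, and codebook size are illustrative assumptions.

```python
import numpy as np

def lpc_autocorr(frame, order=16):
    """Autocorrelation-method LPC: solve the Yule-Walker normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R + 1e-6 * np.eye(order), r[1:order + 1])
    return a  # predictor coefficients: x[n] ~ sum_k a[k] * x[n-1-k]

def lpc_residual(frame, a):
    """Residual e[n] = x[n] - prediction from the previous `order` samples."""
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    pred = np.array([padded[n:n + order][::-1] @ a for n in range(len(frame))])
    return frame - pred

def vq(vectors, codebook):
    """Toy vector quantizer: nearest codeword index per residual sub-vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 220 * np.arange(512) / 16000) + 0.05 * rng.standard_normal(512)
a = lpc_autocorr(frame, order=16)
e = lpc_residual(frame, a)
print("frame energy:", np.sum(frame ** 2), "residual energy:", np.sum(e ** 2))

codebook = rng.standard_normal((32, 8)) * e.std()  # 32 codewords of length 8 (illustrative)
codes = vq(e.reshape(-1, 8), codebook)             # indices that would be sent to the decoder
```

In CQ, the analogous steps are made trainable: the LPC codebook and the neural residual coder are optimized jointly rather than designed separately, which is the point of the "collaborative" quantization.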

Original title: EFFICIENT AND SCALABLE NEURAL RESIDUAL WAVEFORM CODING WITH COLLABORATIVE QUANTIZATION

Original abstract: Scalability and efficiency are desired in neural speech codecs, which supports a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC to a neural network, but bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models are with less than 1 million parameters, significantly less than many other generative models.
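The abstract does not spell out how the codebooks are made learnable. One standard way to do this in neural waveform codecs is a softmax-based "soft-to-hard" assignment, sketched below in PyTorch as an assumption, not as the paper's exact mechanism; `alpha` (the softmax sharpness) and the codebook size are hypothetical parameters.

```python
import torch

def soft_to_hard_quantize(z, codebook, alpha=10.0, hard=False):
    """
    z:        (batch, dim) code vectors from the encoder
    codebook: (K, dim) learnable codewords
    Soft mode (training): output is a softmax-weighted mix of codewords,
    so gradients reach both the encoder and the codebook.
    Hard mode (inference): output is the nearest codeword; only its index
    needs to be transmitted in the bitstream.
    """
    d = torch.cdist(z, codebook) ** 2          # (batch, K) squared distances
    if hard:
        idx = d.argmin(dim=1)                  # transmit these indices
        return codebook[idx], idx
    w = torch.softmax(-alpha * d, dim=1)       # soft assignment weights
    return w @ codebook, w

# Toy usage: both an LPC codebook and a residual codebook could be
# parameters trained jointly with the rest of the codec.
z = torch.randn(4, 8, requires_grad=True)
codebook = torch.randn(32, 8, requires_grad=True)
zq, _ = soft_to_hard_quantize(z, codebook)     # differentiable during training
zq.sum().backward()                            # gradients flow into the codebook too
print(codebook.grad.abs().sum() > 0)           # tensor(True)
```

Under this kind of scheme, making the LPC quantizer differentiable is what would allow its codebook to be trained jointly with the residual coder, matching the joint-learning claim in the abstract.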

Authors: Kai Zhen, Mi Suk Lee, Jongmo Sung, Seungkwon Beack, Minje Kim

Link: https://arxiv.org/abs/2002.05604