Abstractive Text Summarization Based on Language Model Conditioning and Locality Modeling
- April 3, 2020
- Notes
We explore to what extent knowledge about the pre-trained language model in use is beneficial for abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new BERT-windowing method, which allows chunk-wise processing of texts longer than the BERT window size. We also examine how locality modeling, i.e., explicitly restricting computations to the local context, affects the Transformer's summarization ability; this is done by introducing two-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and to state-of-the-art models on the CNN/Daily Mail dataset. To demonstrate that the approach also works for German, we additionally train our model on the SwissText dataset. Both models outperform the baseline in ROUGE scores on the two datasets, and a manual qualitative analysis confirms their superiority. A rough sketch of the windowing idea follows.
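The abstract only says that BERT-windowing processes the text chunk by chunk, so the following is a minimal sketch of one plausible implementation, not the paper's actual code: overlapping 512-token windows are encoded separately with BERT and the contextual embeddings of tokens covered by more than one window are averaged. The checkpoint name, window size, stride, and the averaging rule are all illustrative assumptions.

```python
# Hypothetical sketch of BERT-windowing: encode a token sequence longer than
# BERT's maximum input length by sliding overlapping windows over it and
# averaging the contextual embeddings of tokens covered by several windows.
# Window size, stride, checkpoint and averaging rule are assumptions made for
# illustration; the paper's abstract only states that processing is chunk-wise.
import torch
from transformers import BertModel, BertTokenizerFast

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint
WINDOW = 512                      # BERT's maximum sequence length
STRIDE = 256                      # 50% overlap between consecutive windows

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
bert = BertModel.from_pretrained(MODEL_NAME)
bert.eval()

def windowed_bert_embeddings(text: str) -> torch.Tensor:
    """Return one contextual embedding per token for an arbitrarily long text."""
    ids = tokenizer(text, add_special_tokens=False, return_tensors="pt")["input_ids"][0]
    n, dim = ids.size(0), bert.config.hidden_size
    summed = torch.zeros(n, dim)
    counts = torch.zeros(n, 1)

    with torch.no_grad():
        for start in range(0, n, STRIDE):
            chunk = ids[start:start + WINDOW].unsqueeze(0)        # (1, <=512)
            hidden = bert(input_ids=chunk).last_hidden_state[0]   # (len, dim)
            end = start + hidden.size(0)
            summed[start:end] += hidden
            counts[start:end] += 1
            if end >= n:
                break

    return summed / counts  # average the representations where windows overlap
```

The resulting per-token embeddings could then be used to condition the Transformer encoder and decoder, as described above; how the paper actually merges the window outputs is a detail to check in the full text.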
Original title: Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
We explore to what extent knowledge about the pre-trained language model that is used is beneficial for the task of abstractive summarization. To this end, we experiment with conditioning the encoder and decoder of a Transformer-based neural model on the BERT language model. In addition, we propose a new method of BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size. We also explore how locality modelling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer. This is done by introducing 2-dimensional convolutional self-attention into the first layers of the encoder. The results of our models are compared to a baseline and the state-of-the-art models on the CNN/Daily Mail dataset. We additionally train our model on the SwissText dataset to demonstrate usability on German. Both models outperform the baseline in ROUGE scores on two datasets and show its superiority in a manual qualitative analysis.
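For reference, a minimal sketch of the locality idea: self-attention restricted to a fixed-radius neighbourhood around each position. The paper's two-dimensional convolutional self-attention additionally spans neighbouring attention heads; that second dimension is omitted here, and the module name, radius, and sizes are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of locality-restricted ("convolutional") self-attention:
# each position attends only to positions within a fixed radius. This is the
# 1D core of the idea; the paper's 2D variant also spans neighbouring heads,
# which is omitted here. All names and hyperparameters are assumptions.
import math
import torch
import torch.nn as nn

class LocalSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, radius: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads, self.head_dim, self.radius = num_heads, dim // num_heads, radius
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape each projection to (batch, heads, seq, head_dim)
        shape = (b, n, self.num_heads, self.head_dim)
        q, k, v = [t.reshape(shape).transpose(1, 2) for t in (q, k, v)]

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)  # (b, h, n, n)

        # band mask: position i may only attend to j with |i - j| <= radius
        idx = torch.arange(n, device=x.device)
        band = (idx[None, :] - idx[:, None]).abs() <= self.radius    # (n, n)
        scores = scores.masked_fill(~band, float("-inf"))

        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, self.num_heads * self.head_dim)
        return self.out(out)

# e.g. swap this in for standard self-attention in the first encoder layers:
# local_attn = LocalSelfAttention(dim=512, num_heads=8, radius=5)
```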
Original authors: Dmitrii Aksenov, Julián Moreno-Schneider, Peter Bourgonje, Robert Schwarzenberg, Leonhard Hennig, Georg Rehm
Original paper: https://arxiv.org/abs/2003.13027