Finnish Language Modeling with Deep Transformer Models (cs.SD)

  • March 27, 2020
  • Notes

After a long period in which LSTMs were considered the dominant model architecture, Transformers have taken center stage in language modeling. In this project, we investigate the performance of the BERT and Transformer-XL architectures on the language modeling task, using a sub-word vocabulary for Finnish and comparing against the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is the first such measurement we are aware of; Transformer-XL improves the perplexity score to 73.58, 27% better than the LSTM model.
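Because BERT is a bidirectional masked language model, it cannot be scored with an ordinary left-to-right perplexity; the pseudo-perplexity the abstract refers to is computed by masking each token in turn, summing the masked-token log-probabilities, and exponentiating the average negative value: PPPL(W) = exp(-(1/|W|) · Σ_t log P(w_t | W \ w_t)). Below is a minimal sketch of this scoring loop, assuming a HuggingFace-style masked LM; the TurkuNLP/bert-base-finnish-cased-v1 checkpoint is an illustrative assumption, not necessarily the paper's exact model.

```python
# Minimal pseudo-perplexity (PPPL) sketch for a masked LM.
# Assumption: a HuggingFace-compatible Finnish BERT checkpoint;
# the paper's actual model and tokenizer may differ.
import math
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "TurkuNLP/bert-base-finnish-cased-v1"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nll, n = 0.0, 0
    # Mask one position at a time, skipping [CLS] and [SEP].
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nll -= log_probs[ids[i]].item()  # negative log-prob of true token
        n += 1
    return math.exp(nll / n)

print(pseudo_perplexity("Hyvää huomenta!"))
```

Note that this requires one forward pass per token, so scoring a corpus this way is considerably more expensive than computing ordinary perplexity with an autoregressive model such as Transformer-XL.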

Original title: Finnish Language Modeling with Deep Transformer Models

Original abstract: Transformers have recently taken the center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures, BERT and Transformer-XL, for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is the first such measure achieved as far as we know. Transformer-XL improves upon the perplexity score to 73.58 which is 27% better than the LSTM model.

Original author: Abhilash Jain

Original link: https://arxiv.org/abs/2003.11562