Finnish Language Modeling with Deep Transformer Models (cs.SD)

  • March 27, 2020
  • Notes

Transformers have recently taken center stage in language modeling, after LSTMs were long considered the dominant model architecture. In this project, we investigate the performance of the BERT and Transformer-XL architectures on the language modeling task. Using a sub-word model setting for Finnish, we compare them to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which to our knowledge is the first such measurement. Transformer-XL improves the perplexity score to 73.58, a 27% improvement over the LSTM model.
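The pseudo-perplexity reported for BERT is usually computed by masking each token of a sentence in turn, scoring the masked token with the model, and exponentiating the negative mean of the resulting log-probabilities. A minimal sketch of that final aggregation step, where the per-token log-probabilities are hypothetical stand-ins for actual masked-LM outputs:

```python
import math

def pseudo_perplexity(token_log_probs):
    """Pseudo-perplexity: exp of the negative mean per-token
    log-probability. Each entry is assumed to come from masking
    one token and scoring it with a masked LM such as BERT."""
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical log-probabilities for a 4-token sentence.
scores = [-1.2, -0.8, -2.0, -1.5]
print(pseudo_perplexity(scores))
```

Lower scores indicate the model assigns higher probability to each held-out token in context; unlike ordinary perplexity, this measure conditions on both left and right context, so the two numbers are not directly comparable.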

Original title: Finnish Language Modeling with Deep Transformer Models

Original abstract: Transformers have recently taken the center stage in language modeling after LSTM's were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures-BERT and Transformer-XL for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous State of the art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is the first such measure achieved as far as we know. Transformer-XL improves upon the perplexity score to 73.58 which is 27% better than the LSTM model.

Author: Abhilash Jain

Link: https://arxiv.org/abs/2003.11562