Finnish Language Modeling with Deep Transformer Models (CS SD)
- March 27, 2020
- Notes
Transformers have taken center stage in language modeling after a long period in which LSTMs were considered the dominant model architecture. This project investigates the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task, using a sub-word model setting for Finnish and comparing against the previous state-of-the-art LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which, as far as the authors know, is the first such measurement reported. Transformer-XL improves the perplexity score to 73.58, a 27% improvement over the LSTM model.
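Because BERT is a masked language model rather than a left-to-right one, it is scored with pseudo-perplexity: each token is masked in turn, the model scores that token given the rest of the sentence, and the negative mean log-probability is exponentiated. A minimal sketch of the aggregation step (the per-token log-probabilities here are hypothetical placeholders, not outputs of the paper's model):

```python
import math

def pseudo_perplexity(masked_log_probs):
    """Pseudo-perplexity from per-token masked-LM log-probabilities.

    For a masked LM such as BERT, each position i is masked in turn and the
    model yields log P(token_i | all other tokens). Pseudo-perplexity is the
    exponential of the negative mean of these log-probabilities, analogous
    to ordinary perplexity for autoregressive models.
    """
    n = len(masked_log_probs)
    return math.exp(-sum(masked_log_probs) / n)

# Hypothetical log-probabilities for a 4-token sentence.
scores = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(pseudo_perplexity(scores))  # geometric mean of inverse probabilities
```

A lower score means the model assigns higher probability to the held-out tokens; the paper's BERT score of 14.5 is computed over the sub-word vocabulary, so it is not directly comparable to the word-level LSTM perplexity.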
Original title: Finnish Language Modeling with Deep Transformer Models
Original abstract: Transformers have recently taken the center stage in language modeling after LSTM's were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures-BERT and Transformer-XL for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous State of the art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is the first such measure achieved as far as we know. Transformer-XL improves upon the perplexity score to 73.58 which is 27% better than the LSTM model.
Author: Abhilash Jain
Link: https://arxiv.org/abs/2003.11562