[Two Minute Papers] Explaining OpenAI GPT-3: 175 billion parameters and a training bill as high as $12 million. Is it just an exceptionally good text generator?


Two Minute Papers

Original YouTube video link: //www.youtube.com/user/keeroyz/videos

Two Minute Papers is a light, fun AI paper series originally published on YouTube. The narrator speaks slowly and enunciates clearly, which makes the videos good material both for listening practice and for everyday study.

Our subtitle team has produced a bilingual-subtitled version of the video that anyone can watch for free. We also welcome everyone to join the AI discussion group and share what they learn (scan the QR code at the end of this post to join).

In This Episode

Paper title: Language Models are Few-Shot Learners

Paper link: //arxiv.org/abs/2005.14165v2

Abstract

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions – something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3’s few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

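The key idea in the abstract is that GPT-3 receives the task "purely via text interaction": a short task description plus a few worked demonstrations are placed in the prompt, and the model completes the next example with no gradient updates or fine-tuning. Below is a minimal sketch of what such a few-shot prompt might look like; the helper name build_few_shot_prompt and the 3-digit-addition demonstrations are illustrative assumptions, not code from the paper.

# A minimal sketch of few-shot prompting as described in the abstract:
# the task and a handful of demonstrations are supplied purely as text,
# and the model is expected to continue the pattern without fine-tuning.
# The helper name and the arithmetic task are illustrative assumptions.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a plain-text prompt: task description, demos, then the new query."""
    lines = [task_description, ""]
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the language model continues from here
    return "\n".join(lines)

demos = [
    ("123 + 456 = ?", "579"),
    ("205 + 318 = ?", "523"),
    ("640 + 177 = ?", "817"),
]

prompt = build_few_shot_prompt("Answer the arithmetic question.", demos, "381 + 264 = ?")
print(prompt)  # this text would be fed to the model as-is; no gradient updates are involved

Swapping in different demonstrations is enough to switch tasks (translation, cloze, word unscrambling, and so on), which is what the abstract means by "task-agnostic" few-shot performance.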

Bilingual Video Walkthrough of the Paper
(Click to go to the playback page)

[Video thumbnail]

Discussion Group

[QR code]

Reply "AI交流" to the subtitle team's account (字幕君) to join.