A Collection of 16 EMNLP 2020 Papers


EMNLP is a top-tier international conference in natural language processing, organized by SIGDAT, a special interest group of the Association for Computational Linguistics (ACL). It is held once a year; last year it was held jointly with IJCNLP in Hong Kong, and this year it moved online because of the pandemic.

EMNLP 2020 received 3,677 submissions, of which 3,359 were valid. A total of 752 papers were accepted, comprising 602 long papers and 150 short papers.

In terms of acceptance rate, EMNLP 2020 set a five-year low at 22.4%, with 24.6% for long papers and 16.6% for short papers.

We have collected some of the newly accepted EMNLP 2020 papers below; anyone interested can view and download them. If you come across other papers you find worthwhile, you are welcome to upload them to the AI研習社 paper section, so that more friends who love AI and want to contribute to the AI industry can learn and grow together.


【1】Deconstructing word embedding algorithms

Title: Deconstructing Word Embedding Algorithms

Authors: Kian Kenyon-Dean, Edward Newell, Jackie Chi Kit Cheung

Note: EMNLP 2020, 6 pages. arXiv admin note: substantial text overlap with arXiv:1911.13280

Link: https://arxiv.org/pdf/2011.07013v1

Abstract: Word embeddings are reliable feature representations of words used to obtain high quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others, into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.
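
To make the "common form" idea concrete, here is a minimal Python sketch of one well-known way to view static word embeddings: factorizing a positive PMI co-occurrence matrix with an SVD. The toy counts and the PPMI-plus-SVD recipe are illustrative assumptions, not the paper's exact derivation.

import numpy as np

def pmi_matrix(cooc, eps=1e-8):
    # Pointwise mutual information from a word-word co-occurrence count matrix.
    p_xy = cooc / cooc.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    return np.log((p_xy + eps) / (p_x @ p_y + eps))

cooc = np.array([[10., 2., 0.],
                 [2., 8., 3.],
                 [0., 3., 6.]])           # toy co-occurrence counts for 3 words
ppmi = np.maximum(pmi_matrix(cooc), 0.0)  # positive PMI
u, s, vt = np.linalg.svd(ppmi)
embeddings = u[:, :2] * np.sqrt(s[:2])    # 2-dimensional word vectors
print(embeddings)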


【2】 RethinkCWS: Is Chinese Word Segmentation a Solved Task?

Title: RethinkCWS: Is Chinese Word Segmentation a Solved Task?

Authors: Jinlan Fu, Pengfei Liu, Qi Zhang, Xuanjing Huang

Note: Accepted by EMNLP 2020

Link: https://arxiv.org/pdf/2011.06858v1

Abstract: The performance of the Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks, especially the successful use of large pre-trained models. In this paper, we take stock of what we have achieved and rethink what's left in the CWS task. Methodologically, we propose a fine-grained evaluation for existing CWS systems, which not only allows us to diagnose the strengths and weaknesses of existing models (under the in-dataset setting), but also enables us to quantify the discrepancy between different criteria and alleviate the negative transfer problem when doing multi-criteria learning. Strategically, despite not aiming to propose a novel model in this paper, our comprehensive experiments on eight models and seven datasets, as well as thorough analysis, could search for some promising directions for future research. We make all codes publicly available and release an interface that can quickly evaluate and diagnose user's models: https://github.com/neulab/InterpretEval.


【3】 Interpretable Multi-dataset Evaluation for Named Entity Recognition

Title: Interpretable Multi-dataset Evaluation for Named Entity Recognition

Authors: Jinlan Fu, Pengfei Liu, Graham Neubig

Note: Accepted by EMNLP 2020

Link: https://arxiv.org/pdf/2011.06854v1

Abstract: With the proliferation of models for natural language processing tasks, it is even harder to understand the differences between models and their relative merits. Simply looking at differences between holistic metrics such as accuracy, BLEU, or F1 does not tell us why or how particular methods perform differently and how diverse datasets influence the model design choices. In this paper, we present a general methodology for interpretable evaluation for the named entity recognition (NER) task. The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them, identifying the strengths and weaknesses of current systems. By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area: https://github.com/neulab/InterpretEval.
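
As a rough picture of what attribute-bucketed evaluation looks like, the sketch below groups toy NER predictions by entity length and reports per-bucket accuracy; the real InterpretEval toolkit covers many more attributes, metrics, and the cross-dataset comparisons described above.

from collections import defaultdict

# Hypothetical (entity tokens, gold type, predicted type) triples.
examples = [
    (["Paris"], "LOC", "LOC"),
    (["New", "York"], "LOC", "LOC"),
    (["World", "Health", "Organization"], "ORG", "MISC"),
    (["John"], "PER", "PER"),
]

buckets = defaultdict(lambda: [0, 0])      # entity length -> [correct, total]
for tokens, gold, pred in examples:
    stats = buckets[len(tokens)]
    stats[1] += 1
    stats[0] += int(gold == pred)

for length in sorted(buckets):
    correct, total = buckets[length]
    print(f"entity length {length}: accuracy {correct / total:.2f} ({correct}/{total})")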


【4】 diagNNose: A Library for Neural Activation Analysis

Title: diagNNose: A Library for Neural Activation Analysis

Authors: Jaap Jumelet

Note: Accepted to the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, EMNLP 2020

Link: https://arxiv.org/pdf/2011.06819v1

Abstract: In this paper we introduce diagNNose, an open source library for analysing the activations of deep neural networks. diagNNose contains a wide array of interpretability techniques that provide fundamental insights into the inner workings of neural networks. We demonstrate the functionality of diagNNose with a case study on subject-verb agreement within language models. diagNNose is available at https://github.com/i-machine-think/diagnnose.


【5】 Context-aware Stand-alone Neural Spelling Correction

Title: Context-aware Stand-alone Neural Spelling Correction

Authors: Xiangci Li, Hairong Liu, Liang Huang

Note: 8 pages, 5 tables, 1 figure. Findings of the Association for Computational Linguistics: EMNLP 2020

Link: https://arxiv.org/pdf/2011.06642v1

Abstract: Existing natural language processing systems are vulnerable to noisy inputs resulting from misspellings. On the contrary, humans can easily infer the corresponding correct words from their misspellings and surrounding context. Inspired by this, we address the stand-alone spelling correction problem, which only corrects the spelling of each token without additional token insertion or deletion, by utilizing both spelling information and global context representations. We present a simple yet powerful solution that jointly detects and corrects misspellings as a sequence labeling task by fine-tuning a pre-trained language model. Our solution outperforms the previous state-of-the-art result by 12.8% absolute F0.5 score.
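
The sequence-labeling framing is easy to picture: every noisy token receives exactly one label, either its corrected form or a keep marker. The snippet below shows only this data framing on made-up tokens; the paper itself fine-tunes a pre-trained language model on such pairs.

# Stand-alone correction: one label per input token, no insertions or deletions.
noisy = ["I", "lvoe", "natural", "langauge", "processing"]
gold  = ["I", "love", "natural", "language", "processing"]

labels = [g if n != g else "KEEP" for n, g in zip(noisy, gold)]
print(list(zip(noisy, labels)))
# [('I', 'KEEP'), ('lvoe', 'love'), ('natural', 'KEEP'), ('langauge', 'language'), ('processing', 'KEEP')]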


【6】 doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Title: doc2dial: A Goal-Oriented Document-Grounded Dialogue Dataset

Authors: Song Feng, Hui Wan, Chulaka Gunasekara, Siva Sankalp Patel, Sachindra Joshi, Luis A. Lastras

Note: EMNLP 2020

Link: https://arxiv.org/pdf/2011.06623v1

Abstract: We introduce doc2dial, a new dataset of goal-oriented dialogues that are grounded in the associated documents. Inspired by how the authors compose documents for guiding end users, we first construct dialogue flows based on the content elements that correspond to higher-level relations across text sections as well as lower-level relations between discourse units within a section. Then we present these dialogue flows to crowd contributors to create conversational utterances. The dataset includes over 4,500 annotated conversations with an average of 14 turns that are grounded in over 450 documents from four domains. Compared to the prior document-grounded dialogue datasets, this dataset covers a variety of dialogue scenes in information-seeking conversations. For evaluating the versatility of the dataset, we introduce multiple dialogue modeling tasks and present baseline approaches.


【7】 Learning from Task Descriptions

Title: Learning from Task Descriptions

Authors: Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters

Note: EMNLP 2020

Link: https://arxiv.org/pdf/2011.08115v1

Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model's ability to solve each task. Moreover, the dataset's structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.
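
A rough picture of the "task description as question" setup: one general question is paired with many possible inputs, and a model must answer it without task-specific training. The field names and examples below are hypothetical and are not the released ZEST schema.

task = {
    "description": "Is this national park open year-round?",
    "instances": [
        {"input": "Badlands National Park is open every day of the year ...", "answer": "Yes"},
        {"input": "The north entrance of Crater Lake closes each winter ...", "answer": "No"},
    ],
}
for inst in task["instances"]:
    print(task["description"], "->", inst["answer"])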


【8】 A Dataset for Tracking Entities in Open Domain Procedural Text

Title: A Dataset for Tracking Entities in Open Domain Procedural Text

Authors: Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

Note: To appear in EMNLP 2020

Link: https://arxiv.org/pdf/2011.08092v1

Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a new task formulation where, given just a procedural text as input, the task is to generate a set of state change tuples (entity, attribute, before-state, after-state) for each step, where the entity, attribute, and state values must be predicted from an open vocabulary. Using crowdsourcing, we create OPENPI, a high-quality (91.5% coverage as judged by humans and completely vetted) and large-scale dataset comprising 29,928 state changes over 4,050 sentences from 810 procedural real-world paragraphs from WikiHow.com. A current state-of-the-art generation model on this task achieves 16.1% F1 based on the BLEU metric, leaving enough room for novel model architectures.
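
For intuition, the open-vocabulary output the task asks for looks roughly like the tuples below; the step text, entities, attributes, and state values are made up for illustration and are not drawn from the released OPENPI data.

step = "Rub a raw potato over the inside of the car window."
state_changes = [
    {"entity": "car window", "attribute": "surface", "before": "foggy", "after": "coated"},
    {"entity": "potato", "attribute": "shape", "before": "whole", "after": "worn down"},
]
for c in state_changes:
    print(f'{c["entity"]} | {c["attribute"]}: {c["before"]} -> {c["after"]}')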


【9】 ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

Title: ArraMon: A Joint Navigation-Assembly Instruction Interpretation Task in Dynamic Environments

Authors: Hyounghun Kim, Abhay Zala, Graham Burri, Hao Tan, Mohit Bansal

Note: EMNLP Findings 2020 (18 pages; extended to Hindi)

Link: https://arxiv.org/pdf/2011.07660v1

Abstract: For embodied agents, navigation is an important ability but not an isolated goal. Agents are also expected to perform specific tasks after reaching the target location, such as picking up objects and assembling them into a particular arrangement. We combine Vision-and-Language Navigation, assembling of collected objects, and object referring expression comprehension, to create a novel joint navigation-and-assembly task, named ArraMon. During this task, the agent (similar to a PokeMON GO player) is asked to find and collect different target objects one-by-one by navigating based on natural language instructions in a complex, realistic outdoor environment, but then also ARRAnge the collected objects part-by-part in an egocentric grid-layout environment. To support this task, we implement a 3D dynamic environment simulator and collect a dataset (in English; and also extended to Hindi) with human-written navigation and assembling instructions, and the corresponding ground truth trajectories. We also filter the collected instructions via a verification stage, leading to a total of 7.7K task instances (30.8K instructions and paths). We present results for several baseline models (integrated and biased) and metrics (nDTW, CTC, rPOD, and PTC), and the large model-human performance gap demonstrates that our task is challenging and presents a wide scope for future work. Our dataset, simulator, and code are publicly available at: https://arramonunc.github.io


【10】 DORB: Dynamically Optimizing Multiple Rewards with Bandits

Title: DORB: Dynamically Optimizing Multiple Rewards with Bandits

Authors: Ramakanth Pasunuru, Han Guo, Mohit Bansal

Note: EMNLP 2020 (15 pages)

Link: https://arxiv.org/pdf/2011.07635v1

Abstract: Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation, including on an unseen-test transfer setup. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards.
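
To illustrate the bandit layer, here is a minimal Exp3 sketch that picks which metric reward to optimize next; toy_expected_gain is a made-up stand-in for the reward signal (e.g., validation metric gains) that the real training loop would feed back.

import math, random

metrics = ["BLEU", "ROUGE-L", "QA-based reward"]   # the bandit's arms
gamma = 0.1
weights = [1.0] * len(metrics)

def toy_expected_gain(arm):
    # Hypothetical reward in [0, 1]; in DORB this would come from actual training progress.
    return [0.2, 0.5, 0.3][arm] * random.random()

for step in range(20):
    total = sum(weights)
    probs = [(1 - gamma) * w / total + gamma / len(metrics) for w in weights]
    arm = random.choices(range(len(metrics)), weights=probs)[0]
    reward = toy_expected_gain(arm)
    estimate = reward / probs[arm]                 # importance-weighted reward estimate
    weights[arm] *= math.exp(gamma * estimate / len(metrics))

print({m: round(w, 2) for m, w in zip(metrics, weights)})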


【11】 IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

Title: IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

Authors: James Ferguson, Matt Gardner, Hannaneh Hajishirzi, Tushar Khot, Pradeep Dasigi

Note: EMNLP 2020

Link: https://arxiv.org/pdf/2011.07127v1

Abstract: Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that provide only partial information to answer them, with the missing information occurring in one or more linked documents. The questions were written by crowd workers who did not have access to any of the linked documents, leading to questions that have little lexical overlap with the contexts where the answers appear. This process also gave many questions without answers, and those that require discrete reasoning, increasing the difficulty of the task. We follow recent modeling work on various reading comprehension datasets to construct a baseline model for this dataset, finding that it achieves 31.1% F1 on this task, while estimated human performance is 88.4%. The dataset, code for the baseline system, and a leaderboard can be found at https://allennlp.org/iirc.


【12】 Structural and Functional Decomposition for Personality Image Captioning in a Communication Game

Title: Structural and Functional Decomposition for Personality Image Captioning in a Communication Game

Authors: Thu Nguyen, Duy Phung, Minh Hoai, Thien Huu Nguyen

Affiliations: VinAI Research, Vietnam; University of Information Technology, VNU-HCM, Vietnam; Stony Brook University, Stony Brook, NY, USA; University of Oregon, Eugene, OR, USA

Note: 10 pages, EMNLP-Findings 2020

Journal-ref: EMNLP-Findings 2020

Link: https://arxiv.org/pdf/2011.08543v1

Abstract: Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions while the listener encourages the generated captions to contain discriminative information about the input images and personality traits. In this way, we expect that the generated captions can be improved to naturally represent the images and express the traits. In addition, we propose to adapt the language model GPT2 to perform caption generation for PIC. This enables the speaker and listener to benefit from the language encoding capacity of GPT2. Our experiments show that the proposed model achieves the state-of-the-art performance for PIC.


【13】 Where Are You? Localization from Embodied Dialog

Title: Where Are You? Localization from Embodied Dialog

Authors: Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

Affiliations: Georgia Institute of Technology; Oregon State University; Facebook AI Research (FAIR)

Journal-ref: EMNLP 2020

Link: https://arxiv.org/pdf/2011.08277v1

Abstract: We present Where Are You? (WAY), a dataset of 6k dialogs in which two humans – an Observer and a Locator – complete a cooperative localization task. The Observer is spawned at random in a 3D environment and can navigate from first-person views while answering questions from the Locator. The Locator must localize the Observer in a detailed top-down map by asking questions and giving instructions. Based on this dataset, we define three challenging tasks: Localization from Embodied Dialog or LED (localizing the Observer from dialog history), Embodied Visual Dialog (modeling the Observer), and Cooperative Localization (modeling both agents). In this paper, we focus on the LED task – providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices. Our best model achieves 32.7% success at identifying the Observer's location within 3m in unseen buildings, vs. 70.4% for human Locators.

【14】 Sequence-Level Mixed Sample Data Augmentation

Title: Sequence-Level Mixed Sample Data Augmentation

Authors: Demi Guo, Yoon Kim, Alexander M. Rush

Affiliations: Harvard University; MIT-IBM Watson AI Lab; Cornell University

Note: EMNLP 2020

Link: https://arxiv.org/pdf/2011.09039v1

Abstract: Despite their empirical success, neural networks still have difficulty capturing compositional aspects of natural language. This work proposes a simple data augmentation approach to encourage compositional behavior in neural models for sequence-to-sequence problems. Our approach, SeqMix, creates new synthetic examples by softly combining input/output sequences from the training set. We connect this approach to existing techniques such as SwitchOut and word dropout, and show that these techniques are all approximating variants of a single objective. SeqMix consistently yields approximately 1.0 BLEU improvement on five different translation datasets over strong Transformer baselines. On tasks that require strong compositional generalization such as SCAN and semantic parsing, SeqMix also offers further improvements.
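
The core mixup-style idea can be sketched in a few lines: build a synthetic example as a convex combination of two training pairs at the embedding level and mix the corresponding losses with the same coefficient. This is a simplified picture under assumed shapes; the paper studies several hard and soft mixing variants.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, seq_len = 100, 16, 5
embed = rng.normal(size=(vocab_size, dim))         # toy embedding table

src_a = rng.integers(0, vocab_size, seq_len)       # token ids of two source sentences
src_b = rng.integers(0, vocab_size, seq_len)
lam = rng.beta(0.5, 0.5)                           # mixing coefficient

mixed_src = lam * embed[src_a] + (1 - lam) * embed[src_b]   # (seq_len, dim) mixed inputs
# The training loss is mixed the same way: lam * loss(target_a) + (1 - lam) * loss(target_b)
print(round(float(lam), 3), mixed_src.shape)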

【15】 Relation Extraction with Contextualized Relation Embedding (CRE)

Title: Relation Extraction with Contextualized Relation Embedding (CRE)

Authors: Xiaoyu Chen, Rohan Badlani

Affiliations: Dept. of Computer Science, Stanford University

Note: EMNLP 2020 Workshop: Deep Learning Inside Out (DeeLIO)

Link: https://arxiv.org/pdf/2011.09658v1

Abstract: Relation extraction is the task of identifying relation instances between two entities given a corpus, whereas knowledge base modeling is the task of representing a knowledge base in terms of relations between entities. This paper proposes an architecture for the relation extraction task that integrates semantic information with knowledge base modeling in a novel manner. Existing approaches for relation extraction either do not utilize knowledge base modelling or use separately trained KB models for the RE task. We present a model architecture that internalizes KB modeling in relation extraction. This model applies a novel approach to encode sentences into contextualized relation embeddings, which can then be used together with parameterized entity embeddings to score relation instances. The proposed CRE model achieves state of the art performance on datasets derived from The New York Times Annotated Corpus and FreeBase. The source code has been made available.
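
As a generic picture of how a relation embedding can be combined with parameterized entity embeddings to score a relation instance, here is a DistMult-style sketch; this is a stand-in under assumed vectors, not the paper's actual CRE scoring function.

import numpy as np

rng = np.random.default_rng(1)
dim = 8
entity_emb = {"Obama": rng.normal(size=dim), "Hawaii": rng.normal(size=dim)}
relation_emb = rng.normal(size=dim)   # stand-in for an embedding derived from the sentence context

def score(head, rel, tail):
    # Higher score = the relation instance is judged more plausible.
    return float(np.sum(entity_emb[head] * rel * entity_emb[tail]))

print(score("Obama", relation_emb, "Hawaii"))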

【16】 Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Title: Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP

Authors: John Chen, Ian Berlot-Atwell, Safwan Hossain, Xindi Wang, Frank Rudzicz

Affiliations: University of Toronto; Vector Institute; University of Western Ontario; St. Michael's Hospital

Note: Best paper award at the 3rd Clinical Natural Language Processing Workshop at EMNLP 2020

Journal-ref: Proceedings of the 3rd Clinical Natural Language Processing Workshop (2020), pages 301–312

Link: https://arxiv.org/pdf/2011.09625v1

Abstract: Clinical machine learning is increasingly multimodal, collected in both structured tabular formats and unstructured forms such as free text. We propose a novel task of exploring fairness on a multimodal clinical dataset, adopting equalized odds for the downstream medical prediction tasks. To this end, we investigate a modality-agnostic fairness algorithm – equalized odds post-processing – and compare it to a text-specific fairness algorithm: debiased clinical word embeddings. Despite the fact that debiased word embeddings do not explicitly address equalized odds of protected groups, we show that a text-specific approach to fairness may simultaneously achieve a good balance of performance and classical notions of fairness. We hope that our paper inspires future contributions at the critical intersection of clinical NLP and fairness. The full source code is available here: https://github.com/johntiger1/multimodal_fairness
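
The equalized-odds criterion itself is simple to check: a classifier satisfies it when true-positive and false-positive rates match across protected groups. The sketch below computes those per-group rates on toy labels; post-processing then adjusts decisions until the rates align.

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])    # toy gold outcomes
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])    # toy model decisions
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

for g in np.unique(group):
    m = group == g
    tpr = y_pred[m][y_true[m] == 1].mean()     # true-positive rate for this group
    fpr = y_pred[m][y_true[m] == 0].mean()     # false-positive rate for this group
    print(f"group {g}: TPR={tpr:.2f} FPR={fpr:.2f}")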
